The modern research skill set:

Using Vagrant, Ansible, and Python to support researchers

Bret Davidson | Eka Grguric

NCSU Libraries

bretdavidson.github.io/pydata-carolinas-2016

Agenda

  • Open science as problem space
  • Open science as modern research practice
  • Open science at NC State
  • Vagrant, Ansible, & Python
Royal Society

Nullius in Verba

"Take nobody's word for it."

Only 6 out of 53 "landmark" cancer studies could be reproduced.

Nature, www.nature.com/nature/journal/v483/n7391/full/483531a.html

How Science Goes Wrong

"Too much trusting, not enough verifying."

Economist, www.economist.com/news/leaders/21588069-scientific-research-has-changed-world-now-it-needs-change-itself-how-science-goes-wrong

Reproducibility Crisis

Nature, www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970

What is Open Science?

  • Open Access
  • Open Data
  • Open Notebooks
  • Open Source

Open Science is a return to first principles of scientific practice.

Open Science can improve reproducibility.

Open practices require new skillsets.

What is Open Science?

"Open Science

is an umbrella term encompassing

a multitude of assumptions

about the future of knowledge creation and dissemination."

The way that research is

carried out

The way that research is

disseminated

How

digital technologies

are affecting

the practice of science


Paul David


Economist / Historian

The Knowledge Economy

Science

(as we know it)

has always been open

Increasing fragility

of this traditional openness

A response to a

crisis of the scientific method

Science =/= Truth

Ask

questions of nature

through experimentation

Uncover

possible answers

to these questions

Share

these answers

so that others may question them

Five schools of thought


by Sönke Bartling & Sascha Friesike

Editors, http://book.openingscience.org/

Infrastructure

technological architecture

Public

accessibility of knowledge creation + citizen science

Measurement

alternative impact measurement

Democratic

access to knowledge

Pragmatic

collaborative research

Cancer Reproducibility
Austerity Measures
Flint Water Crisis
Zika Reproducibility

Why Libraries?

Hunt Library

Aligns with core library values such as:

  • information access
  • peer review
  • community-based knowledge creation
  • the preservation and dissemination of research

Libraries are champions of open source

NCSU Libraries Github

Libraries

are about

supporting their users

Academic Libraries

are about

supporting research practice

Ongoing disruption of digital technologies in modern research practice

Hypothetical Open Science Workflow


101 Innovations in Scholarly Communication, https://innoscholcomm.silk.co/

Policy Shifts

in support of open

OECD Policy Paper

Open Science at NC State

The NCSU Libraries'
Open Science Initiative

Goals

  • explore open science practice at NCSU
  • better understand researcher needs in context

We took a non-prescriptive
user-centered approach

Creating opportunities for

communication

Open Science Unconference


Follow-up Informal Interviews

Modern Research Skills Gap

Insufficient Incentives

Summer of Open Science
Summer of Open Science Event Series

Sharing Collective Knowledge

Ekatarina [Eka] Grguric (Project Lead)

Lauren Di Monte (Project Manager)

Alison Blaine (Content Development)

Bret Davidson (Technical Lead)

Jennifer Garrett (Community Development)

Workshops and Events:

  • Introduction to the Command Line Interface
  • Web Scraping with Python
  • Understand and Build Your Scholarly Identity
  • Scientific Computing with Python & Raspberry Pi
  • Build Your Scholarly Website the Easy Way

  • Meetups
  • End-of-Summer Showcase!
SoS Python Workshop in Makerspace

Python and Scientific Computing Workshop:

40 person waitlist

SoS Python Workshop in Makerspace 2

Interdisciplinary Need:
over 40 departments across ~16 colleges

Tech instruction as gap filler

Web Scraping with Python

Learning Outcomes

  • Basic Python data types and structures
  • Python module system
  • Retrieve a web page with Requests
  • Parse content with Beautiful Soup
  • Generate a word cloud with matplotlib
  • Control Structures
  • Exception Handling
  • Working with the file system
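
Several of the outcomes above fit in one short script. The following is a minimal sketch, not the workshop's actual material: to stay runnable without third-party packages or network access, it parses an inline HTML string with the standard library's `html.parser` where the workshop would use Requests and Beautiful Soup, and it stops at word counts rather than drawing a word cloud. The names `SAMPLE_HTML`, `TextExtractor`, and `word_counts` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical sample page; the workshop would fetch a real page over HTTP.
SAMPLE_HTML = """
<html><head><title>Open Science</title><style>p {color: red}</style></head>
<body><h1>Open Science</h1>
<p>Open data and open source support open science.</p>
<script>var ignored = "open";</script>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Collect text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_counts(html):
    """Return a Counter of lowercase words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z']+", text))


if __name__ == "__main__":
    # Control structure: loop over the three most frequent words.
    for word, count in word_counts(SAMPLE_HTML).most_common(3):
        print(word, count)
```

The resulting counts are the kind of input a word cloud needs; drawing it is left out here.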

Python Course Challenges

  • Consistency across user environments
  • Consistency of course materials
  • Time to provision computing environments
  • Ease of collaborating on the creation of course materials

Many Options

  • Custom OS Images
  • Custom Distributions, e.g. Anaconda
  • Interactive Environments, e.g. Jupyter

Our Approach

  • Vagrant: for managing the operating system
  • Ansible: for provisioning and configuration
  • Course or lab specific packages and resources

Vagrant


Create and configure lightweight, reproducible, and portable development environments.

Usage

  • Easy installation through binary package.
  • Flexible configuration via text file.
  • Single command: `vagrant up`
vagrant file

Ansible

"Automation engine" for provisioning, configuration management, app deployment, and other IT needs.

Provisioning

"To make something available."

Installation!

Configuration Management

"Establish and maintain consistency of an environment."
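
One way to read "consistency" is idempotence: applying the same desired state twice leaves the environment unchanged the second time. The following Python sketch illustrates that idea only. It is not how Ansible is implemented, and `ensure_line` is a hypothetical helper that loosely resembles what a line-in-file style task does.

```python
from pathlib import Path
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it already complied.
    """
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "settings.conf"  # hypothetical config file
        print(ensure_line(config, "editor=vim"))  # first run changes the file
        print(ensure_line(config, "editor=vim"))  # second run is a no-op
```

Reporting whether anything changed is the design choice that makes repeated runs safe: a run against an already-consistent machine does nothing.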

playbook.yml

playbook
role
role
role

python-vagrant

github.com/NCSU-Libraries/python-vagrant


python-vagrant

Easy!

  1. Install Vagrant
  2. Clone python-vagrant
  3. `vagrant up`
  4. `vagrant ssh`
  5. Execute code!
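
The steps above could also be scripted. The following is a rough sketch, assuming Git and Vagrant are already installed (step 1) and on the `PATH`. The helper names `setup_commands` and `run_setup` and the script name `hello.py` are hypothetical, and the sketch uses `vagrant ssh -c` to run a single command in the guest instead of opening an interactive shell.

```python
import shutil
import subprocess

REPO_URL = "https://github.com/NCSU-Libraries/python-vagrant"


def setup_commands(script="hello.py"):
    """Return (command, working_directory) pairs for steps 2-5.

    `script` is a hypothetical file name; where course code lives inside
    the guest depends on the repository and is not specified here.
    """
    return [
        (["git", "clone", REPO_URL], None),                       # step 2
        (["vagrant", "up"], "python-vagrant"),                    # step 3
        (["vagrant", "ssh", "-c", f"python {script}"], "python-vagrant"),  # steps 4-5
    ]


def run_setup(script="hello.py"):
    """Check that the tools exist, then run each command in order."""
    for tool in ("git", "vagrant"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    for command, cwd in setup_commands(script):
        # check=True stops at the first failing step.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_setup()` would clone the repository into `python-vagrant`, boot the machine, and run the script; nothing is executed until that call is made.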

Future Work

Richer Environments

  • Better suited to scientific computing
  • Improved adherence to best practices
  • Optimize relationship between images, provisioner, and tutorials

Scholarly Backpacks

  • General Research Tools
  • Data Science
  • Visualization

Embedded environments

  • Curricular use
  • Laboratory use

Summary

Emphasis on reproducibility has ignited a shift toward new practices.

With these new practices come new requirements for researchers.

Reproducible and portable computing environments are critical for future success.

Tools like Vagrant and Ansible can help researchers develop the scientific environments they need to be productive.

Thanks!

eka_grguric@ncsu.edu

bret_davidson@ncsu.edu

github.com/NCSU-Libraries/python-vagrant