Supporting open science through reproducible computing.
Bret Davidson | Eka Grguric
- Open science as problem space
- Open science as modern research practice
- Open science at NC State
- Scholar's Backpack
Open Science: what is it?
- Open Access
- Open Data
- Open Notebooks
- Open Source
Open Science is a return to first principles of scientific practice.
Nullius in Verba
"Take nobody's word for it."
Open Science can increase reproducibility.
Five Schools of Thought
by Sönke Bartling & Sascha Friesike
Aligns with core library values
- information access
- open peer review
- community-based knowledge creation
- the preservation and dissemination of research
- libraries are champions of open (open source; open data)
supporting their users
supporting research practice
Ongoing disruption by digital technologies in modern research practice
in support of open
Ecosystem of Support for Modern Research Practice at NCSU Libraries
The NCSU Libraries' Open Science Initiative
- explore open science practice at NCSU
- better understand researcher needs in context
Take a non-prescriptive user-centered approach.
Create opportunities for communication.
Open Science Unconference
Follow-up Informal Interviews
- Modern Research Skills Gap
- Insufficient Incentives
- Hands-on skill building
- Provide networking opportunities
- Increase visibility of library spaces & services
- Scholarly identity creation
- Scientific computing
- Building a website
- Data harvesting
- Code collaboration
The Planning Team
Representation from a broad range of departments.
Summer of Open Science
- Intro to the Command Line Interface
- Web Scraping with Python
- Understand and Build Your Scholarly Identity
- Scientific Computing with Python & Raspberry Pi
- Build Your Scholarly Website the Easy Way
- End-of-Summer Showcase
Scientific Computing with Python & Raspberry Pi
40-person waiting list
Interdisciplinary need: over 40 departments across ~16 colleges
- Libraries are well positioned to fill gaps in the curriculum
- "Open Science" attracted a range of disciplines
- High demand for introductory skill training, particularly coding
- Interest in interdisciplinary research sharing
- Summer presents interesting opportunities and challenges
Virtual Environments for Reproducible Computing
Technical workshops are ripe for disaster.
What could go wrong?
- Images reset overnight
- Improper permissions
- Network connectivity issues
- Language Versions
- Missing packages
- Inconsistent user environments
- Inconsistent course materials
- Provisioning is time-consuming
- Difficult to collaborate
- Data types and structures
- Module system
- Control Structures
- Exception Handling
- Working with the file system
- Retrieve a web page with Requests
- Parse content with Beautiful Soup
- Generate a word cloud with matplotlib
- Consistency across lab environments
- Ability to see results of code
- Consistency across time
- Ease of collaboration
- Custom Operating System Images
- Custom Distributions, e.g. Anaconda
- Interactive Environments, e.g. Jupyter
- Vagrant for managing the operating system
- Ansible for provisioning and configuration
- Anaconda for managing environments and packages
- Workshop-specific resources
- Install Vagrant
- Install VirtualBox
- Clone project repo
- `vagrant up`
- `vagrant ssh`
- Execute code!
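As one illustration of the "Execute code!" step, the web scraping exercise listed earlier (retrieve a page with Requests, parse it with Beautiful Soup, prepare input for a word cloud) might be sketched as follows. This is a minimal sketch, not the workshop's actual material: the function names `fetch_html`, `extract_text`, and `word_frequencies` and the `min_length` parameter are hypothetical, and the sketch stops at word counts rather than drawing the cloud itself.

```python
import re
from collections import Counter


def fetch_html(url, timeout=10):
    """Retrieve a web page with Requests and return its HTML as text."""
    import requests  # third-party; imported here so the counting step runs without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_text(html):
    """Parse HTML with Beautiful Soup and return the visible text."""
    from bs4 import BeautifulSoup  # third-party (beautifulsoup4 package)

    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop content that is never shown to a reader
    return soup.get_text(separator=" ")


def word_frequencies(text, min_length=4):
    """Count lowercase words with at least min_length letters (hypothetical cutoff)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if len(word) >= min_length)
```

Inside the virtual machine, the three steps chain together as `word_frequencies(extract_text(fetch_html(url)))`, where `url` is any page the participant chooses; the resulting counts are the kind of input a word cloud is drawn from.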
This is reproducible computing!
- Consistent environment user to user
- Single target for course materials
- Faster provisioning for new workshops
- Repeatable course to course
- R and RStudio
- Jupyter Notebook Server
- Example Notebooks
- Accessible from web browser
Create and configure lightweight, reproducible, and portable development environments.
- Easy installation through binary package
- Flexible configuration via text-based configuration file
- Single command: `vagrant up`
"Automation engine" for provisioning and configuration management.
"Establish and maintain consistency of an environment."
- Python & R
- Software packages
- Jupyter Notebooks
- Start Jupyter notebook server
- Set environment variables
- Set default login directory
astropy, beautifulsoup4, conda, flask, jupyter, matplotlib, numpy, nltk, pandas, pillow, pip, pytest, qt, requests, scipy, scikit-learn, seaborn, sqlite, etc.
r, essentials, formatr, ggplot2, irkernel, knitr, kernsmooth, maps, markdown, mass, matrix, nnet, rbokeh, recommended, spatial, tidyr, etc.
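A participant or instructor could confirm that an environment actually contains the Python packages listed above with a short check like the one below. It is a sketch under stated assumptions: the helper names `missing_packages` and `check_environment`, the `min_python` parameter, and the `IMPORT_NAMES` table are hypothetical, and the table covers only a few packages whose import name differs from the name they are installed under.

```python
import importlib.util
import sys

# Installed name -> import name, for cases where the two differ (partial, illustrative).
IMPORT_NAMES = {
    "beautifulsoup4": "bs4",
    "pillow": "PIL",
    "scikit-learn": "sklearn",
}


def missing_packages(packages):
    """Return the packages whose import name cannot be found in this environment."""
    missing = []
    for package in packages:
        module = IMPORT_NAMES.get(package, package)
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


def check_environment(packages, min_python=(3, 0)):
    """Summarize the interpreter version and any missing packages."""
    return {
        "python": sys.version_info[:3],
        "python_ok": sys.version_info >= min_python,  # min_python is an assumed threshold
        "missing": missing_packages(packages),
    }
```

For example, `check_environment(["numpy", "pandas", "beautifulsoup4"])` returns a dictionary whose `missing` entry names anything that could not be found. The check applies only to importable Python libraries, not to tools in the list such as conda, pip, or qt.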
- Docker containers for portability
- Embedded use in curriculum
- Additional open source contributions
Open Science represents a new framework for research and provides an opportunity for libraries to engage researchers in new ways.
NCSU Libraries has run workshops and outreach around this framework, and there is evidence of strong interest across disciplines.
We are redeploying existing technical resources and cutting-edge technology in ways that used to be difficult or impossible.
This approach has helped us identify a new leadership role for libraries in open research support.