Data In The Wild
An Introductory Data Science Course for the Life Sciences
Welcome! This course introduces students in the life sciences to data science by applying quantitative reasoning and the R programming language to “real-world” problems. Below, you will find the overall narrative and the material that will be covered. Four modules guide students from their first steps in the Land of the Penguins to building new roads for access to fishing sites. Each module leads them through basic programming, data visualization, statistics, machine learning, and interpretation of quantitative concepts.
The Narrative
⊳ Module 1: Mission Antarctica!
A new effort to establish a permanent, sustainable colony in Antarctica is being launched. Students are introduced to the field of data science, the applications (RStudio, RMarkdown, Jupyter Notebooks), and the programming languages (R, Python) used in the course.
⊳ Module 2: Good Food Gone Bad
There is a food poisoning outbreak among team members. Students use data visualization to determine where the problem lies (fish, not plants) and simulations to determine the root of the problem (fish tank density, not fish tank temperature).
⊳ Module 3: Follow That Seal
The fish tanks need to be restocked, but we want to avoid fishing in places with high leopard seal density, so we track the seals. However, the radio collars on the leopard seals are failing, and there has been a deadly conflict between fish collectors and seals. The collars come from two different manufacturers, so we need to determine how the collars are failing (days to recharge, not signal distance) and how to classify collars of unknown provenance.
⊳ Module 4: March of the Penguins
A new road is needed to access fishing sites with low leopard seal density. There are several possible routes, but we want to avoid crossing through Gentoo penguin nesting grounds. Students will build models to determine predictors of nesting success, first with bootstrapping for confidence intervals, then with linear regression.
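As a rough illustration of the first technique named in Module 4, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. The course itself works primarily in R; this sketch uses Python's standard library only. The function name `bootstrap_mean_ci`, its parameters, and the nest data are all hypothetical and are not taken from the course materials.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=42):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Keep the middle `confidence` share of the resampled means.
    tail = (1 - confidence) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: chicks fledged per nest (made up for illustration).
chicks_per_nest = [0, 1, 2, 1, 0, 2, 1, 1, 2, 0, 1, 2]
low, high = bootstrap_mean_ci(chicks_per_nest)
print(f"Observed mean: {statistics.fmean(chicks_per_nest):.2f}")
print(f"95% bootstrap CI: ({low:.2f}, {high:.2f})")
```

The interval comes from resampling the observed nests with replacement many times, so it needs no assumption about the shape of the underlying distribution. That is one plausible reason to meet bootstrapping before linear regression.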
Overall Learning Objectives
By the end of the semester, you will be able to…
Define, differentiate, and explain the nature and application of computational methods for acquiring, managing, analyzing, visualizing, and sharing data as it relates to real-world natural resource scenarios.
Associate, examine, and compare how to infer meaning and insight from data through written, visual, and verbal communication to multiple audiences.
Summarize, implement, and appraise multiple perspectives and make meaningful connections across disciplines and social positions, think conceptually and critically, and solve problems with data-informed approaches.
Footnotes (For Instructors)
This course was developed by Drs. Katy Prudic, Jeff Oliver, Keaton Wilson, and Ellen Bledsoe. It was taught in pilot form at the University of Arizona in Spring 2020 as Settlers of Antarctica, and it has since undergone multiple revisions to better communicate the material to both students and instructors. With the help of funding from the NSF’s Harnessing the Data Revolution, the material is designed to be adaptable to other disciplines and to serve as a template for your own courses. R is the primary programming language used here, but the course can be adapted to other languages such as Python.