
Data In The Wild

An Introductory Data Science Course for the Life Sciences

Welcome! This course introduces students in the life sciences to data science by applying quantitative reasoning and the R programming language to “real-world” problems. Below, you will find the overall narrative and the material covered. Four modules guide students from their first steps in the Land of the Penguins to building new roads for access to fishing sites. Together, the modules lead them through basic programming, data visualization, statistics, machine learning, and the interpretation of quantitative concepts.

The Narrative

⊳ Module 1: Mission Antarctica!

A new effort to establish a permanent, sustainable colony in Antarctica is being launched. Students are introduced to the field of data science, the applications (RStudio, RMarkdown, Jupyter Notebooks), and the programming languages (R, Python) used in the course.

⊳ Module 2: Good Food Gone Bad

There is a food poisoning outbreak among team members. Students use data visualization to determine where the problem lies (fish, not plants) and simulations to determine the root of the problem (fish tank density, not fish tank temperature).

⊳ Module 3: Follow That Seal

The fish tanks need to be restocked, but we want to avoid fishing in places with high leopard seal density, so we track the seals. However, the radio collars on the leopard seals are failing, and there is a deadly conflict between fish collectors and seals. The collars come from two different manufacturers, so we need to determine how the collars are failing (days to recharge, not signal distance) and how to classify collars of unknown provenance.
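To make the classification step concrete, here is a minimal sketch of k-nearest-neighbor voting, the technique listed for lesson 3.5. It is written in Python for illustration only (the course lessons use R). The measurements, the manufacturer labels "A" and "B", and the function name `knn_classify` are hypothetical and are not taken from the course data.

```python
import math
from collections import Counter

# Hypothetical toy measurements, NOT course data:
# (days_to_recharge, signal_distance_km, manufacturer)
labeled_collars = [
    (2.0, 4.1, "A"),
    (2.2, 3.9, "A"),
    (1.9, 4.0, "A"),
    (5.1, 4.2, "B"),
    (4.8, 3.8, "B"),
    (5.3, 4.0, "B"),
]

def knn_classify(point, labeled, k=3):
    """Return the majority label among the k labeled rows nearest to point."""
    # Sort labeled rows by Euclidean distance on the two numeric features.
    # Features are left unscaled here; real data would usually be standardized.
    by_distance = sorted(labeled, key=lambda row: math.dist(point, row[:2]))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(row[2] for row in by_distance[:k])
    return votes.most_common(1)[0][0]

# Classify a collar of unknown provenance from its two measurements.
print(knn_classify((4.9, 4.1), labeled_collars))
```

In this toy data the two groups differ in days to recharge but not in signal distance, mirroring the narrative, so the vote is driven almost entirely by the first feature.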

⊳ Module 4: March of the Penguins

A new road is needed to access fishing sites with low leopard seal density. There are several possible routes, but we want to avoid crossing through Gentoo penguin nesting grounds. Students will build models to determine predictors of nesting success, first with bootstrapping for confidence intervals, then with linear regression.
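As a rough illustration of the bootstrapping idea mentioned above, the sketch below computes a percentile bootstrap confidence interval for a mean. It is written in Python for illustration only (the course lessons use R). The chick counts, the function name `bootstrap_ci`, and its parameters are hypothetical assumptions, not the course's data or method.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Resample with replacement, keeping the original sample size each time,
    # and record the mean of every resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    # Read the interval off the sorted resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical chicks fledged per nest, NOT course data.
chicks_per_nest = [0, 1, 1, 2, 0, 2, 1, 1, 2, 0, 1, 2]
low, high = bootstrap_ci(chicks_per_nest)
print(f"approximate 95% CI for mean chicks per nest: {low:.2f} to {high:.2f}")
```

The interval comes directly from the spread of the resampled means, so no distributional formula is needed, which is one reason bootstrapping can be a gentle first step before regression.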

How to Navigate this Site

Under Course Materials, you will find lessons taught in WFSC 223 at the University of Arizona, as well as links to download lectures, assignments, discussions, and other assets used for instruction.

Overall Learning Objectives

By the end of the semester, you will be able to…

  1. Define, differentiate, and explain the nature and application of computational methods for acquiring, managing, analyzing, visualizing, and sharing data as it relates to real-world natural resource scenarios.

  2. Associate, examine, and compare how to infer meaning and insight from data through written, visual, and verbal communication to multiple audiences.

  3. Summarize, implement, and appraise multiple perspectives and make meaningful connections across disciplines and social positions, think conceptually and critically, and solve problems with data-informed approaches.

Footnotes (For Instructors)

This course was developed by Drs. Katy Prudic, Jeff Oliver, Keaton Wilson, and Ellen Bledsoe. It was taught in pilot form at the University of Arizona in Spring 2020 as Settlers of Antarctica, and it has since undergone multiple revisions to communicate the material more effectively to both students and instructors. With the help of funding from the NSF’s Harnessing the Data Revolution, the material is designed to be adaptable to other disciplines and to serve as a template for your own courses. R is the primary programming language used here, but the course can be modified for other languages such as Python.

Copyright 2024, University of Arizona | Last modified: 12 June 2024
 