Data Science Lessons
The second part of the workshop series focused on training sessions on tools that can be used for (or are useful in) data science in the classroom. The five lessons are (links will take you to the actual lessons, which are hosted elsewhere):
- Data visualization with ggplot: Data visualization is a powerful means of communicating concepts and patterns in data. Getting your visualizations to look good in R can take some work, but the ggplot2 package makes it easier. This lesson introduces how to use the ggplot2 package to create publication-quality graphics.
- Easier data manipulation with the tidyverse: R is a powerful language for managing, analyzing, and visualizing complex data. However, some of the commands in R are esoteric or just plain confusing. The tidyverse package for R includes several helpful functions that make it easier to manipulate, summarize, and visualize your data. In this lesson we’ll use these functions to create plots from summary statistics.
- An introduction to RStudio and GitHub: The Git system is designed to make collaboration easier and more transparent. This lesson provides an introduction to the version control system Git, one central sharing point called GitHub, and how you can use the two in RStudio.
- An introduction to machine learning in R: Machine learning has been all the rage, but there remains a lot of mystery about what machine learning really is. In this workshop, we will work through an example of how we use machine learning to make predictions and how to assess how good those predictions really are. The material is designed for people who have little to no experience with machine learning.
- Making web pages with Quarto: This tutorial, written by Sam Csik, provides an introduction to the Quarto system for making webpages hosted on GitHub. Quarto websites are a (relatively) accessible means of creating and providing content for data science learning goals.