Data In The Wild
  1. Module 2
  2. 2.2: Writing Functions
  • Introduction
    • Data In The Wild
    • Course Structure
    • Contact Us
  • Module 1
    • 1.1: Introduction to RStudio
    • 1.2: Introduction to Coding
    • 1.3: 2-Dimensional Data and the tidyverse
  • Module 2
    • 2.1: Introduction to Descriptive Statistics and Data Visualization
    • 2.2: Writing Functions
    • 2.3: Plotting with ggplot2
    • 2.4: A Visualization Primer
    • 2.5: Sick Fish
    • 2.6: Exploring geom Functions
    • 2.7: Wrap-Up
  • Module 3
    • 3.1: Leopard Seals
    • 3.2: T-Tests
    • 3.3: Comparing (Multiple) Means
    • 3.4: Combining Data (Joins and Binds)
    • 3.5: K-Nearest Neighbor
  • Module 4
    • 4.1: Roads and Regressions
    • 4.2: Multiple Regression
    • 4.3: Using Functions to Automate Tasks
  • Resources
    • Learn More!

On this page

  • Learning Outcomes
  • Aquaculture, Immunology, and Disease
    • Data Exploration
    • The Data We Have
    • Discussion of the problem
  • Custom Functions
    • General Syntax Example
    • Building Our Custom Function
    • Thinking about Iteration
  • Why Functions?
    • Main Ways to Iterate
    • Filtering Our Data

Other Formats

  • PDF

2.2: Writing Functions

Learning Outcomes

  • Students will be able to apply fundamental components of the biology of recirculating aquaculture systems to hypothesize how these components could lead to unhealthy fish.
  • Students will be able to write custom functions that perform simple tasks.
  • Students will be able to describe the utility of iteration.

Aquaculture, Immunology, and Disease

What is aquaculture anyway, and how might it contribute to disease? Here’s a link to the presentation.

Data Exploration

Given the correlation we found earlier, it does seem like the fish are our likely culprit.

Since we’ve narrowed down the issue, we should ask our aquaculture specialists to provide us with some data about the tanks.

Group Brainstorm

Based on what we know about aquaculture systems, what types of data should we ask for to get to the bottom of this issue? Spend about 3 minutes brainstorming with your group and be ready to report back.

The Data We Have

# Load the tidyverse
library(tidyverse)

# Read in the data
tank_data <- read_csv("data/fish_tank_data.csv")

# Look at the data
glimpse(tank_data)
Rows: 1,000
Columns: 7
$ tank_id        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
$ species        <chr> "tilapia", "tilapia", "tilapia", "tilapia", "tilapia", …
$ avg_daily_temp <dbl> 23.57285, 23.75280, 23.07280, 23.74601, 24.40522, 23.25…
$ num_fish       <dbl> 105, 102, 109, 98, 103, 97, 101, 99, 102, 95, 100, 102,…
$ day_length     <dbl> 10, 10, 10, 12, 10, 10, 11, 10, 9, 10, 10, 12, 11, 11, …
$ tank_volume    <dbl> 400.7182, 399.4300, 399.4878, 400.8874, 399.8796, 399.8…
$ size_day_30    <dbl> 2779.726, 2785.846, 2781.342, 2784.785, 2786.340, 2782.…

Discussion of the problem

Trout are a cold-water species whereas tilapia are a warm-water species. After doing some reading, we’ve come up with some critical values for each species.

Water temperatures below 59º F for trout and below 75º F for tilapia are critically low temperatures and can result in suppression of the immune system.

Great, we now know what the critical temperature cutoffs are for both species — but there’s a problem. The cut-off temperatures that we have are in Fahrenheit but the temperature values in our data frame are in Celsius.

Our goal with this data is to ascertain if any of the tanks are below the critical temperature. Before we can do this, though, we need to convert tank temperatures from Celsius to Fahrenheit. Luckily for us, we’ve already done this before! (Hint: remember the mutate() function?).

Now, we are going to run through a slightly different way to accomplish this task. It still involves mutate(), but we need to take some additional steps first before we use it.

Custom Functions

We’ve already used functions a lot in this class, and it turns out that we can write our own functions to perform specific tasks.

This has lot of varied uses and can be very powerful. The key takeaway here, though, is that we can write a function that converts Celsius to Fahrenheit and apply it to our entire data set in a few lines of code.

General Syntax Example

Making a function involves creating local variables that act as a stand-in for the values you want to use later on. In the case below, our local variables are x, y, and z. You can see that they are local because they only circulate within the function and are not being pulled in from outside. However, once we are done defining our function we can assign values to the input variables (in this case, x and y) that the function will do its magic with. You can see this in the example chunk below, where x = 2 and y = 4. All manipulations done with the input variables are contained between the curly brackets ( { } ), and of course, we signify the start a function with function().

Function syntax follows the structure in the code chunk below, which can be viewed as:

name_of_new_function <- function(input1, input2){

what do you want to do with your input variables? (add, subtract, etc.)

output <- input1 + input2

return(output)

}

And we can see it in action here:

# General syntax
new_sum <- function(x,y){
  z <- x + y
  return(z)
}

# Examples when using the new function
new_sum(x = 2, y = 4)
[1] 6
new_sum(2, 4)
[1] 6
new_sum(7, 19)
[1] 26

Building Our Custom Function

In your groups, build a custom function called c_to_f() that converts a single value of Celsius to Fahrenheit. Remember, the equation is Celsius = Fahrenheit * (9/5) + 32. Instead of two arguments like we have above (x, y), this function will only require one input value (c).

Be prepared to show your code and run some test values.

c_to_f <- function(c = NULL){
  f <- (c * (9/5)) + 32
  return(f)
}

c_to_f(c = 40)
[1] 104

Thinking about Iteration

So, we have a great function that we can use to convert a Celsius value to Fahrenheit, but how do we apply that to every value in the temperature column, not just one? We’ve built the function for one value, not for a whole vector of values…

This is where mutate() comes in! As we’ve seen in the past, mutate() applies whatever function or mathematical formula we’ve given to it to each value in the column. It essentially is going row by row: applying the function or formula to one row, returning the new value in a new column, then moving to the next row to do the same.

# Let's first demonstrate applying our new function to a vector
# Only show the first 10 rows as well
head(c_to_f(tank_data$avg_daily_temp))
[1] 74.43113 74.75503 73.53104 74.74281 75.92940 73.86383
# Mutate is basically doing the same thing
tank_data <- tank_data %>%
  mutate(avg_daily_temp_F = c_to_f(avg_daily_temp))

# View mutate() results
tank_data
# A tibble: 1,000 × 8
   tank_id species avg_daily_temp num_fish day_length tank_volume size_day_30
     <dbl> <chr>            <dbl>    <dbl>      <dbl>       <dbl>       <dbl>
 1       1 tilapia           23.6      105         10        401.       2780.
 2       2 tilapia           23.8      102         10        399.       2786.
 3       3 tilapia           23.1      109         10        399.       2781.
 4       4 tilapia           23.7       98         12        401.       2785.
 5       5 tilapia           24.4      103         10        400.       2786.
 6       6 tilapia           23.3       97         10        400.       2783.
 7       7 tilapia           23.5      101         11        400.       2784.
 8       8 tilapia           24.9       99         10        401.       2789.
 9       9 tilapia           23.7      102          9        400.       2788.
10      10 tilapia           23.8       95         10        399.       2785.
# ℹ 990 more rows
# ℹ 1 more variable: avg_daily_temp_F <dbl>

Why Functions?

We can easily fit the equation for converting from Celsius to Fahrenheit into a mutate call. We certainly didn’t need to write our own function to do it. So why did we learn how to do that?

Main Ways to Iterate

There are many different approaches to iterating in R, especially because we are typically working with vectorized data (columns in data frames). We will talk about the other options down the road. Today, we used #3, the mutate() function from the tidyverse.

  1. for loops
  2. apply/map functions
  3. using dplyr (tidyverse) functions

Filtering Our Data

Thankfully, we’ve solved our first problem! However, we haven’t yet answered the original question.

Are any tanks are below the temperature cutoffs for each species? If so, how many?

Let’s tackle that question:

# count() is a new function for most of us
# It does what it sounds like -- count the number of results

# Count number of tanks below temperature cutoff for TILAPIA
tank_data %>% 
  filter(species == "tilapia" & avg_daily_temp_F < 75) %>% 
  count()
# A tibble: 1 × 1
      n
  <int>
1   383
# for TROUT
tank_data %>% 
  filter(species == "trout" & avg_daily_temp_F < 59) %>% 
  count()
# A tibble: 1 × 1
      n
  <int>
1   131
# Let's introduce some additional syntax: "&" and "|"

# Now we can get counts for both fish species in one go!
tank_data %>%
  filter(species == "tilapia" & avg_daily_temp_F < 75 |
         species == "trout" & avg_daily_temp_F < 59) %>%
  group_by(species) %>%
  summarize(n = n()) # Another way to get the count is summarize() with n()
# A tibble: 2 × 2
  species     n
  <chr>   <int>
1 tilapia   383
2 trout     131
Copyright 2024, University of Arizona | Last modified: 12 June 2024
 
  • Made with Quarto