Open transparent and reproducible science
University of Bergen
Can you spell function? (funktion, FUNkshun)
If not, you can use a \ instead: \(x) is shorthand for function(x) in R 4.1.0 and later.
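To make the backslash shorthand concrete, here is a small sketch comparing the two spellings. The names square_long and square_short are made up for illustration, and the shorthand assumes R 4.1.0 or later.

```r
# Two equivalent ways to define the same function (names are hypothetical)
square_long <- function(x) x^2
square_short <- \(x) x^2

# The shorthand is most useful for small anonymous functions
squares <- sapply(1:3, \(x) x^2)
```

Both definitions behave identically; the shorthand only saves typing.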
Takes no arguments. Does no work. Returns nothing.
Guaranteed never to fail.
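A function fitting that description could be as small as the sketch below. The name f is an assumption taken from the naming remark that follows, and "returns nothing" shows up in R as a NULL return value.

```r
# An empty function: no arguments, empty body (assumed form)
f <- function() {}

# Calling it does nothing and gives back NULL
result <- f()
```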
f is a terrible name for a function - use a descriptive name
function is used to define functions
() contain all the arguments, separated by commas
{} contain the body of the function
library(tidyverse)
library(palmerpenguins)
penguins |>
  filter(species == "Gentoo") |>
  ggplot(aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Body mass g", y = "Bill length mm", title = "Gentoo")
penguins |>
  filter(species == "Adelie") |>
  ggplot(aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Body mass g", y = "Bill length mm", title = "Gentoo")
General – Specific functions
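The two pipelines above differ only in the species being filtered, so they could be folded into one function. This is a minimal sketch: plot_species and its argument names are hypothetical, not taken from the slides. Taking the title from the argument keeps the label and the filter in sync.

```r
library(tidyverse)
library(palmerpenguins)

# Hypothetical helper: scatter plot with a linear fit for one species
plot_species <- function(data, species_name) {
  data |>
    filter(species == species_name) |>
    ggplot(aes(x = body_mass_g, y = bill_length_mm)) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(x = "Body mass g", y = "Bill length mm", title = species_name)
}

gentoo_plot <- plot_species(penguins, "Gentoo")
adelie_plot <- plot_species(penguins, "Adelie")
```

Each plot is drawn when the object is printed, for example by typing adelie_plot at the console.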
Make your functions idiot proof.
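One way to do that is to check the inputs at the top of the function and fail early with a clear message. A minimal sketch, assuming a hypothetical helper mean_mass_kg that is not part of the slides (named stopifnot() messages need R 4.0.0 or later):

```r
# Hypothetical helper: mean body mass in kg from masses in g
mean_mass_kg <- function(mass_g) {
  # Fail early with an informative message on bad input
  stopifnot("`mass_g` must be numeric" = is.numeric(mass_g))
  if (any(mass_g < 0, na.rm = TRUE)) {
    stop("`mass_g` must not contain negative values")
  }
  mean(mass_g, na.rm = TRUE) / 1000
}

ok <- mean_mass_kg(c(3750, 3800, NA))
bad <- tryCatch(mean_mass_kg("heavy"), error = function(e) conditionMessage(e))
```

Here bad holds the error message instead of a number, which is what a caller would see at the console.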
One function - one job
Large complex function
# Better
import_fun <- function(arg1) {
  # complex code to import data
}
clean_fun <- function(.data) {
  # complex code to clean data
}
model_fun <- function(.data_clean) {
  # complex code to model data
}
plot_fun <- function(.data_clean) {
  # complex code to plot model
}
complex <- function(arg1, arg2, ...) {
  .data <- import_fun(arg1)
  .data_clean <- clean_fun(.data)
  mod <- model_fun(.data_clean)
  plot <- plot_fun(.data_clean)
  list(mod = mod, plot = plot)
}
Tidyverse is fantastic for interactive analyses.
It gets a bit more complex inside functions.
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
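Much of the extra complexity inside functions comes from data masking: dplyr verbs look up bare column names in the data, so a column passed as a function argument has to be forwarded explicitly. A minimal sketch using the embrace operator {{ }}; group_mean and its arguments are hypothetical names, and this is one common approach rather than necessarily the one the slides used.

```r
library(dplyr)
library(palmerpenguins)

# Hypothetical helper: mean of one column within each level of another
group_mean <- function(data, group_col, value_col) {
  data |>
    group_by({{ group_col }}) |>
    summarise(
      mean_value = mean({{ value_col }}, na.rm = TRUE),
      .groups = "drop"
    )
}

# Column names are passed unquoted, as in interactive dplyr code
species_mass <- group_mean(penguins, species, body_mass_g)
```

Without the braces, group_by() would look for a column literally called group_col and fail.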
Do you need to use tidyverse?
# A tibble: 124 × 8
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
 1 Gentoo  Biscoe           46.1          13.2               211        4500
 2 Gentoo  Biscoe           50            16.3               230        5700
 3 Gentoo  Biscoe           48.7          14.1               210        4450
 4 Gentoo  Biscoe           50            15.2               218        5700
 5 Gentoo  Biscoe           47.6          14.5               215        5400
 6 Gentoo  Biscoe           46.5          13.5               210        4550
 7 Gentoo  Biscoe           45.4          14.6               211        4800
 8 Gentoo  Biscoe           46.7          15.3               219        5200
 9 Gentoo  Biscoe           43.3          13.4               209        4400
10 Gentoo  Biscoe           46.8          15.4               215        5150
# ℹ 114 more rows
# ℹ 2 more variables: sex <fct>, year <int>
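A subset like the Gentoo rows above does not need tidyverse at all. A minimal base R sketch; filter_species is a hypothetical name, and the code that produced the output above is not shown in the source, so this is only one way to get the same rows.

```r
library(palmerpenguins)

# Hypothetical helper: keep the rows for one species using base R indexing
filter_species <- function(data, species_name) {
  keep <- data$species == species_name
  data[keep, ]
}

gentoo <- filter_species(penguins, "Gentoo")
```

Because the column is reached with $ and the value arrives as an ordinary string, no special evaluation tricks are needed inside the function.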
# A tibble: 124 × 9
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
 1 Adelie  Dream            39.5          16.7               178        3250
 2 Adelie  Dream            37.2          18.1               178        3900
 3 Adelie  Dream            39.5          17.8               188        3300
 4 Adelie  Dream            40.9          18.9               184        3900
 5 Adelie  Dream            36.4          17                 195        3325
 6 Adelie  Dream            39.2          21.1               196        4150
 7 Adelie  Dream            38.8          20                 190        3950
 8 Adelie  Dream            42.2          18.5               180        3550
 9 Adelie  Dream            37.6          19.3               181        3300
10 Adelie  Dream            39.8          19.1               184        4650
# ℹ 114 more rows
# ℹ 3 more variables: sex <fct>, year <int>, island_mean <dbl>
traceback() to see where the error occurred
options(error = recover) to enter the debugger when an error occurs
print() statements to see state of variables
browser() somewhere strategic
In the debugger, use
n to step through the code
c to continue
s to step into a function
Q to quit
Use debug to automatically add browser()
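A small sketch of how these tools might be placed in practice. The function bill_ratio is hypothetical, and the interactive calls are left commented out so the script also runs non-interactively.

```r
# Hypothetical function used to illustrate the debugging tools
bill_ratio <- function(length_mm, depth_mm) {
  # browser()  # uncomment to pause here and inspect the arguments
  print(class(length_mm))  # quick look at the state of a variable
  length_mm / depth_mm
}

# debug(bill_ratio)    # opens the browser on every call to bill_ratio
# undebug(bill_ratio)  # switches that off again

# A character input triggers an error; capture its message instead of stopping
outcome <- tryCatch(
  bill_ratio("39.1", 18.7),
  error = function(e) conditionMessage(e)
)
good <- bill_ratio(40, 20)
```

In an interactive session, calling traceback() straight after the failing call would show that the error came from inside bill_ratio.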
R is normally fast enough, but when it isn’t, it isn’t
Use the tictoc package to time code
Use bench::mark() to compare different implementations
Readability is more important than speed
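As an illustration of comparing implementations, the sketch below times a hand-written loop against the built-in sum(). The function loop_sum and the test vector are made up for this example, and the bench package is assumed to be installed.

```r
library(bench)

# Made-up test data and a deliberately slow implementation
x <- runif(1e4)
loop_sum <- function(values) {
  total <- 0
  for (value in values) {
    total <- total + value
  }
  total
}

# bench::mark() runs each expression repeatedly and, by default,
# also checks that all expressions return equal results
timings <- bench::mark(
  loop = loop_sum(x),
  vectorised = sum(x)
)
```

Printing timings shows the median time and memory use for each expression, one per row.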
Use the furrr or future packages to run code in parallel
R for Data Science by Hadley Wickham and Garrett Grolemund
Advanced R by Hadley Wickham