Open transparent and reproducible science
University of Bergen
Can you spell
funktion FUNkshun
function?
You can use a \ instead (R 4.1.0 or later)
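As a minimal sketch of the backslash shorthand: it defines the same kind of function as the function keyword. The names square_long and square_short are made up for illustration.

```r
# Two equivalent definitions; both names are hypothetical
square_long <- function(x) x^2
square_short <- \(x) x^2

# The shorthand is handy for small anonymous functions
squares <- sapply(1:3, \(x) x^2)
squares
```

The long form is probably easier to read for anything longer than one line.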
Takes no arguments. Does not work. Returns nothing.
Guaranteed never to fail.
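The code these lines describe is not shown here. It was presumably something like the do-nothing function below (an assumption, not taken from the slides). A second, hypothetical function shows a descriptive name, an argument and a body.

```r
# Assumed reconstruction: no arguments, empty body
f <- function() {}
nothing <- f()
is.null(nothing)  # an empty body returns NULL

# Hypothetical example with a descriptive name and one argument
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
celsius_to_fahrenheit(100)
```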
f is a terrible name for a function - use a descriptive name
function is used to define functions
() contain all the arguments, separated by commas
{} contain the body of the function
library(tidyverse)
library(palmerpenguins)
penguins |>
  filter(species == "Gentoo") |>
  ggplot(aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Body mass g", y = "Bill length mm", title = "Gentoo")
penguins |>
  filter(species == "Adelie") |>
  ggplot(aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Body mass g", y = "Bill length mm", title = "Gentoo")
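The two blocks above differ only in the species. Note that the second block is still titled "Gentoo", the kind of slip that copy-and-paste invites. A minimal sketch of one way to wrap the shared code in a function follows; plot_species and its argument names are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: the species is used for both the filter and the title,
# so the two can no longer get out of step
plot_species <- function(data, species_name) {
  data |>
    filter(.data$species == species_name) |>
    ggplot(aes(x = body_mass_g, y = bill_length_mm)) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(x = "Body mass g", y = "Bill length mm", title = species_name)
}

gentoo_plot <- plot_species(penguins, "Gentoo")
adelie_plot <- plot_species(penguins, "Adelie")
```

Printing gentoo_plot or adelie_plot draws the figure.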
General – Specific functions
Make your functions idiot proof.
One function - one job
Large complex function
# Better
import_fun <- function(arg1) {
  # complex code to import data
}
clean_fun <- function(.data) {
  # complex code to clean data
}
model_fun <- function(.data_clean) {
  # complex code to model data
}
plot_fun <- function(.data_clean) {
  # complex code to plot model
}
complex <- function(arg1, arg2, ...) {
  .data <- import_fun(arg1)
  .data_clean <- clean_fun(.data)
  mod <- model_fun(.data_clean)
  plot <- plot_fun(.data_clean)
  list(mod = mod, plot = plot)
}
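A runnable sketch of the same idea, with each small function doing one job and checking its input (one way of making functions idiot proof). All function names are hypothetical, and the four data rows are copied from the penguins printout elsewhere in these slides.

```r
# Import: one job, with a guard against the wrong kind of input
import_masses <- function(text) {
  stopifnot(is.character(text), length(text) == 1)
  read.csv(text = text)
}

# Clean: check the expected columns exist, then drop incomplete rows
clean_masses <- function(data) {
  stopifnot(is.data.frame(data), c("mass", "bill") %in% names(data))
  data[complete.cases(data), ]
}

# Model: fit a simple linear regression
fit_masses <- function(data_clean) {
  lm(bill ~ mass, data = data_clean)
}

# The wrapper only strings the small functions together
analyse_masses <- function(text) {
  data_clean <- clean_masses(import_masses(text))
  list(data = data_clean, mod = fit_masses(data_clean))
}

csv <- "mass,bill\n3750,39.1\n3800,39.5\nNA,NA\n3450,36.7"
res <- analyse_masses(csv)
nrow(res$data)  # the row of NAs has been dropped
```

Each piece can be tested on its own, and a call such as import_masses(42) stops with an error instead of failing somewhere downstream.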
The tidyverse is fantastic for interactive analyses.
It gets a bit more complex inside functions.
filter_penguins <- function(.data, species) {
  .data |>
    filter(species == species)
}
filter_penguins(penguins, "Gentoo")
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
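All 344 rows come back because, inside filter(), the name species on both sides of == refers to the column, so the comparison is TRUE for every row. A small sketch of the problem and of one possible fix using the .data and .env pronouns follows. The three-row toy data frame and the name filter_penguins_env are made up for illustration.

```r
library(dplyr)

# Toy data for illustration only
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3750, 4500, 5700)
)

species <- "Gentoo"

# Both sides are the column: every row is kept
nrow(filter(toy, species == species))

# Column on the left, outside variable on the right
nrow(filter(toy, .data$species == .env$species))

# The same idea inside a (hypothetical) function
filter_penguins_env <- function(data, species) {
  data |>
    filter(.data$species == .env$species)
}
gentoo <- filter_penguins_env(toy, "Gentoo")
```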
Do you need to use tidyverse?
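One way to read the question above: base R subsetting has no such ambiguity, because data$species is always the column and species is always the argument. A minimal sketch, with a made-up toy data frame and a hypothetical function name:

```r
# Toy data for illustration only
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3750, 4500, 5700)
)

filter_species_base <- function(data, species) {
  # which() drops rows where the comparison is NA
  data[which(data$species == species), , drop = FALSE]
}

gentoo <- filter_species_base(toy, "Gentoo")
nrow(gentoo)
```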
filter_penguins <- function(.data, species) {
  .data |>
    filter(species == {{ species }})
}
filter_penguins(penguins, "Gentoo")
# A tibble: 124 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Gentoo Biscoe 46.1 13.2 211 4500
2 Gentoo Biscoe 50 16.3 230 5700
3 Gentoo Biscoe 48.7 14.1 210 4450
4 Gentoo Biscoe 50 15.2 218 5700
5 Gentoo Biscoe 47.6 14.5 215 5400
6 Gentoo Biscoe 46.5 13.5 210 4550
7 Gentoo Biscoe 45.4 14.6 211 4800
8 Gentoo Biscoe 46.7 15.3 219 5200
9 Gentoo Biscoe 43.3 13.4 209 4400
10 Gentoo Biscoe 46.8 15.4 215 5150
# ℹ 114 more rows
# ℹ 2 more variables: sex <fct>, year <int>
filter_penguins <- function(.data, column, value, new_col) {
  .data |>
    filter({{ column }} == {{ value }}) |>
    mutate({{ new_col }} := mean(body_mass_g)) # := instead of =
}
filter_penguins(penguins, island, "Dream", island_mean)
# A tibble: 124 × 9
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Dream 39.5 16.7 178 3250
2 Adelie Dream 37.2 18.1 178 3900
3 Adelie Dream 39.5 17.8 188 3300
4 Adelie Dream 40.9 18.9 184 3900
5 Adelie Dream 36.4 17 195 3325
6 Adelie Dream 39.2 21.1 196 4150
7 Adelie Dream 38.8 20 190 3950
8 Adelie Dream 42.2 18.5 180 3550
9 Adelie Dream 37.6 19.3 181 3300
10 Adelie Dream 39.8 19.1 184 4650
# ℹ 114 more rows
# ℹ 3 more variables: sex <fct>, year <int>, island_mean <dbl>
traceback() to see where the error occurred
options(error = recover) to enter the debugger when an error occurs
print() statements to see state of variables
browser() somewhere strategic
In the debugger, use
n to step through the code
c to continue
s to step into a function
Q to quit
Use debug() to automatically add browser()
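A small sketch of how these tools fit together, using a hypothetical function bill_ratio. The debugger itself only makes sense in an interactive session, so the browser() call is left commented out and the function is not called while it is flagged.

```r
# Hypothetical function to debug
bill_ratio <- function(length_mm, depth_mm) {
  # browser()  # uncomment to pause here in an interactive session
  length_mm / depth_mm
}

# debug() flags the function: the next call would start in the debugger
debug(bill_ratio)
was_flagged <- isdebugged(bill_ratio)

# undebug() removes the flag again
undebug(bill_ratio)
bill_ratio(10, 4)
```

debugonce() is a convenient alternative that flags the function for a single call only.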
R is normally fast enough, but when it isn’t, it isn’t
Use the tictoc package to time code
Use bench::mark() to compare different implementations
Readability is more important than speed
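A sketch of timing two implementations of the same task. The functions square_loop and square_vec are hypothetical, and each timing step is skipped if the package is not installed.

```r
# Two implementations that should give the same result
square_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}
square_vec <- function(x) x^2

x <- runif(1e4)

# tictoc: simple stopwatch around a piece of code
if (requireNamespace("tictoc", quietly = TRUE)) {
  tictoc::tic("loop version")
  invisible(square_loop(x))
  tictoc::toc()
}

# bench::mark(): runs each expression repeatedly and, by default,
# also checks that the results are equal
if (requireNamespace("bench", quietly = TRUE)) {
  timings <- bench::mark(
    loop = square_loop(x),
    vectorised = square_vec(x)
  )
  print(timings)
}
```

Here the vectorised version is also the more readable one, so there is no trade-off to make.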
furrr or future packages for parallel processing
R for Data Science by Hadley Wickham and Garrett Grolemund
Advanced R by Hadley Wickham