4 Functions

One of the most useful aspects of a programming language is the ability to automate your workflow and let the computer do the work for you. Learning to write your own functions is one of the most important steps in this direction. Fortunately, R makes it very easy to create your own functions.

A further motivation for learning how to write functions is that you often need something that is missing in base R, and you may not want to rely on a third-party package. For instance, base R has no function to compute the standard error of the mean. Although it is pretty easy to do so manually, there are good reasons to automate calculations that have to be performed repeatedly.

The basic idea of a function is that it takes one or more inputs, does something with these, and then returns an output. The inputs are called the function arguments, the code that does something with the arguments is called the function body, and the output is called the return value.

Suppose you want to write a function that squares a number and then adds 5. We need a name for the input argument and a name for the function itself. Let’s call the argument x and the function square_add_five. The function definition then looks like this:

square_add_five <- function(x) {
    x^2 + 5
}

Once we have evaluated this code in the R console, we can call our function:

square_add_five(4)
## [1] 21

or

square_add_five(x = 4)
## [1] 21

The code x^2 + 5 is the function body. This takes the argument x, computes x^2 and then adds 5 to the result. The result of this computation is returned, and can be assigned to a new variable:

result <- square_add_five(4)

That’s the basic idea. Now we will take a closer look at how to translate a formula into R code, and then define a new function.

4.1 Standard error

When summarizing data sets, we often need to compute the standard error \(\hat\sigma_{\bar{X}} = \frac{\hat\sigma_X}{\sqrt{n}}\), where \(\hat\sigma_X\) is the sample standard deviation and \(n\) is the sample size.

Translating this formula into R code is fairly straightforward. If we have an input vector x, we just need the standard deviation and the sample size. The standard deviation can be computed using the function sd(), and the sample size is the number of observations in x, which is just the length of x, given by length().

We can compute \(\frac{\hat\sigma_X}{\sqrt{n}}\):

sd(x)/sqrt(length(x))

Let’s try this out with an example: we have a sample x consisting of ten observations.

set.seed(5434)  # to make the example reproducible
x <- rnorm(10, mean = 30, sd = 3)

mean(x)
## [1] 30.75678
sd(x)
## [1] 3.375447
se <- sd(x) / sqrt(length(x))
se
## [1] 1.06741

Now suppose that we want to compute the standard error for several samples, or for several groups within a data set. We would have to duplicate our code sd(x)/sqrt(length(x)) and adapt it to each new input vector. Even though this code looks manageable, code duplication is bad practice because it invites errors. Copy/paste mistakes are a common example: you might copy sd(x)/sqrt(length(x)) with the intention of changing x to y, but you have to remember to change every instance of x. If you miss one, the code may still run and silently give the wrong result. For more complex code, this quickly becomes unmanageable.
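The following sketch illustrates such a copy/paste mistake. The vectors x and y are made-up samples used only for this illustration.

```r
# Two hypothetical samples, made up for this illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
y <- c(1, 3, 5, 7)

se_x <- sd(x) / sqrt(length(x))

# Copied line with only one of the two x's changed: this runs without
# an error, but divides by the sample size of x instead of y
se_y_wrong <- sd(y) / sqrt(length(x))

# What we actually meant
se_y <- sd(y) / sqrt(length(y))

se_y_wrong == se_y
## [1] FALSE
```

R gives no warning here, because x still exists in the workspace and the expression is perfectly valid.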

A much better approach is to create a function that can apply the code to any input vector.

4.2 Defining a function

Recall that a function has a name and arguments, which may have default values:

f(arg1, arg2 = val2)

The function f has two arguments, arg1 and arg2. Only arg2 has a default value. This means that arg1 is required.
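To make this concrete, here is a minimal sketch using a hypothetical function power_of (it is not part of base R), whose second argument, exponent, has a default value of 2.

```r
# Hypothetical function for illustration: raises x to a power
power_of <- function(x, exponent = 2) {
    x^exponent
}

power_of(3)                # exponent falls back to its default, 2
## [1] 9
power_of(3, exponent = 3)  # the default is overridden
## [1] 27
```

Calling power_of() with no arguments at all results in an error, because x is used in the function body and has no default value.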

Let’s give our standard error function the descriptive name standard_error. It will take one argument, x.

We use the keyword function to define a new function:

standard_error <- function(x) {
    # Code
}

Now we need the function body. This is the code between the curly braces {}. The value of the last expression evaluated in the function body is returned automatically.
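As a small illustration of this rule, the two definitions below behave identically. The names add_one_implicit and add_one_explicit are made up for this sketch. The first relies on the automatic return of the last expression, while the second calls return() explicitly.

```r
# Relies on the last evaluated expression being returned automatically
add_one_implicit <- function(x) {
    x + 1
}

# Uses return() to hand back the value explicitly
add_one_explicit <- function(x) {
    return(x + 1)
}

add_one_implicit(2)
## [1] 3
add_one_explicit(2)
## [1] 3
```

For short functions like these, leaving out return() is common in R code. An explicit return() is mostly useful for leaving a function early.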

4.3 Standard error function

Now we are ready to define the standard error function:

standard_error <- function(x) {
    n <- length(x)     # sample size
    sd(x) / sqrt(n)    # last expression, so this is the return value
}

We can use this function with any valid input vector:

standard_error(x)
## [1] 1.06741
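The original motivation was to compute the standard error for several samples without duplicating code. One possible way to do this, sketched below, is to store the samples in a list and apply standard_error to each element with sapply(). The list samples and its contents are made up for this illustration, and the function definition is repeated so that the example is self-contained.

```r
# Repeated here so the example runs on its own
standard_error <- function(x) {
    n <- length(x)
    sd(x) / sqrt(n)
}

# Hypothetical samples, made up for this illustration
samples <- list(
    a = c(1, 3, 5, 7),
    b = c(2, 4, 4, 4, 5, 5, 7, 9)
)

# Apply standard_error to each element of the list;
# the result is a named numeric vector with one value per sample
se_by_sample <- sapply(samples, standard_error)
se_by_sample
```

The formula now lives in exactly one place, so there is no second copy in which an x could be left unchanged by mistake.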