12 Bootstrap
12.1 Setup
pacman::p_load(boot, tidyverse, lme4)
12.2 Basics
The bootstrap is an umbrella term for a family of resampling methods.
Resampling generally refers to creating a new dataset by randomly drawing data points from an existing one. Bootstrapping is not the only application of resampling; another popular application is the permutation test, see this blogpost for a short introduction.
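As a minimal illustration of the permutation idea (a sketch with hypothetical two-group data, not part of this chapter's running example): to test a difference in group means, we repeatedly shuffle the group labels and recompute the statistic.
# hypothetical mini-example: permutation test for a difference in group means
set.seed(1)
y <- c(rnorm(20, mean = 0), rnorm(20, mean = .5))
group <- rep(c("a", "b"), each = 20)
obs.diff <- diff(tapply(y, group, mean)) # observed mean difference (b - a)
perm.diff <- replicate(2000, diff(tapply(y, sample(group), mean))) # shuffled labels
mean(abs(perm.diff) >= abs(obs.diff)) # two-sided permutation p-value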
Bootstrapping is most often used to quantify the uncertainty and bias of an estimate. Uncertainty can be captured in a confidence interval. We can also compute a bias estimate, i.e. the estimated difference between our observed statistic and its expected value.
Let's say we have a sample of grades of students for the last statistics exam:
grades <- c(sample(seq(4, 6, by = .5), size = 20, replace = TRUE),
            sample(seq(1, 3.5, by = .5), size = 5, replace = TRUE))
hist(grades)
We would like to estimate the mean of the population (all students). The bootstrap lets us do this while capturing the uncertainty of the estimated mean.
Instead of assuming a specific distribution of grades and constructing a CI around our estimate, we sample from the empirical distribution.
Our grades vector corresponds perfectly to this empirical distribution.
Hence, we can sample directly from the grades vector, N times.
plot(ecdf(grades), verticals = TRUE)
The following animations are taken from here. The data do not correspond to our example, but they visualize the process very well.

[Animation: univariate bootstrap resampling]
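Before looking at packaged solutions, here is a minimal by-hand sketch of this idea applied to the grades (the number of resamples, 10000, is an arbitrary choice):
# bootstrap the mean of the grades by hand
boot.means <- replicate(10000, mean(sample(grades, replace = TRUE)))
sd(boot.means) # bootstrap standard error of the mean
mean(boot.means) - mean(grades) # bootstrap estimate of the bias
quantile(boot.means, probs = c(.025, .975)) # simple percentile CI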
We have to allow replacement: without replacement, each draw would depend on all previous draws, and a resample of the full size would simply reproduce the original dataset. With replacement, some entries will likely be drawn more than once, and if some entries are duplicated, some entries of the original dataframe must be missing as well. How many unique data points (or rows/entries) would we expect in a bootstrap sample? Naturally, we can simulate this. Let's say, for n = 100:
nn <- 1:100
mean(replicate(n = 1000, length(unique(sample(nn, replace = TRUE)))))
## [1] 63.408
Looks like our resampled dataframe will contain about 63 or 64 unique entries for n = 100. We can also give an analytic solution as a function of n.
rep_fn <- function(n){
  p <- 1/n # probability of drawing a specific row at a given draw
  # 1 - p is then the probability of not drawing that row at a given draw.
  # (1-p)^n is the probability of not drawing a specific row in n draws.
  # This corresponds to the proportion of rows which were not drawn in a bootstrap sample.
  1 - (1-p)^n # all other rows will be drawn
}
rep_fn(100)
## [1] 0.6339677
n_vec <- seq(1, 15)
p_vec <- rep_fn(n_vec)

plot(p_vec, type = "l", ylim = c(0, 1), xlab = "n",
     ylab = "expected proportion of unique rows")
# analytic limit of rep_fn (as n -> inf):
abline(h = 1 - 1/exp(1), col = "red")
title(sub = "Expected proportion of unique rows drawn in bootstrap for specific sample size")
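The red line marks the analytic limit: as \(n \to \infty\),
\[\left(1-\frac{1}{n}\right)^{n} \longrightarrow e^{-1} \approx 0.368,\]
so the expected proportion of unique rows converges to \(1-e^{-1} \approx 0.632\).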
12.3 Non-parametric vs semi-parametric vs parametric bootstrap
When we have a linear model such as this:
\[Y_i=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\varepsilon_{i}, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^{2}),\]
there are different ways to compute a bootstrap.
According to our model specification, the errors are normally distributed. Let's say we have a sample of size 100 generated from such a model with \(\beta_0 = 5\), \(\beta_1 = 3\) and \(\beta_2 = .4\). The error variance is \(\sigma^2 = 5\).
set.seed(6134346)
# parameters
n <- 100
b0 <- 5
b1 <- 3
b2 <- .4
var <- 5

# data simulation
X1 <- sample(0:1, size = n, replace = TRUE)
X2 <- rnorm(n)
Y <- b0 + b1 * X1 + b2 * X2 + rnorm(n, sd = sqrt(var))
We then have a dataset randomly generated according to our model specifications.
original <- data.frame(ID = 1:length(Y), X1, X2, Y)

ggplot(data = original, aes(y = Y, x = X2, col = as_factor(X1))) +
  geom_point() +
  theme_classic()
Of course, you usually don't know the true parameters of the data generating process. Instead, you have to estimate them.
coef(lm(Y ~ X1 + X2, data = original))
## (Intercept) X1 X2
## 4.7953170 3.0049359 0.6664353
We get parameter estimates, and could compute corresponding confidence intervals based on the t-statistic. But we could also compute confidence intervals based on bootstrap samples.
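For comparison, the classical t-based intervals are a one-liner:
confint(lm(Y ~ X1 + X2, data = original), level = .95) # t-based CIs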
12.3.1 Bootstrap resample functions by hand
In the non-parametric bootstrap we simply resample the data points and then re-estimate the model parameters:

[Animation: univariate bootstrap resampling]
We can create a function to do exactly this:
nonpara.boot <- function(){
  size <- nrow(original)
  temp.data <- original[sample(1:size, size, replace = TRUE), ]

  # re-estimate parameters ([3] extracts the X2 coefficient):
  coef(lm(Y ~ X1 + X2, data = temp.data))[3]
}
In a semi-parametric bootstrap, we use the model prediction on the original predictor variables and randomly reassign the residuals:
semipara.boot <- function(){
  size <- nrow(original)
  original.model <- lm(Y ~ X1 + X2, data = original)
  residuals <- resid(original.model)
  model.pred <- predict(original.model)

  temp.data <- original
  temp.data$Y <- model.pred + residuals[sample(1:size, size, replace = TRUE)]

  # re-estimate parameters:
  coef(lm(Y ~ X1 + X2, data = temp.data))[3]
}
The parametric bootstrap relies on our assumption that the residuals are normally distributed, and resamples from the estimated error distribution.
parametric.boot <- function(){
  size <- nrow(original)
  original.model <- lm(Y ~ X1 + X2, data = original)
  # draw fresh residuals from the estimated normal error distribution
  residuals <- rnorm(size, mean = 0, sd = summary(original.model)$sigma)
  model.pred <- predict(original.model)

  temp.data <- original
  temp.data$Y <- model.pred + residuals

  # re-estimate parameters:
  coef(lm(Y ~ X1 + X2, data = temp.data))[3]
}
12.3.2 Resampling
We can now use the functions created above to draw a specified number of bootstrap samples of our parameter estimates.
nboot <- 1000

nonpara.samples <- replicate(nboot, nonpara.boot())
semipara.samples <- replicate(nboot, semipara.boot())
para.samples <- replicate(nboot, parametric.boot())

samples.dat <- data.frame(id = 1:nboot,
                          nonpara = nonpara.samples,
                          semipara = semipara.samples,
                          parametric = para.samples) %>%
  pivot_longer(
    cols = c("nonpara", "semipara", "parametric"),
    names_to = "type",
    values_to = "statistic"
  )
Now we can plot the bootstrapped statistics:
pacman::p_load(patchwork) # allows stitching plots together

histo <- ggplot(data = samples.dat,
                aes(x = statistic, fill = type)) +
  geom_histogram() +
  facet_wrap(~type) +
  theme_classic()

densi <- ggplot(data = samples.dat,
                aes(x = statistic, color = type)) +
  geom_density() +
  theme_classic() +
  theme(legend.position = "none")

histo / densi + plot_layout(guides = "collect")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
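Besides plotting, we can also compare the three bootstrap distributions numerically; the standard deviations are the bootstrap standard errors of the estimate:
samples.dat %>%
  group_by(type) %>%
  summarise(mean = mean(statistic),
            se = sd(statistic))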
12.4 Boot package (and extension)
12.4.1 Bootstrap resampling
To bootstrap with the boot() function we still need to create a function that computes the statistic; its form depends on the type of bootstrap we want to conduct.
Again, the function differs between the parametric and the non-parametric bootstrap, but both are possible using the boot() function.
Importantly, for the non-parametric bootstrap we have to write a function that accepts two inputs: the original data, and a vector of indices which facilitates the data shuffling (i.e. the choice of rows to sample from the data).
The boot() function is designed to accept multiple parameter estimates as output of the statistic function, so we can bootstrap all model parameters simultaneously.
boot.nonpara.fn <- function(data, indices){
  temp.data <- data[indices, ]
  # re-estimate parameters:
  coef(lm(Y ~ X1 + X2, data = temp.data))
}
booted.nonpara.est <- boot::boot(data = original,
                                 statistic = boot.nonpara.fn,
                                 R = nboot)
booted.nonpara.est
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot::boot(data = original, statistic = boot.nonpara.fn, R = nboot)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 4.7953170 0.003583439 0.2880788
## t2* 3.0049359 0.009609618 0.4154549
## t3* 0.6664353 0.004120620 0.2527025
We can look at the bootstrapped estimates stored in the object's $t element:
head(booted.nonpara.est$t)
## [,1] [,2] [,3]
## [1,] 4.722757 3.582802 0.6037329
## [2,] 4.842129 2.778300 0.6872599
## [3,] 4.926578 2.805035 0.9604704
## [4,] 5.044137 2.609505 0.6259701
## [5,] 4.664711 3.463633 1.4127219
## [6,] 4.798308 3.447769 0.4478153
For a parametric bootstrap we need to work a little more.
We need to set up a function that computes the parameters given a dataset.
This function is very similar to the function above (boot.nonpara.fn()), but it does not need a second argument and does not resample the rows:
boot.para.fn <- function(data){
  # re-estimate parameters
  coef(lm(Y ~ X1 + X2, data = data))
}
We do not need to shuffle the rows in the statistic function, because the data passed to it is created by another function we have to set up: a data generation function. It takes as input the original data and the (MLE) parameter estimates.
# the data generation function.
# the model consists of 4 estimates: three beta coefficients and the residual standard deviation.
boot.data.fn <- function(data, theta){
  data$Y <- theta[1] + theta[2] * data$X1 + theta[3] * data$X2 + # original model prediction
    rnorm(n = nrow(data), mean = 0, sd = theta[4]) # new residuals
  # this could also be solved using the predict function:
  # data$Y <- predict(lm(Y ~ X1 + X2, data = original)) + rnorm(n = nrow(data), mean = 0, sd = theta[4])
  return(data)
}
Where theta is:
theta <- c(coef(fit <- lm(Y ~ X1 + X2, data = original)), sigma(fit))
And the boot() function is specified like this:
booted.para.est <- boot::boot(data = original,
                              statistic = boot.para.fn,
                              R = nboot,
                              sim = "parametric",
                              ran.gen = boot.data.fn,
                              mle = theta)
booted.para.est
##
## PARAMETRIC BOOTSTRAP
##
##
## Call:
## boot::boot(data = original, statistic = boot.para.fn, R = nboot,
## sim = "parametric", ran.gen = boot.data.fn, mle = theta)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 4.7953170 0.010092838 0.3066339
## t2* 3.0049359 -0.003027481 0.4485276
## t3* 0.6664353 -0.009693141 0.2309982
12.4.2 Bootstrap confidence intervals
We will have a look at two types of bootstrap confidence intervals (CIs), but there are more ways to construct CIs from bootstrapped estimates.
We can compute CIs using the boot.ci() function of the boot package.
As input, it takes the output of the boot() function above.
The function allows us to specify the type of CI we wish to compute.
Here, we will introduce the two most common CI types.
Normal CI
This CI relies on the assumption that the bootstrapped parameter estimates are normally distributed.
We can check this by applying the generic plot() function to the output of the boot() function.
plot(booted.para.est, index = 3)
Seems appropriate.
We can then specify type = "norm" in boot.ci() to get a symmetric CI around the bias-corrected estimate:
booted.para.ci <- boot.ci(boot.out = booted.para.est,
                          conf = .95,
                          type = "norm",
                          index = 3) # indicates which parameter we are interested in
booted.para.ci
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = booted.para.est, conf = 0.95, type = "norm",
## index = 3)
##
## Intervals :
## Level Normal
## 95% ( 0.2234, 1.1289 )
## Calculations and Intervals on Original Scale
Confirm by hand:
# lower bound
booted.para.est$t0[3] - # original estimate
  (mean(booted.para.est$t[, 3]) - booted.para.est$t0[3]) - # bias
  qnorm(p = .975) * # .975 quantile (ca. 1.96)
  sd(booted.para.est$t[, 3]) # standard error
## X2
## 0.2233803
# upper bound
booted.para.est$t0[3] - # original estimate
  (mean(booted.para.est$t[, 3]) - booted.para.est$t0[3]) + # bias
  qnorm(p = .975) * # .975 quantile (ca. 1.96)
  sd(booted.para.est$t[, 3]) # standard error
## X2
## 1.128877
Percentile CI
"perc": Percentile. We look at the empirical distribution of the bootstrapped estimates and choose the 2.5th percentile as the lower limit of the CI and the 97.5th percentile as the upper limit.
booted.nonpara.ci <- boot.ci(boot.out = booted.nonpara.est,
                             conf = .95,
                             type = "perc",
                             index = 3) # indicates which parameter we are interested in
booted.nonpara.ci
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = booted.nonpara.est, conf = 0.95, type = "perc",
## index = 3)
##
## Intervals :
## Level Percentile
## 95% ( 0.1739, 1.1610 )
## Calculations and Intervals on Original Scale
Something very similar (though not identical, because boot.ci() uses a slightly different quantile rule) can be computed by hand by taking the empirical quantiles of the bootstrapped estimates:
# lower and upper bound
quantile(booted.nonpara.est$t[, 3], probs = c(.025, .975))
## 2.5% 97.5%
## 0.1857002 1.1598780
Mixed effects models can be bootstrapped using the bootMer() function supplied by the lme4 package.
Again, we need to supply a statistic function.
However, this function only takes one input, namely a fitted merMod object (a mixed effects model fitted by lmer() or glmer()).
To show how it works, we again look at the salary example:
df <- read_csv(
  url("https://raw.githubusercontent.com/methodenlehre/data/master/salary-data2.csv")) %>%
  mutate(company = as.factor(company),
         sector = as.factor(sector),
         id = as_factor(1:nrow(.)))
## Rows: 600 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): company, sector
## dbl (2): experience, salary
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
And fit the model:
random.intercept.model <- lmer(salary ~ experience + (1 | company),
                               data = df, REML = TRUE)
fixef(random.intercept.model)
## (Intercept) experience
## 5964.1757 534.3446
We can then create a function that extracts the statistic of interest from the fitted model:
lmer.stat <- function(model){
  fixef(model)
}

lmer.boot.est <- bootMer(x = random.intercept.model,
                         nsim = nboot,
                         FUN = lmer.stat)
There are two more settings we can specify in the bootMer() function:

use.u: If FALSE (the default), new random effects are drawn; in our model, this means that the function simulates new companies by drawing new random effects. If TRUE, the estimated random effects are kept fixed, and inference is conditional on the sampled companies.

type: Either "parametric" or "semiparametric". Defaults to "parametric". The parametric bootstrap draws new residuals from a normal distribution with the estimated residual variance (for an lmer() object), and does not affect the predictor values. The semiparametric bootstrap resamples the fitted residuals, just as we did for the linear models; it requires use.u = TRUE (see the sketch below).
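A semiparametric run could thus look like this (a sketch; lme4 only offers the semiparametric type together with use.u = TRUE and a REML fit, which our model is):
lmer.boot.semi <- bootMer(x = random.intercept.model,
                          nsim = nboot,
                          FUN = lmer.stat,
                          use.u = TRUE, # required for type = "semiparametric"
                          type = "semiparametric")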
A nonparametric bootstrap is not recommended for mixed effects models: simply resampling rows would produce groups of unequal sizes and ignore the hierarchical structure. How exactly a nonparametric bootstrap should be carried out for such models is still an open discussion.
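One variant that comes up in this discussion is a cluster (or case) bootstrap, where whole companies are resampled instead of single rows. A rough, hypothetical sketch (cluster.boot is our own helper, not an lme4 function; duplicated companies are relabeled so they enter the model as distinct clusters):
cluster.boot <- function(){
  ids <- sample(levels(df$company), replace = TRUE) # resample whole companies
  temp.data <- purrr::map_dfr(seq_along(ids), function(i){
    rows <- df[df$company == ids[i], ]
    rows$company <- paste0("boot_", i) # duplicates become distinct clusters
    rows
  })
  fixef(lmer(salary ~ experience + (1 | company), data = temp.data))
}
# cluster.samples <- replicate(nboot, cluster.boot()) # slow: one refit per sample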
Confidence intervals can be constructed using the boot.ci() function from the boot package:
boot::boot.ci(lmer.boot.est, type = "perc", index = 2)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot::boot.ci(boot.out = lmer.boot.est, type = "perc", index = 2)
##
## Intervals :
## Level Percentile
## 95% (485.0, 588.9 )
## Calculations and Intervals on Original Scale
This function cannot be used to bootstrap model comparisons. That is possible with the pbkrtest package.
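A hedged sketch of how such a comparison could look with pbkrtest::PBmodcomp(), testing the experience effect against an intercept-only null model (null.model is our own name for it):
pacman::p_load(pbkrtest)
null.model <- lmer(salary ~ 1 + (1 | company), data = df, REML = TRUE)
PBmodcomp(largeModel = random.intercept.model,
          smallModel = null.model,
          nsim = 500) # parametric bootstrap p-value (can take a while)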
A much easier, albeit less versatile, approach to creating bootstrap CIs for mixed effects model coefficients is the confint() function:
confint(random.intercept.model, method = "boot", boot.type = "perc")
## Computing bootstrap confidence intervals ...
## 2.5 % 97.5 %
## .sig01 505.1255 1035.691
## .sigma 1028.5683 1145.750
## (Intercept) 5527.6358 6398.269
## experience 482.5724 581.129
confint(random.intercept.model, method = "boot", boot.type = "norm")
## Computing bootstrap confidence intervals ...
## 2.5 % 97.5 %
## .sig01 539.0105 1061.2622
## .sigma 1019.4132 1154.5558
## (Intercept) 5522.1569 6418.7375
## experience 480.3638 588.0091
12.5 Limitations
Extreme values and estimates from biased estimators should not be bootstrapped.
Let's say we want to estimate the upper limit of a uniform distribution between 0 and 1, based on a sample of size n = 1000:
set.seed(1412525)
original.unif <- runif(n = 1000)

hist(original.unif)
Assume that we know the data are uniformly distributed, but we do not know the upper limit.
We would estimate this limit as the maximum of the data.
max(original.unif) %>% round(3)
## [1] 0.998
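This estimator is biased downward: for a sample of size \(n\) from a uniform distribution on \([0, \theta]\), the expected maximum is
\[E\left[X_{(n)}\right] = \frac{n}{n+1}\,\theta,\]
so even the original estimate systematically falls below the true limit \(\theta = 1\).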
If we now bootstrap this statistic, we get the following distribution:
boot.unif.max <- function(data, ind){
  max(data[ind])
}

booted.unif <- boot(data = original.unif,
                    statistic = boot.unif.max,
                    R = nboot)
hist(booted.unif$t)
abline(v = mean(booted.unif$t), col = "red")
In this case, the bootstrapped estimate is smaller than the original value whenever the maximum of the original data happens not to be resampled, and it can never be larger. So every bootstrap estimate is at least as far from the true parameter as the original (already downward-biased) estimate.
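We can check how often this happens: a specific observation is missing from a bootstrap sample with probability \((1-1/n)^n \approx e^{-1}\) (compare the unique-rows calculation above), so roughly 37% of the bootstrap estimates should fall strictly below the observed maximum:
# proportion of bootstrap estimates that missed the original maximum
mean(booted.unif$t < max(original.unif)) # expected: ca. (1 - 1/1000)^1000 = 0.368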
Something similar happens if we, for example, try to bootstrap \(R^2\), here because \(R^2\) is a biased estimator. We can see the problem for an outcome and four predictors that are uncorrelated in the population (with empirical = FALSE, the sample correlations will still deviate somewhat from 0):
original.rsq <- MASS::mvrnorm(n = 100, mu = rep(0, 5),
                              Sigma = diag(x = 1, nrow = 5),
                              empirical = FALSE)
cor(original.rsq)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000000000 0.0910027 -0.12706723 0.11518181 -0.002021796
## [2,] 0.091002698 1.0000000 -0.04332500 -0.16157754 -0.190770308
## [3,] -0.127067225 -0.0433250 1.00000000 0.01537066 0.096918133
## [4,] 0.115181806 -0.1615775 0.01537066 1.00000000 -0.035044094
## [5,] -0.002021796 -0.1907703 0.09691813 -0.03504409 1.000000000
<- data.frame("Y" = original.rsq[,1],
original.rsq "X1" = original.rsq[,2],
"X2" = original.rsq[,3],
"X3" = original.rsq[,4],
"X4" = original.rsq[,5])
<- summary(lm(Y ~ X1 + X2 + X3 + X4, data = original.rsq))$r.squared orig.rsq.est
Because the variables are nearly uncorrelated in the sample, the original \(R^2\) is close to 0. But if we bootstrap this estimate, the bootstrap distribution will not be centered around 0, because \(R^2\) can never be negative, no matter which way the coefficient estimates tilt.
boot.rsq <- function(data, ind){
  data <- data[ind, ]
  summary(lm(Y ~ X1 + X2 + X3 + X4, data = data))$r.squared
}

booted.rsq <- boot(data = original.rsq,
                   statistic = boot.rsq,
                   R = nboot)
hist(booted.rsq$t)
abline(v = mean(booted.rsq$t), col = "red")
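Comparing the mean of the bootstrapped estimates with the original estimate makes the upward bias explicit:
mean(booted.rsq$t) - orig.rsq.est # typically clearly positive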
And the percentile CI would not include 0:
boot.ci(booted.rsq, type = "perc")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = booted.rsq, type = "perc")
##
## Intervals :
## Level Percentile
## 95% ( 0.0156, 0.1822 )
## Calculations and Intervals on Original Scale