13.8 Bootstrapping

  • Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. (Wikipedia)

  • Sampling with replacement means that, after a data point is selected for inclusion in the bootstrap sample, it’s still available for further selection, so the same observation can appear more than once.

  • A bootstrap sample is the same size as the original data set from which it was constructed.

  • Since observations are replicated in bootstrapping, there tends to be less variability in the error measure compared with k-fold CV.
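The points above can be sketched in base R. This is a minimal illustration, not a method taken from the text: the data vector `x`, the seed, and the number of resamples `B` are all assumptions chosen for the example. It draws one bootstrap sample, checks that it has the same size as the original data, counts the observations that were never selected, and then repeats the resampling to attach a standard error and a percentile interval to the sample mean.

```r
set.seed(123)                          # arbitrary seed, only for reproducibility
x <- rnorm(654, mean = 50, sd = 10)    # hypothetical data with 654 observations
n <- length(x)

# One bootstrap sample: draw n indices from 1..n with replacement
idx <- sample(n, size = n, replace = TRUE)
length(idx)                # same size as the original data
any(duplicated(idx))       # TRUE when some observations were drawn more than once

# Observations that were never drawn (often called out-of-bag)
oob <- setdiff(seq_len(n), idx)
length(oob) / n            # typically close to exp(-1), roughly 0.37

# Repeat the resampling B times to measure the accuracy of the sample mean
B <- 1000                  # assumed number of resamples
boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
sd(boot_means)                           # bootstrap standard error of the mean
quantile(boot_means, c(0.025, 0.975))    # percentile 95% interval
```

Because each draw is independent, roughly a third of the observations are left out of any single bootstrap sample; those left-out rows are what resampling tools commonly use as the hold-out set.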

Bootstrapping in R

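  • The rsample package provides the bootstraps() function for drawing bootstrap samples from a data frame. In each split printed below, the first number is the size of the bootstrap sample and the second is the number of rows that were never drawn.
library(rsample)
boot1 <- bootstraps(d_bhp[2], times = 10)
boot1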
# Bootstrap sampling 
# A tibble: 10 x 2
   splits            id         
   <list>            <chr>      
 1 <split [654/241]> Bootstrap01
 2 <split [654/253]> Bootstrap02
 3 <split [654/240]> Bootstrap03
 4 <split [654/226]> Bootstrap04
 5 <split [654/223]> Bootstrap05
 6 <split [654/242]> Bootstrap06
 7 <split [654/241]> Bootstrap07
 8 <split [654/240]> Bootstrap08
 9 <split [654/258]> Bootstrap09
10 <split [654/249]> Bootstrap10
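To work with the resamples, rsample provides analysis() and assessment(), which return the bootstrap sample and the left-out rows of a split. The sketch below is only an illustration under stated assumptions: the data frame `d` is a hypothetical stand-in for `d_bhp[2]` (whose contents are not shown here), and the column name `x` and the seed are made up for the example.

```r
library(rsample)

set.seed(123)                                # arbitrary seed
d <- data.frame(x = rnorm(654, 50, 10))      # hypothetical stand-in for d_bhp[2]
boots <- bootstraps(d, times = 10)

# Inspect the first split
first <- boots$splits[[1]]
nrow(analysis(first))      # 654 rows: the bootstrap sample itself
nrow(assessment(first))    # rows never drawn, the second number in <split [654/...]>

# Compute a statistic on every bootstrap sample
boot_means <- sapply(boots$splits, function(s) mean(analysis(s)$x))
sd(boot_means)             # rough bootstrap standard error of the mean
```

Ten resamples keep the output short, but that is too few for a stable standard error; in practice `times` is usually set much higher.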