13.6 K-fold Cross Validation
k-fold cross-validation (aka k-fold CV) is a resampling method that randomly divides the training data into k groups (aka folds) of approximately equal size.
The model is fit on k−1 folds and then the remaining fold is used to compute model performance.
This procedure is repeated k times; each time, a different fold is treated as the validation set.
This process results in k estimates of the generalization error.
The k-fold CV estimate is computed by averaging the k test errors, providing us with an approximation of the error we might expect on unseen data.
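If the per-fold error is measured with, say, mean squared error, this average is the standard estimator

\[
\mathrm{CV}_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MSE}_i ,
\]

where \(\mathrm{MSE}_i\) is the error computed on the i-th held-out fold.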
K-fold CV in R
The rsample and caret packages provide functionality to create k-fold CV folds.
set.seed(999)
# using rsample package
cv1 = vfold_cv(d_bhp[2], v = 10) #v is the number of folds
cv1 #10 folds
# 10-fold cross-validation
# A tibble: 10 x 2
splits id
<list> <chr>
1 <split [588/66]> Fold01
2 <split [588/66]> Fold02
3 <split [588/66]> Fold03
4 <split [588/66]> Fold04
5 <split [589/65]> Fold05
6 <split [589/65]> Fold06
7 <split [589/65]> Fold07
8 <split [589/65]> Fold08
9 <split [589/65]> Fold09
10 <split [589/65]> Fold10
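The splits stored in cv1 can be turned into per-fold error estimates with rsample's analysis() and assessment() accessors. The sketch below is illustrative rather than part of the original workflow: it assumes column 2 of d_bhp is BHP.AX.Close and uses an intercept-only lm() purely as a placeholder model.

library(rsample)

# Compute the validation RMSE for each of the 10 folds
rmse_per_fold = sapply(cv1$splits, function(split) {
  train = analysis(split)    # the k-1 folds used for fitting
  test  = assessment(split)  # the held-out fold
  fit   = lm(BHP.AX.Close ~ 1, data = train)  # placeholder model (assumption)
  preds = predict(fit, newdata = test)
  sqrt(mean((test$BHP.AX.Close - preds)^2))
})

mean(rmse_per_fold) # the 10-fold CV estimate of the RMSE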
# using caret package
cv2 = createFolds(d_bhp$BHP.AX.Close, k = 10) #gives indices for 10 folds
cv2$Fold01 #indices held out in the first fold
[1] 28 33 38 42 44 52 71 85 97 119 121 122 125 126 135 160 161 168 191
[20] 194 197 201 222 227 231 239 241 246 265 284 292 298 302 310 319 331 336 344
[39] 353 362 368 384 386 387 402 403 406 430 466 471 484 500 532 533 539 554 567
[58] 570 581 585 610 612 633 641 642
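The indices in cv2 can be used in the same way: each element lists the rows held out for one fold, and the remaining rows are used for fitting. Again, the intercept-only lm() is only a stand-in model, and d_bhp is assumed to be a data frame.

# Compute the validation RMSE for each fold from the caret indices
rmse_per_fold = sapply(cv2, function(test_idx) {
  train = d_bhp[-test_idx, , drop = FALSE]  # rows in the other nine folds
  test  = d_bhp[test_idx, , drop = FALSE]   # rows in the held-out fold
  fit   = lm(BHP.AX.Close ~ 1, data = train)  # placeholder model (assumption)
  preds = predict(fit, newdata = test)
  sqrt(mean((test$BHP.AX.Close - preds)^2))
})

mean(rmse_per_fold) # the 10-fold CV estimate of the RMSE

In practice, caret users often skip this manual loop and let train() build the folds internally via trainControl(method = "cv", number = 10).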