## 13.6 K-fold Cross Validation

• k-fold cross-validation (aka k-fold CV) is a resampling method that randomly divides the training data into k groups (aka folds) of approximately equal size.

• The model is fit on k−1 folds and then the remaining fold is used to compute model performance.

• This procedure is repeated k times; each time, a different fold is treated as the validation set.

• This process results in k estimates of the generalization error.

• The k-fold CV estimate is computed by averaging the k validation errors (one per held-out fold), providing us with an approximation of the error we might expect on unseen data.
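The steps above can be sketched by hand in base R. This is only an illustration under stated assumptions: it uses the built-in `mtcars` data and a simple linear model of `mpg` on `wt` as stand-ins (neither comes from this chapter), and mean squared error as the performance measure.

```r
# Manual k-fold CV in base R (illustrative sketch; mtcars and mpg ~ wt are assumed stand-ins)
set.seed(999)
k <- 10
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of approximately equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_mse <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[fold_id != i, ]  # the k - 1 folds used to fit the model
  valid <- mtcars[fold_id == i, ]  # the remaining fold, held out for validation
  fit <- lm(mpg ~ wt, data = train)
  pred <- predict(fit, newdata = valid)
  fold_mse[i] <- mean((valid$mpg - pred)^2)  # error estimate for this fold
}

fold_mse                  # k estimates of the generalization error
cv_mse <- mean(fold_mse)  # the k-fold CV estimate: the average over the folds
cv_mse
```

Each observation is used for validation exactly once and for fitting k − 1 times, which is why the averaged error is usually a more stable estimate than a single train/validation split.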

### K-fold CV in R

• The rsample and caret packages both provide functions for creating k-fold CV splits: vfold_cv() in rsample and createFolds() in caret.
library(rsample)
set.seed(999)
# using rsample package
cv1 = vfold_cv(d_bhp[2], v = 10)  # v is the number of folds
cv1  # 10 folds
#  10-fold cross-validation
# A tibble: 10 x 2
splits           id
<list>           <chr>
1 <split [588/66]> Fold01
2 <split [588/66]> Fold02
3 <split [588/66]> Fold03
4 <split [588/66]> Fold04
5 <split [589/65]> Fold05
6 <split [589/65]> Fold06
7 <split [589/65]> Fold07
8 <split [589/65]> Fold08
9 <split [589/65]> Fold09
10 <split [589/65]> Fold10
# using caret package
library(caret)
cv2 = createFolds(d_bhp$BHP.AX.Close, k = 10)  # list of 10 index vectors, one per fold
cv2$Fold01  # row indices of the observations in the first fold
 [1]  28  33  38  42  44  52  71  85  97 119 121 122 125 126 135 160 161 168 191
[20] 194 197 201 222 227 231 239 241 246 265 284 292 298 302 310 319 331 336 344
[39] 353 362 368 384 386 387 402 403 406 430 466 471 484 500 532 533 539 554 567
[58] 570 581 585 610 612 633 641 642
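
Once the folds exist, each split can be used to fit and validate a model. The following is a minimal sketch, assuming the rsample package is installed; it uses the built-in `mtcars` data and a linear model of `mpg` on `wt` as stand-ins for illustration rather than the `d_bhp` data above. `analysis()` returns the k − 1 folds used for fitting and `assessment()` returns the held-out fold.

```r
library(rsample)

set.seed(999)
folds <- vfold_cv(mtcars, v = 5)  # 5 folds; mtcars is an assumed stand-in dataset

# Fit on the analysis set of each split and score on its assessment set
fold_rmse <- vapply(folds$splits, function(s) {
  fit <- lm(mpg ~ wt, data = analysis(s))  # the k - 1 folds used for fitting
  held_out <- assessment(s)                # the remaining fold
  sqrt(mean((held_out$mpg - predict(fit, newdata = held_out))^2))
}, numeric(1))

fold_rmse        # one RMSE per fold
mean(fold_rmse)  # the k-fold CV estimate of the RMSE
```

The same loop could be written with the index vectors returned by `createFolds()`, using each vector to select the held-out rows and its complement to select the rows used for fitting.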