15.3 Creating the Training and Testing Sets and the Resampling Control

  • Use the rsample package with stratified sampling to split the data
  • Use repeated cross-validation for resampling
library(rsample)
library(caret)
set.seed(999)  # for reproducibility (any seed works, but keep it consistent)
idx = initial_split(data = data_cr, prop = 0.8, strata = "Creditability")  # 80/20 split, stratified on the outcome
d_train1 = training(idx)  # 80% training set
d_test1 = testing(idx)    # 20% testing set
prop.table(table(d_train1$Creditability))  # class proportions in the training set

  0   1 
0.3 0.7 
prop.table(table(d_test1$Creditability))  # class proportions in the testing set

  0   1 
0.3 0.7 
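Setting `strata = "Creditability"` makes `initial_split()` sample within each outcome class, which is why the training and testing sets both keep the 30/70 ratio. The idea can be sketched in base R as follows. This is a minimal illustration, not rsample's implementation: it assumes a simulated stand-in for `data_cr` (`sim_cr`, with 300 rows of class 0 and 700 of class 1 to match the proportions above), and `stratified_split()` is a hypothetical helper.

```r
# Hypothetical stand-in for data_cr: 300 rows of class 0 and 700 of class 1
set.seed(999)
sim_cr <- data.frame(
  Creditability = factor(rep(c(0, 1), times = c(300, 700))),
  x = rnorm(1000)
)

# Hypothetical helper: draw `prop` of the rows separately within each class
stratified_split <- function(data, strata, prop = 0.8) {
  rows_by_class <- split(seq_len(nrow(data)), data[[strata]])
  train_rows <- unlist(lapply(rows_by_class, function(rows) {
    # sample.int() avoids sample()'s surprising behaviour on length-1 input
    rows[sample.int(length(rows), size = round(prop * length(rows)))]
  }), use.names = FALSE)
  list(train = data[train_rows, , drop = FALSE],
       test  = data[-train_rows, , drop = FALSE])
}

parts <- stratified_split(sim_cr, "Creditability", prop = 0.8)

# Each class keeps its share of rows in both sets
prop.table(table(parts$train$Creditability))
prop.table(table(parts$test$Creditability))
```

Because 80% of 300 and 80% of 700 are whole numbers, both tables here come out at exactly 0.3 and 0.7. With real data the proportions are usually only approximately equal, since each class size is rounded separately.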
cntrl1 = trainControl(method = "repeatedcv", number = 10, repeats = 2)  # 10-fold cross-validation, repeated twice
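With `method = "repeatedcv"`, `number = 10` and `repeats = 2`, each candidate model is evaluated on 2 × 10 = 20 resamples. The training data is cut into 10 folds, each fold is held out once, and the whole procedure is repeated with a fresh random fold assignment. The base-R sketch below shows how such held-out folds can be generated. It assumes 800 training rows as a stand-in for `d_train1`, and `make_repeated_folds()` is a hypothetical helper. caret builds its own folds internally (see `createMultiFolds()`), so its indices will differ from these.

```r
# Hypothetical helper: row indices held out in each of k folds, repeated `repeats` times
make_repeated_folds <- function(n, k = 10, repeats = 2) {
  folds <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each row to one fold, keeping fold sizes as equal as possible
    fold_id <- sample(rep_len(seq_len(k), n))
    for (f in seq_len(k)) {
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id == f)
    }
  }
  folds
}

set.seed(999)
held_out <- make_repeated_folds(n = 800, k = 10, repeats = 2)

length(held_out)               # number of resamples (k * repeats)
sapply(held_out, length)[1:3]  # rows held out in the first few folds
```

Within a single repeat, every row is held out exactly once. Here that gives 20 held-out folds of 80 rows each. Repeating the split reduces the dependence of the performance estimate on any one random fold assignment.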