13.7 CV for time series data
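k-fold CV assigns observations to folds at random, so it does not preserve the order of the dataset.
The order is important in time series applications, which are common in financial data science.
Time series CV therefore uses a sequence of test sets, each paired with its own training set: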
- The corresponding training set consists only of observations that occurred prior to the observation that forms the test set.
- No future observations can be used in constructing the forecast.
- Since it is not possible to obtain a reliable forecast based on a small training set, the earliest observations are not considered as test sets.
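The three rules above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the series `y` is simulated, the minimum training size `min_train` is an arbitrary choice, and the "model" is a naive forecast that repeats the last observed value.

```r
# Rolling-origin evaluation sketch (simulated data, illustrative only)
set.seed(1)
y <- cumsum(rnorm(60))     # hypothetical series: a random walk with 60 observations
min_train <- 20            # assumed minimum training size; earlier points are never test sets

# Each origin i uses observations 1..i for training and observation i + 1 as the test set
origins <- min_train:(length(y) - 1)

errors <- sapply(origins, function(i) {
  train <- y[1:i]          # only observations that occurred before the test point
  fc <- train[i]           # naive forecast: last value in the training set
  y[i + 1] - fc            # one-step-ahead forecast error
})

rmse <- sqrt(mean(errors^2))   # accuracy averaged over all test sets
```

Here the first 20 observations are never used as test sets, and each forecast sees only the past. Replacing the naive forecast with a fitted model changes only the line that computes `fc`.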
CV for time series in R
- There are several ways to create time series samples in R. The caret package provides the createTimeSlices function for this.
- The following creates time slices with a moving training window of 500 days (initialWindow) and a test period of 100 days (horizon). Setting fixedWindow = TRUE keeps the training window at 500 days instead of letting it grow.
- The function returns a list with two elements, train and test, which hold the row indices of the training and testing samples for each slice.
library(xts)    # xts time series class
library(caret)  # createTimeSlices
d_bhp2 = xts(d_bhp$BHP.AX.Close, order.by = d_bhp$Date)
cv_ts = createTimeSlices(d_bhp2, initialWindow = 500, horizon = 100, fixedWindow = TRUE)
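To show what the returned slices look like and how they might be used, the following sketch applies createTimeSlices to a simulated series. The vector `y`, its length of 700, and the mean forecast are assumptions made for illustration; they stand in for the BHP closing prices, which are not reproduced here. The caret package is assumed to be installed.

```r
library(caret)

set.seed(1)
y <- cumsum(rnorm(700))   # hypothetical stand-in for a daily price series

slices <- createTimeSlices(y, initialWindow = 500, horizon = 100,
                           fixedWindow = TRUE)

n_slices    <- length(slices$train)   # number of train/test pairs
first_train <- slices$train[[1]]      # row indices 1 to 500
first_test  <- slices$test[[1]]       # row indices 501 to 600

# Score a simple forecast (the training mean) on every slice
rmse_by_slice <- mapply(function(tr, te) {
  fc <- mean(y[tr])                   # fit on the training indices only
  sqrt(mean((y[te] - fc)^2))          # error on the 100 test observations
}, slices$train, slices$test)
```

With 700 observations this produces 101 slices, because the window advances one observation at a time and consecutive test periods overlap heavily. The `skip` argument of createTimeSlices can be used to space the slices further apart.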
Hyndman, R. J., and G. Athanasopoulos. 2019. Forecasting: Principles and Practice. OTexts. https://otexts.com/fpp3/.