13.7 CV for time series data

  • k-fold CV assigns observations to folds at random, so it does not preserve the order of the dataset.

  • Order matters in time series applications, which are common in financial data science.

  • One remedy is time series cross-validation. Hyndman and Athanasopoulos (2019, https://otexts.com/fpp3/), section 5.9, give a detailed introduction to the technique.

  • Basic idea

    • There is a series of test sets. The training set for each one consists only of observations that occurred before the observations forming that test set.
    • No future observations can be used in constructing the forecast.
    • Since it is not possible to obtain a reliable forecast based on a small training set, the earliest observations are not considered as test sets.
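
The three points above can be sketched in base R as follows. This is only an illustration of the index logic, not the method of any particular package; the values of `n_obs`, `min_train` and `h` are made up for the example.

```r
# Rolling-origin (time series) cross-validation indices in base R.
# n_obs, min_train and h are illustrative values, not taken from the text.
n_obs     <- 12   # length of the series
min_train <- 8    # smallest training set we are willing to forecast from
h         <- 1    # forecast horizon, i.e. the size of each test set

# Last training index of each split. Starting at min_train means the
# earliest observations are never used as test sets.
origins <- seq(min_train, n_obs - h)

splits <- lapply(origins, function(origin) {
  list(train = seq_len(origin),              # everything up to the origin
       test  = seq(origin + 1, origin + h))  # the next h observations
})

# Every training index precedes every test index, so no future
# observation is used when constructing a forecast.
no_leak <- vapply(splits, function(s) max(s$train) < min(s$test), logical(1))

length(splits)   # number of train/test splits
all(no_leak)     # TRUE when no split leaks future data
```

Here the training set grows by one observation at each step. A fixed-length (moving) window would instead drop the oldest observation each time a new one is added.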

CV for time series in R

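  • There are several ways to create time series samples in R. The caret package provides the createTimeSlices function for this purpose.
  • The following creates time slices with a moving training window of 500 observations (initialWindow), each followed by a test period of the next 100 observations (horizon). For daily data, one observation is one trading day. Setting fixedWindow = TRUE keeps the training window at a constant length as it moves forward.
  • The function returns a list with two elements, train and test. Each element is itself a list holding, for every slice, the row indices of the training sample or the testing sample.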
library(xts)    # xts()
library(caret)  # createTimeSlices()
d_bhp2 = xts(d_bhp$BHP.AX.Close, order.by = d_bhp$Date)
cv_ts = createTimeSlices(d_bhp2, initialWindow = 500, horizon = 100, fixedWindow = TRUE)
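
To see what the returned object looks like, the sketch below applies createTimeSlices to a simulated series. The random walk `y` is a stand-in for the price data, which is not reproduced here. The naive last-value forecast is chosen only to keep the example short, and `skip = 99` is an illustrative setting rather than one used in the text.

```r
library(caret)

# Simulated stand-in for a daily closing-price series (an assumption).
set.seed(1)
y <- cumsum(rnorm(1000))

slices <- createTimeSlices(y, initialWindow = 500, horizon = 100,
                           fixedWindow = TRUE)

names(slices)              # "train" "test"
length(slices$train)       # one entry per slice
range(slices$train[[1]])   # first training window covers rows 1 to 500
range(slices$test[[1]])    # first test window covers rows 501 to 600

# With the default skip = 0 the window advances one observation at a time,
# so consecutive test sets overlap heavily. skip = 99 advances it by 100.
slices_skip <- createTimeSlices(y, initialWindow = 500, horizon = 100,
                                fixedWindow = TRUE, skip = 99)
length(slices_skip$train)

# RMSE of a naive forecast (repeat the last training value) on each slice.
rmse <- mapply(function(tr, te) {
  forecast <- y[max(tr)]
  sqrt(mean((y[te] - forecast)^2))
}, slices_skip$train, slices_skip$test)

mean(rmse)   # cross-validated error averaged over the slices
```

Because the slices are plain index vectors, the same loop works with any model: fit on `y[tr]`, forecast `length(te)` steps ahead, and compare with `y[te]`.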

References

Hyndman, R. J., and G. Athanasopoulos. 2019. Forecasting: Principles and Practice. OTexts. https://otexts.com/fpp3/.