Topic 13: Machine Learning using R - Introduction to Data Splitting, Sampling & Resampling

Some references: Boehmke and Greenwell (2019), Hastie et al. (2013), and Lantz (2019).

  • Data science is a superset of machine learning, data mining, and related subjects. It covers the complete process, from loading the data through to deployment in production.

  • “Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as from sensor data or databases.” (Wikipedia)

  • The primary goal of an ML implementation is to develop a general-purpose algorithm that solves a practical, focused problem.

  • Important aspects of the process include its data, time, and space requirements.

  • The goal of a learning algorithm is to produce a rule that is as accurate as possible.


Boehmke, Brad, and Brandon M. Greenwell. 2019. Hands-On Machine Learning with R. CRC Press.
Hastie, Trevor, Robert Tibshirani, Gareth James, and Daniela Witten. 2013. An Introduction to Statistical Learning with Applications in R. Springer New York.
Lantz, Brett. 2019. Machine Learning with R (3rd Edition). Packt Publishing.