Topic 13 Machine Learning using R: Introduction to Data Splitting, Sampling & Resampling

Some references: Boehmke and Greenwell (2019), Hastie et al. (2013), and Lantz (2019).

  • Data science is a superset of machine learning, data mining, and related subjects. It covers the complete process, from data loading through to production.

  • “Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as from sensor data or databases.” (Wikipedia)

  • The primary goal of an ML implementation is to develop a general-purpose algorithm that solves a practical, focused problem.

  • Important aspects of the process include its data, time, and space requirements.

  • The goal of a learning algorithm is to produce a rule that is as accurate as possible.

References

Boehmke, Brad, and Brandon M. Greenwell. 2019. Hands-On Machine Learning with R. CRC Press. https://bradleyboehmke.github.io/HOML/.
Hastie, Trevor, Robert Tibshirani, Gareth James, and Daniela Witten. 2013. An Introduction to Statistical Learning with Applications in R. Springer New York.
Lantz, Brett. 2019. Machine Learning with R (3rd Edition). Packt Publishing. https://app.knovel.com/hotlink/toc/id:kpMLRE000A/machine-learning-with/machine-learning-with.