16.1 Introduction to Text Mining
Text mining, also called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. This information is typically obtained by identifying patterns and trends through means such as statistical pattern learning.
Welbers et al. (2017) provide a gentle introduction to text analytics using R.
Text mining has gained momentum and is used in analytics worldwide, with applications including:
- Stock market prediction and other financial applications
- Social network analysis
- Customer service and help desk
16.1.1 Text Data
Text data is ubiquitous in social media analytics.
Sources include traditional media, social media, survey data, and numerous others:
- Twitter, Facebook, surveys, reported data (incident reports)
The modern information age produces a massive quantity of text.
The mounting availability of and interest in text data has been accompanied by the development of a variety of statistical approaches for analysing this data.
16.1.2 Generic Text Mining System
16.1.3 Data pre-processing in Text Mining
- The following figure summarises the main steps in a typical data pre-processing stage of text mining.
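As a rough illustration of what a pre-processing stage can involve, the sketch below assumes three commonly used steps: lowercasing, tokenisation, and stop-word removal, followed by a simple term count. The choice of steps, the tiny stop-word list, and the function names `preprocess` and `term_frequencies` are assumptions made for this example, not necessarily the steps shown in the figure. It is written in Python with only the standard library; the same steps can be carried out in R.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real analyses use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "was", "on"}


def preprocess(text):
    """Lowercase one document, split it into word tokens, and drop stop words."""
    text = text.lower()
    # Keep runs of letters only, which also discards punctuation and digits.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(documents):
    """Count how often each remaining term occurs in each document."""
    return [Counter(preprocess(doc)) for doc in documents]


# Two made-up example documents.
docs = [
    "The service desk was slow to respond.",
    "Great service, and the response was quick!",
]

for tokens in map(preprocess, docs):
    print(tokens)
```

The per-document counts returned by `term_frequencies(docs)` are a small stand-in for the document-term matrix that many statistical text-mining methods take as input. Further steps such as stemming or lemmatisation are often added, but are left out here to keep the sketch short.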