16.1 Introduction to Text Mining

Text mining, also called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. This information is typically obtained by identifying patterns and trends through means such as statistical pattern learning.

  • Welbers et al. (2017) provide a gentle introduction to text analytics using R.

  • Text mining has gained momentum and is used in analytics worldwide, in applications such as:

    • Sentiment Analysis

    • Predicting Stock Market and other Financial Applications

    • Customer influence

    • News Analytics

    • Social Network Analysis

    • Customer Service and Help Desk

16.1.1 Text Data

  • Text data is ubiquitous in social media analytics.

  • It comes from traditional media, social media, survey data, and numerous other sources.

    • Examples: Twitter, Facebook, surveys, reported data (incident reports)

  • A massive quantity of text is produced in the modern information age.

  • The mounting availability of, and interest in, text data has led to the development of a variety of statistical approaches for analysing this data.
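Many statistical approaches to text begin by turning raw text into numbers. As a minimal sketch of that idea (not a method prescribed by this chapter), the base R code below builds a small document-term matrix of word counts. The three documents, the names `docs`, `tokens`, `vocab` and `dtm`, and the whitespace-only tokenisation are all assumptions made for illustration.

```r
# Three short, made-up documents used purely for illustration
docs <- c(
  d1 = "the service was good",
  d2 = "the service was slow",
  d3 = "good food and good service"
)

# Split each document on whitespace to obtain its tokens
tokens <- strsplit(docs, "[[:space:]]+")
names(tokens) <- names(docs)  # keep the document ids as list names

# The vocabulary is the sorted set of distinct tokens across all documents
vocab <- sort(unique(unlist(tokens)))

# For each document, count how often each vocabulary term occurs
counts <- vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
)

# Transpose so that rows are documents and columns are terms
dtm <- t(counts)
colnames(dtm) <- vocab

print(dtm)
```

Each row of `dtm` describes one document and each column one term, so ordinary statistical tools that expect a numeric matrix can be applied to it. Dedicated R packages offer more complete implementations; base R is used here only to keep the sketch self-contained.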

16.1.2 Generic Text Mining System

knitr::include_graphics("fig-2.png")

Figure 16.1: Generic Text Mining System

16.1.3 Data pre-processing in Text Mining

  • The following figure summarises the main steps in a typical data pre-processing stage of text mining.
knitr::include_graphics("fig-3.png")

Figure 16.2: Typical Text Pre-processing
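The exact steps shown in the figure are not listed in the text, so the sketch below assumes a common sequence: convert to lower case, remove punctuation and digits, tokenise, and drop stop words. The function `preprocess()` and its very short stop-word list are hypothetical and serve only as an illustration in base R.

```r
# A minimal pre-processing sketch under the assumptions stated above.
# `stopwords` is a tiny illustrative list, not a standard stop-word lexicon.
preprocess <- function(text,
                       stopwords = c("the", "is", "a", "of", "and", "in")) {
  # 1. Case folding: treat "Great" and "great" as the same term
  text <- tolower(text)

  # 2. Replace every character that is neither a letter nor whitespace
  #    (punctuation, digits) with a space
  text <- gsub("[^[:alpha:][:space:]]", " ", text)

  # 3. Tokenise on runs of whitespace
  tokens <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]  # guard against empty tokens

  # 4. Remove stop words
  tokens[!tokens %in% stopwords]
}

result <- preprocess("The quality of the service is GREAT, and the price is low!")
print(result)
# "quality" "service" "great" "price" "low"
```

Steps such as stemming or lemmatisation are often added after stop-word removal; they are left out here because they normally rely on an additional package.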

References

Feldman, Ronen, and James Sanger. 2007. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.