16.7 Topic Modelling

This section uses the biterm topic model to demonstrate a topic modelling exercise.

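Biterm Topic Modelling (BTM) (Yan et al. 2013) is well suited to short texts such as the Twitter data used in this example. Instead of modelling word occurrences within each document, it models unordered word pairs (biterms) that co-occur across the whole corpus, which reduces the sparsity problem of very short documents.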
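To make the idea of a biterm concrete, the following minimal sketch lists every unordered word pair in one short document using base R only. The tokens and the object names (`toy_tokens`, `biterms_toy`) are made up for illustration and are not part of the #auspol data.

```r
# Hypothetical tokens from a single short document (illustration only)
toy_tokens <- c("senate", "vote", "climate", "bill")

# combn() enumerates all unordered pairs; each pair is one biterm
pairs <- t(combn(toy_tokens, 2))
biterms_toy <- data.frame(term1 = pairs[, 1], term2 = pairs[, 2],
                          stringsAsFactors = FALSE)

biterms_toy
nrow(biterms_toy)  # 6, i.e. choose(4, 2)
```

In practice the pairs are usually restricted to words that occur close to each other, as the `skipgram` argument does in the code for this section.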

# load packages and rearrange data
library(udpipe)
library(data.table)
library(stopwords)
library(BTM)
library(textplot)
library(ggraph)
# keep the document id and text columns; udpipe() expects a data frame with
# columns named doc_id and text
data_tm = data_sent[, c(3, 2)]
colnames(data_tm)[1] = "doc_id"
# annotate the text, then keep only selected parts of speech (nouns,
# adjectives, verbs) for topic modelling. This step is computationally
# intensive and can take several minutes.
anno <- udpipe(data_tm, "english", trace = 1000)
biterms <- as.data.table(anno)
biterms <- biterms[, cooccurrence(x = lemma, relevant = upos %in% c("NOUN", "ADJ",
    "VERB") & nchar(lemma) > 2 & !lemma %in% stopwords("en"), skipgram = 3), by = list(doc_id)]
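The `skipgram` argument controls how far apart two tokens may be and still count as a co-occurring pair. The sketch below, which assumes the `udpipe` package is installed and uses a made-up token vector, compares `skipgram = 0` (adjacent tokens only) with `skipgram = 1` (one token may sit in between).

```r
library(udpipe)

# Hypothetical token sequence from one short document (illustration only)
toy_tokens <- c("senate", "vote", "climate", "bill")

# skipgram = 0: only directly adjacent tokens are paired
adjacent <- cooccurrence(toy_tokens, skipgram = 0)

# skipgram = 1: tokens separated by at most one other token are paired too
wider <- cooccurrence(toy_tokens, skipgram = 1)

adjacent
wider
```

A larger `skipgram`, such as the value of 3 used in this section, yields more biterms per document at the cost of pairing words that are less closely related.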



set.seed(999)
traindata <- subset(anno, upos %in% c("NOUN", "ADJ", "VERB") & !lemma %in% stopwords("en") &
    nchar(lemma) > 2)
traindata <- traindata[, c("doc_id", "lemma")]
# fit a 10-topic model with 2000 Gibbs sampling iterations (other parameters
# are left at or near their defaults)
model <- BTM(traindata, biterms = biterms, k = 10, iter = 2000, background = FALSE,
    trace = 2000)


# extract biterms for plotting

biterms1 = terms(model, type = "biterms")$biterms

# The model, biterms, biterms1 were saved to create the plot in this markdown
# document.
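Before plotting, it can help to inspect a fitted model directly. The following sketch fits a very small model on a made-up corpus (`toy`, `toy_model` and `doc_topics` are hypothetical names, and the data are invented) and shows three common inspections: the topic proportions in `theta`, the top tokens per topic from `terms()`, and per-document topic probabilities from `predict()`. The same calls should apply to a model fitted as in this section, assuming the `BTM` package is installed.

```r
library(BTM)

# Hypothetical toy corpus: one row per token, same layout as traindata
toy <- data.frame(
  doc_id = rep(c("d1", "d2", "d3", "d4"), each = 3),
  lemma  = c("budget", "tax", "economy",
             "tax", "economy", "job",
             "climate", "energy", "emission",
             "energy", "emission", "climate"),
  stringsAsFactors = FALSE
)

set.seed(999)
toy_model <- BTM(toy, k = 2, iter = 100, trace = FALSE)

# overall topic proportions (one value per topic)
round(toy_model$theta, 3)

# top 3 tokens per topic with their probabilities
terms(toy_model, top_n = 3)

# per-document topic probabilities, one row per doc_id
doc_topics <- predict(toy_model, newdata = toy)
doc_topics
```

With so few documents the topics themselves are not meaningful; the point is only the shape of the returned objects.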
  • Plot the topics, showing the top 20 terms of each topic and labelling each topic with its proportion in the corpus.
plot(model, subtitle = "#auspol 14-20 Sep 2020", biterms = biterms1, labels = paste(round(model$theta *
    100, 2), "%", sep = ""), top_n = 20)

Figure 16.8: BTM Visualisation of #auspol

  • Further analyses that can be conducted on these data include clustering analysis, co-word clustering and network analysis. Other topic modelling methods can also be applied.

References

Yan, Xiaohui, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2013. “A Biterm Topic Model for Short Texts.” In Proceedings of the 22nd International Conference on World Wide Web, 1445–56.