How Thematic Sentiment Analysis works

Thematic Sentiment Analysis uses our proprietary supervised machine learning model to add a sentiment score to comments and themes. The model has been trained specifically on customer feedback and is optimized to predict sentiment with high accuracy on this kind of data.

1. What Thematic Sentiment Analysis does

Each comment is split into sentences, and each sentence is tagged with themes and a sentiment score.

Our model derives the sentiment score from the text alone, without any other data. This is especially useful for datasets that do not have a rating or a score such as NPS, because the sentiment can then be used to analyze different themes.

For example:

When the above comment is processed, it is first split into 3 sentences, each tagged with themes and sentiment:

  1. The first sentence is tagged with themes “great flight crew” and “love new planes”, and both themes are assigned positive sentiment.
  2. The second sentence is tagged with “boarding process” and “gate attendants”, and both themes are assigned negative sentiment.
  3. The third sentence has not been tagged with a theme in this case, so the sentiment is not visible.

Sometimes, the sentiment is clear from the theme name itself, like in the first sentence, and sometimes it’s only clear from the score associated with the theme, like in the second sentence.
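The result of this step can be pictured as a small data structure: one record per sentence, holding its themes and its score. The sketch below is only an illustration, not Thematic's implementation. The class name, the naive regex splitter, the sample comment, and the hand-entered themes and scores are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaggedSentence:
    """One sentence of a comment with its themes and sentiment score."""
    text: str
    themes: List[str] = field(default_factory=list)
    sentiment: Optional[float] = None


def split_sentences(comment: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [part for part in parts if part]


def visible_sentiment(sentence: TaggedSentence) -> Optional[float]:
    """Sentiment is only shown when the sentence carries at least one theme."""
    return sentence.sentiment if sentence.themes else None


# Made-up comment with hand-entered themes and scores, for illustration only.
comment = "The crew was great. Boarding took forever. I flew on Tuesday."
texts = split_sentences(comment)
tagged = [
    TaggedSentence(texts[0], ["flight crew"], 0.8),
    TaggedSentence(texts[1], ["boarding process"], -0.7),
    TaggedSentence(texts[2], [], 0.0),  # no theme, so the sentiment stays hidden
]
```

In the real product the themes and scores come from the model; here they are typed in by hand so the structure is easy to see.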

2. How we construct training and test data for sentiment analysis

The model is trained and tested on millions of customer feedback comments. Each comment has an assigned, validated sentiment score. Scores are validated using a combination of the scores and ratings that customers provide and rules that determine when these can be used as reliable training and test data.

We follow machine learning best practices in splitting our data into training, validation and test sets. We use the validation data to evaluate experiments with our model, and we only run on the test data once we have completed a project and want a score that reflects how the model will perform on real-world data. This avoids ‘overfitting’ to the test data set.
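A three-way split of this kind can be sketched in a few lines. This is a generic illustration, not our actual pipeline: the function name, the 80/10/10 proportions, and the fixed random seed are assumptions chosen for the example.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then carve out test, validation and training sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                    # touched only at the end of a project
    validation = shuffled[n_test:n_test + n_val]  # used to compare experiments
    train = shuffled[n_test + n_val:]           # used to fit the model
    return train, validation, test


train, validation, test = split_dataset(range(100))
```

Keeping the test set untouched until the end is what makes the final score a fair estimate: no modelling decision has been tuned against it.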

3. How the Thematic Sentiment Analysis model is trained

Sentences are split into words and then fed into a deep learning model. The model has a complex architecture of convolutional and recurrent layers, which can effectively learn grammar and language rules for the purpose of predicting sentiment.

4. How accurate is Thematic Sentiment Analysis?

Our test data has 15,000 comments labelled as having either positive or negative sentiment through the validation process described in section 2.

On this data, our current model has a 92% accuracy rate.

This accuracy rate is very high considering the ambiguity often found in language, and that even human accuracy will be below 100%.
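Accuracy here means the share of test comments whose predicted label matches the validated label. On 15,000 comments, a 92% rate corresponds to roughly 13,800 correct predictions. A minimal sketch of the calculation follows; the function name and the tiny label lists are made up for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the validated labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only: three of the four predictions match.
predicted = ["positive", "negative", "positive", "negative"]
actual = ["positive", "negative", "negative", "negative"]
score = accuracy(predicted, actual)
```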

5. Where do errors come from?

Occasionally the model makes errors, and we are still working to reduce them. Here are the types of errors we’ve encountered so far:

  • Complex long-term dependencies, for example "Being pleased with this coffee is something that I would have expected"
  • Spelling mistakes and bad grammar
  • Out-of-domain surveys: data that is not verbatim customer feedback, or subject matter that the model hasn’t seen in our training data before.

6. Can we customize the sentiment analysis model for a customer?

Our default model doesn't know the overall context of the dataset or specific company knowledge. For instance, if a comment expresses negativity towards one of the company's competitors, the sentiment will be negative.

We address this issue by creating specific models for our customers that can capture such nuances.

We can also customize the interpretation of the scores. For example, if the question is soliciting negative responses, we can scale the scores so that they are all more negative. 
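One simple way to picture such an adjustment is to shift every score by a fixed offset and keep the result inside the -1 to 1 range. This is only one possible interpretation, not a description of how the customization is actually done. The function name and the offset value are assumptions.

```python
def adjust_scores(scores, offset=-0.25):
    """Shift each score by `offset`, clamping the result to [-1, 1]."""
    return [max(-1.0, min(1.0, score + offset)) for score in scores]


# Illustrative scores only; the last one hits the lower bound after shifting.
adjusted = adjust_scores([0.5, 0.0, -0.9])
```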

7. How we categorize sentiment scores

We use a range of -1 to 1 to score sentiment. The table below shows how scores correspond to sentiment highlighting in the comments:


Score Range        Sentiment
-1 to -0.6         Strongly Negative
-0.6 to -0.05      Negative
-0.05 to 0.05      Neutral
0.05 to 0.6        Positive
0.6 to 1.0         Strongly Positive
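The mapping from score to category can be sketched as a simple lookup. The table does not say which category a score exactly on a boundary (such as -0.6 or 0.05) belongs to, so the boundary handling below is an assumption, as is the function name.

```python
def categorize(score):
    """Map a sentiment score in [-1, 1] to its category label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment scores must lie between -1 and 1")
    # Assumption: boundary values fall into the milder of the two categories.
    if score < -0.6:
        return "Strongly Negative"
    if score < -0.05:
        return "Negative"
    if score <= 0.05:
        return "Neutral"
    if score <= 0.6:
        return "Positive"
    return "Strongly Positive"


labels = [categorize(s) for s in (-0.8, -0.3, 0.0, 0.3, 0.9)]
```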

To get the sentiment score for a comment, we take the average of the sentiment scores across all of its sentences. The comment sentiment score is used in the ‘Themes’ visualization and can also be used to create an ‘Impact on Sentiment’ visualization.
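As a small worked example of the averaging step, the sketch below computes a comment score from its sentence scores. The function name and the sample scores are made up, and returning no score for a comment without sentences is an assumption.

```python
from statistics import mean


def comment_sentiment(sentence_scores):
    """Average the sentence-level scores to get the comment-level score."""
    if not sentence_scores:
        return None  # assumption: a comment with no sentences has no score
    return mean(sentence_scores)


# Illustrative scores for a three-sentence comment.
score = comment_sentiment([0.5, -0.5, 0.75])
```

Here one positive and one negative sentence cancel out, and the remaining sentence pulls the comment score to 0.25.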