Thematic API - Retrieving Data

Note: This article assumes that you have an ACCESS_TOKEN ready for use. Wherever the word ACCESS_TOKEN appears in the examples below, replace it with your token.

Note 2: Please see the note on regions at the bottom.

These instructions include examples using curl in bash notation. If you are using a different terminal, some modification may be necessary to make the commands work. The commands also assume you are using the US data center. If this isn't the case, the examples will need to be modified with the correct base url.

Organization and Dataset (survey) identifiers

The page on uploading data contains details on how to retrieve the organization and dataset (survey) identifiers needed to make the calls used in this article. Please refer to that page to obtain the relevant identifiers.

Methods of retrieving data

There are 2 primary methods of retrieving data, each with pros and cons:

  1. Retrieve all data in a raw tabular format. No filtering is possible.
  2. Retrieve data from specific fields in json format. Filtering is possible.

This article will address both methods.

Retrieving data in tabular format

There are endpoints made available for retrieving all data in tabular format. These return csv files that map to our internal representations of your data. They are computationally expensive and download a lot of data, so they should be used with caution.

curl --request GET \
--url "https://client.getthematic.com/api/survey/SURVEY_IDENTIFIER/data_csv?format=FORMAT&translateThemes=TRANSLATE_THEMES" \
--header 'Authorization: bearer ACCESS_TOKEN'

Fields you will need to replace:

  • SURVEY_IDENTIFIER: The identifier for your survey
  • FORMAT: The format of the data you wish to retrieve (see below for options)
  • TRANSLATE_THEMES: Whether to return themes as unique codes, or human readable titles that may change over time
  • ACCESS_TOKEN: Your access token
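For scripted use, the same request can be sketched in Python with only the standard library. The function names `build_csv_url` and `download_csv` are illustrative and not part of any Thematic client library. The value for translateThemes is passed through as a plain string because this article does not list the accepted values, and leaving out both optional parameters assumes the defaults described in this article apply.

```python
import urllib.parse
import urllib.request

# US data center, as assumed throughout this article.
BASE_URL = "https://client.getthematic.com"


def build_csv_url(survey_id, data_format=None, translate_themes=None, base_url=BASE_URL):
    """Build the data_csv URL; optional parameters are omitted when None."""
    params = {}
    if data_format is not None:
        params["format"] = data_format
    if translate_themes is not None:
        params["translateThemes"] = translate_themes
    survey = urllib.parse.quote(str(survey_id), safe="")
    url = f"{base_url}/api/survey/{survey}/data_csv"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def download_csv(url, access_token, out_path):
    """Stream the csv to out_path so large exports are not held in memory."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"bearer {access_token}"}
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out_file:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out_file.write(chunk)
```

A call such as `download_csv(build_csv_url("SURVEY_IDENTIFIER", "byResponse"), "ACCESS_TOKEN", "data.csv")` would then write the export to a local file.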

FORMAT parameter

Thematic currently supports 4 different formats for retrieved data. The formats documented in this article are described below.

byResponse

This is the default format and will return data close to the format provided (some translations and modifications may have been done to make it fit an expected format). Themes and other extracted information will be returned as columns appended to the existing data.

Example of data returned byResponse

The format of the file will contain the columns:

  • 1 to n: The columns of the original data, after initial cleaning
  • n+1: The themes extracted for this response. See the section 'Understanding the themes data returned'
  • n+2: The specificity extracted for this response
  • n+3: The sentiment extracted for this response

denormalizedResponses 

In this format we use the unique identifier of the row, along with the question column, to create one row per coded response. This can be very useful when pulling data back into a data warehouse and when the file format may change over time.

Example of data returned by denormalizedResponses

The format of the file will contain the columns:

  1. The identifier for the response. This depends on which column is configured for identification in Thematic
  2. The identifier for the question. This will be 'c' followed by the column number
  3. The response
  4. The themes extracted for this response. See the section 'Understanding the themes data returned'
  5. The specificity extracted for this response
  6. The sentiment extracted for this response
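As a rough sketch, a file in this format could be read by column position as follows. The names `CodedResponse` and `read_denormalized` are hypothetical, and whether the file starts with a header row is an assumption controlled by the `has_header` flag; check a real export before relying on it.

```python
import csv
import io
from typing import NamedTuple


class CodedResponse(NamedTuple):
    response_id: str
    question: str      # 'c' followed by the column number
    response: str
    themes: str        # still json encoded at this point
    specificity: str
    sentiment: str


def read_denormalized(csv_text, has_header=True):
    """Parse denormalizedResponses csv text into records by column position."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        # Assumption: the first row holds column names and is skipped.
        next(reader, None)
    # Rows with fewer than six columns are ignored rather than guessed at.
    return [CodedResponse(*row[:6]) for row in reader if len(row) >= 6]
```

Reading by position rather than by header name keeps the sketch independent of the exact column titles, which this article does not specify.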

noThemes

In this format the data returned will be close to the format provided (some translations and modifications may have been done to make it fit an expected format). No extra data from Thematic will be included.


TRANSLATE_THEMES parameter

For all internal processing, Thematic identifies themes by unique codes that do not change over time. These codes are human-understandable, but are not appropriate for display in visualizations. For display, each theme has a human-readable title that can be edited in the themes editor, so titles can change over time.

The TRANSLATE_THEMES parameter controls whether themes are returned as unique codes or as human-readable titles. By default, themes are returned as unique codes.

Translating unique codes to human readable titles

It is possible to do your own just-in-time translation of codes to titles by retrieving the themes associated with the dataset (survey) and using the titles found in that file.

To retrieve the themes that include the titles:

curl --url "https://client.getthematic.com/api/survey/SURVEY_IDENTIFIER/data_themes" \
--header 'Authorization: bearer ACCESS_TOKEN'

This downloads the 'themes' file, a json file containing the human-editable portion of the model used to apply themes.

The part of the themes file relevant to this article is one entry in the root-level object:

  • titles: a string-string mapping of theme codes to theme titles.
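A minimal sketch of that just-in-time translation, assuming the themes file has already been downloaded and read into a string. The helper names `load_titles` and `to_title` are illustrative. Falling back to the code when no title is found is a design choice of this sketch, not documented behaviour.

```python
import json


def load_titles(themes_json_text):
    """Return the code -> title mapping from the downloaded themes file."""
    return json.loads(themes_json_text).get("titles", {})


def to_title(code, titles):
    """Look up a theme code, falling back to the code itself when no title exists."""
    return titles.get(code, code)
```

Because titles can be edited over time, it is safer to store the codes and apply `to_title` only at display time.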

Understanding the 'themes' data returned

There are 3 columns appended for every comment column processed:

  1. Themes: a json encoded list of data on all themes extracted from the text. The format for each list entry is as follows:
    1. base: the code for the basetheme
    2. sub: the code for the subtheme
    3. data:
      1. loc: an array of character locations from the comment for where the theme can be found
      2. prob: the probability the theme is represented. 1.0 means it matches a known phrase
      3. sent: the sentiment of the theme within the comment between -1 and 1
      4. spec: the measure of specificity of this theme within the comment (how much specific insight can be gleaned from the text)
  2. Specificity: a number between 0 and 1 measuring how much specific insight can be gleaned from the text
  3. Whole Comment Sentiment: a number between -1 and 1 measuring negative/positive sentiment
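The structure above could be decoded along these lines. This is a sketch under assumptions: the sample cell is made up to follow the field names listed above, and the exact shape of `loc` (shown here as a flat list of numbers) is a guess, so the code passes it through untouched.

```python
import json


def parse_themes(themes_cell):
    """Decode one json encoded themes cell into a list of flat dictionaries."""
    parsed = []
    # An empty cell is treated as "no themes".
    for entry in json.loads(themes_cell or "[]"):
        data = entry.get("data", {})
        parsed.append({
            "base": entry.get("base"),
            "sub": entry.get("sub"),
            "loc": data.get("loc", []),
            "prob": data.get("prob"),
            "sent": data.get("sent"),
            "spec": data.get("spec"),
        })
    return parsed


# Made-up sample cell that follows the structure described above.
sample = (
    '[{"base": "support", "sub": "wait_time", '
    '"data": {"loc": [0, 12], "prob": 1.0, "sent": -0.4, "spec": 0.7}}]'
)
for theme in parse_themes(sample):
    print(theme["base"], theme["sub"], theme["sent"])
```

Flattening `data` into the same dictionary as `base` and `sub` makes each theme easy to load into one table row.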

Retrieving the data in json format

There are two endpoints that can return the data in json format, optionally limited by a filter string.

The relevant endpoints are the following:

https://client.getthematic.com/api/survey/SURVEY_IDENTIFIER/visualization/VISUALIZATION_IDENTIFIER/comments
https://client.getthematic.com/api/survey/SURVEY_IDENTIFIER/visualization/VISUALIZATION_IDENTIFIER/results

These endpoints both support query parameters to limit the data returned:

  • filter: a filter string to limit the results returned. The available parameters depend on the dataset, but a common one is to filter by date. We use the FIQL format for filters, so asking for responses since 1st of May 2020 would look like:
    • date=ge=2020-05-01
  • columns: the processed comment columns for which results should be included
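The following sketch builds a request URL for either endpoint with the query parameters url-encoded. The function name `build_visualization_url` is illustrative. The `columns` value is passed through as a plain string because its exact syntax is not specified in this article, and the field name `date` in the example depends on your dataset.

```python
import urllib.parse

# US data center, as assumed throughout this article.
BASE_URL = "https://client.getthematic.com"


def build_visualization_url(survey_id, visualization_id, endpoint,
                            filter_string=None, columns=None, base_url=BASE_URL):
    """Build a comments or results URL with optional filter and columns parameters."""
    if endpoint not in ("comments", "results"):
        raise ValueError("endpoint must be 'comments' or 'results'")
    survey = urllib.parse.quote(str(survey_id), safe="")
    visualization = urllib.parse.quote(str(visualization_id), safe="")
    url = f"{base_url}/api/survey/{survey}/visualization/{visualization}/{endpoint}"
    params = {}
    if filter_string is not None:
        params["filter"] = filter_string  # FIQL expression, e.g. date=ge=2020-05-01
    if columns is not None:
        params["columns"] = columns       # assumption: passed exactly as supplied
    if params:
        # urlencode escapes the '=' characters inside the FIQL expression.
        url += "?" + urllib.parse.urlencode(params)
    return url


print(build_visualization_url("SURVEY_IDENTIFIER", "VISUALIZATION_IDENTIFIER",
                              "comments", filter_string="date=ge=2020-05-01"))
```

The resulting URL can be requested with the same Authorization header used in the curl examples.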

A note on regions

Thematic supports different geographical regions to ensure data sovereignty. When making calls to the Thematic API (or viewing documentation) it is important to look at the correct region and use the correct url. The regions Thematic currently supports are:

It is possible to see which region you are in by looking at the URL while logged into Thematic's Client Portal.