Data Cleaning and PII Redaction
We can now pre-process your incoming data to clean and mask certain types of sensitive data on our platform.
Note: While the goal of this service is to identify and remove sensitive data, we may not be able to remove all instances. We recommend that you still review your data to confirm that the redaction/masking meets your needs.
Supported options
Here’s a list of the data types that we currently support for cleaning and masking:
option | mask token | description
email | [EMAIL] | masks email addresses
url | [URL] | masks URLs
uuid | [UUID] | masks UUIDs, which are sequences of hexadecimal digits in the format XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
encoded_data | [ENCODED_DATA] | masks sequences of 40 or more characters uninterrupted by a space or newline character
address | [ADDRESS] | masks physical addresses
name | [NAME] | masks person names
number | [NUMBER] | masks phone numbers and similar sequences of numeric characters and some punctuation
email_history | (none) | removes any email history by looking for a line with the pattern “On … wrote:”, then removing this line and all lines below it
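To make the more precisely specified rules concrete, here is a minimal Python sketch of how the uuid, encoded_data, and email_history options could behave. It is an illustration only and not the platform's actual implementation: the regular expressions, the function names strip_email_history and mask, and the order in which the rules are applied are all assumptions based on the descriptions above.

```python
import re

# Illustrative patterns only (assumed, not the platform's real rules).
# UUID: hexadecimal digits grouped 8-4-4-4-12.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# Encoded data: 40 or more characters with no space or newline in between.
ENCODED_RE = re.compile(r"[^ \n]{40,}")
# Email history: a whole line of the form "On ... wrote:".
HISTORY_RE = re.compile(r"^On .* wrote:\s*$")


def strip_email_history(text: str) -> str:
    """Drop the first "On ... wrote:" line and every line below it."""
    kept = []
    for line in text.split("\n"):
        if HISTORY_RE.match(line):
            break
        kept.append(line)
    return "\n".join(kept)


def mask(text: str) -> str:
    """Apply email_history, then uuid, then encoded_data (assumed order)."""
    text = strip_email_history(text)
    text = UUID_RE.sub("[UUID]", text)
    text = ENCODED_RE.sub("[ENCODED_DATA]", text)
    return text


sample = (
    "Ticket 123e4567-e89b-12d3-a456-426614174000 token " + "A" * 48 + "\n"
    "Thanks\n"
    "On Mon, Jan 1, Jane wrote:\n"
    "> old message"
)
print(mask(sample))
```

In this sketch the UUID rule runs before the encoded_data rule so that a UUID embedded in a longer unbroken string is still labelled [UUID]. Options such as name and address generally need more than pattern matching, so they are left out here.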
Example
Here’s an example of how successfully redacted data looks on the Thematic platform:
Input:
Please reply to me at firstname+lastname@example.com my name is John Smith.
Output:
Please reply to me at [EMAIL] my name is [NAME].
Pricing
Pricing depends on the volume and the data types you want to redact. Reach out to your Customer Success Manager to learn more.