Data Cleaning and PII Redaction

We can now pre-process your incoming data on our platform to clean it and mask certain types of sensitive data.


Note: While the goal of this service is to identify and remove sensitive data, we may not be able to remove every instance. We recommend that you still review your data to confirm that the redaction and masking meet your needs.


Supported options

Here’s a list of the data types that we currently support for cleaning and masking:


option | mask token | description
email | [EMAIL] | Masks email addresses.
url | [URL] | Masks URLs.
uuid | [UUID] | Masks UUIDs, which are sequences of hexadecimal digits in the format XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.
encoded_data | [ENCODED_DATA] | Masks sequences of 40 or more characters that are not interrupted by a space or newline character.
address | [ADDRESS] | Masks physical addresses.
name | [NAME] | Masks person names.
number | [NUMBER] | Masks phone numbers and similar sequences of numeric characters and some punctuation.
email_history | (none) | Removes any email history by looking for a line with the pattern “On … wrote:”, then removing that line and all lines below it.
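To make the pattern-based options above more concrete, here is a minimal Python sketch of how rules like email, url, uuid, encoded_data, number and email_history could be approximated with regular expressions. The patterns, the `clean` function and the order in which rules run are all assumptions for illustration; this is not Thematic's actual implementation. The address and name options are left out because they generally need more than a regex (for example a named-entity recognizer).

```python
import re

# Hypothetical approximations of the options in the table above.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# URL starting with http(s):// or www., not swallowing trailing punctuation.
URL_RE = re.compile(r"\b(?:https?://|www\.)\S*[^\s.,;:!?)\]]")
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# 40 or more characters with no space or newline in between.
ENCODED_RE = re.compile(r"[^ \n]{40,}")
# Phone-like runs: digits mixed with spaces, dots, dashes or parentheses.
NUMBER_RE = re.compile(r"\+?\d[\d ().-]{5,}\d")
# A line such as "On Mon, 3 Jun 2024, Jane <jane@example.com> wrote:".
HISTORY_RE = re.compile(r"^On .+ wrote:$")

# Assumed order: specific patterns run before the broad encoded_data rule.
MASKS = [
    ("email", EMAIL_RE, "[EMAIL]"),
    ("url", URL_RE, "[URL]"),
    ("uuid", UUID_RE, "[UUID]"),
    ("encoded_data", ENCODED_RE, "[ENCODED_DATA]"),
    ("number", NUMBER_RE, "[NUMBER]"),
]


def strip_email_history(text: str) -> str:
    """Drop the first 'On ... wrote:' line and every line below it."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if HISTORY_RE.match(line.strip()):
            return "\n".join(lines[:i]).rstrip()
    return text


def clean(text: str, options: set[str]) -> str:
    """Apply the selected (hypothetical) cleaning and masking rules."""
    if "email_history" in options:
        text = strip_email_history(text)
    for name, pattern, token in MASKS:
        if name in options:
            text = pattern.sub(token, text)
    return text


sample = (
    "Call me on 021 555 0199 or see https://example.com/ticket.\n"
    "Ref 123e4567-e89b-12d3-a456-426614174000\n"
    "On Mon, 3 Jun 2024, Jane <jane@example.com> wrote:\n"
    "> older message"
)
print(clean(sample, {"email", "url", "uuid", "number", "email_history"}))
```

Running email_history first means quoted replies are dropped before any masking happens, so older messages never contribute tokens. Real-world rules would need more care, for example with international phone formats or URLs without a scheme.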

Example

Here’s an example of how successfully redacted data looks on the Thematic platform:


Input:

Please reply to me at firstname+lastname@example.com my name is John Smith.

Output:

Please reply to me at [EMAIL] my name is [NAME].
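The example is a plain token substitution: each detected span is replaced by its mask token and the rest of the text is left untouched. The sketch below reproduces it under stated assumptions. Emails are found with a simple regex, and because person names cannot be reliably found by a pattern, the hypothetical `known_names` argument stands in for whatever name detection the platform actually uses.

```python
import re

# Simple email pattern (an assumption, not Thematic's rule).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str, known_names: list[str]) -> str:
    """Mask emails, then mask each supplied name (hypothetical stand-in for NER)."""
    # Mask emails first so name matching never touches address fragments.
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Replace longer names first so "John Smith" wins over "John".
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    return text


example = "Please reply to me at firstname+lastname@example.com my name is John Smith."
print(redact(example, ["John Smith"]))
```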

Pricing

Pricing depends on the volume and the data type(s) you want to redact. Contact your Customer Success Manager to learn more.