Entity matchers overview

Entity matchers find specific kinds of data in text, such as dates, numbers, and email addresses. This enables you to:

Detect if a text contains certain data
Extract structured data from unstructured text
Mask out swear words, phone numbers, and other sensitive data
Pseudonymize data for GDPR compliance

Masking

You can use entity matchers to mask out certain words or phrases, such as swear words, phone numbers, or other sensitive data. You enable masking and set the replacement value when you add a matcher to your project. When masking is enabled, the content field of the text moderation response contains the modified text, and the original field contains the original text. The content_moderated field indicates whether the content differs from the original text.

Response example with masked content

{
  // ...
  "content_moderated": true,
  "data_found": true,
  "flagged": true,
  "original": "You can contact me on mr_robot[at]gmail|DOT|com or call me on 12 34 65 78",
  "content": "You can contact me on {{ email hidden }} or call me on {{ number hidden }}",
  "email": {
    "found": true,
    "mode": "SUSPICIOUS",
    "matches": ["mr_robot[at]gmail|DOT|com"]
  },
  "phone": {
    "found": true,
    "mode": "NORMAL",
    "matches": ["12 34 65 78"]
  }
}
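As a minimal sketch of how a client might consume these fields, the Python snippet below picks which text to display based on content_moderated. The response dictionary is copied from the example above (trimmed to the relevant fields). The helper name text_to_publish is hypothetical and not part of any API.

```python
# Sample payload copied from the response example above (trimmed).
response = {
    "content_moderated": True,
    "data_found": True,
    "flagged": True,
    "original": "You can contact me on mr_robot[at]gmail|DOT|com or call me on 12 34 65 78",
    "content": "You can contact me on {{ email hidden }} or call me on {{ number hidden }}",
}


def text_to_publish(resp: dict) -> str:
    """Return the masked text when masking changed something, else the original.

    Hypothetical helper: it relies only on the three fields described above.
    """
    if resp.get("content_moderated"):
        return resp["content"]
    return resp["original"]


print(text_to_publish(response))
```

Keeping the choice in one helper means the unmasked original is only read in one place, which makes it easier to avoid displaying it by accident.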

Detection levels

For most matchers you can set the detection level. This determines how strict the matcher should be.

Level	Description
`NORMAL`	Detect values that are spelled and formatted correctly.
`SUSPICIOUS`	Also detect values that are misspelled or obfuscated.
`PARANOID`	Also detect values when the matcher is somewhat unsure that they are real matches.
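Because each level extends the one before it, the levels can be treated as an ordered scale. The sketch below models that ordering in Python and uses it to decide whether a match deserves manual review. It assumes that the mode field of a matcher result reflects the level at which the value was matched. The DetectionLevel enum, the needs_review helper, and the review policy itself are illustrative assumptions, not part of the API.

```python
from enum import IntEnum


class DetectionLevel(IntEnum):
    """Detection levels ordered from strictest formatting to most permissive."""
    NORMAL = 1
    SUSPICIOUS = 2
    PARANOID = 3


def needs_review(matcher_result: dict,
                 auto_accept_up_to: DetectionLevel = DetectionLevel.NORMAL) -> bool:
    """Return True when a match came from a level above the auto-accept threshold.

    Hypothetical policy: matches at or below the threshold are trusted,
    matches from more permissive levels are sent to a human.
    """
    if not matcher_result["found"]:
        return False
    level = DetectionLevel[matcher_result["mode"]]
    return level > auto_accept_up_to
```

With the default threshold, a result whose mode is NORMAL is accepted automatically, while SUSPICIOUS and PARANOID results are queued for review.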

Response signature

All matcher models have the same response signature:

Field	Type	Required	Description
`mode`	`NORMAL` | `SUSPICIOUS` | `PARANOID`	Yes	The detection level used for the match.
`found`	boolean	Yes	Whether the matcher found a match.
`matches`	string[]	Yes	An array of all the matches found.
`components`	object[]	No	An array of objects with the components of each match. For example, for a name matcher, the components would be the first and last name.
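The signature above can be mirrored on the client side as a small typed structure. The following Python sketch is one way to do that under stated assumptions: the MatcherResult class and its from_dict method are hypothetical names, and the shape of each object in components is left open because it is not specified here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

VALID_MODES = {"NORMAL", "SUSPICIOUS", "PARANOID"}


@dataclass
class MatcherResult:
    """Client-side mirror of the shared matcher response signature."""
    mode: str
    found: bool
    matches: List[str]
    # Optional in the signature, so default to an empty list.
    components: List[Dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "MatcherResult":
        # The three required fields must be present.
        missing = [k for k in ("mode", "found", "matches") if k not in raw]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        if raw["mode"] not in VALID_MODES:
            raise ValueError(f"unknown mode: {raw['mode']!r}")
        return cls(
            mode=raw["mode"],
            found=bool(raw["found"]),
            matches=list(raw["matches"]),
            components=list(raw.get("components") or []),
        )
```

For example, MatcherResult.from_dict({"found": True, "mode": "NORMAL", "matches": ["12 34 65 78"]}) yields a result with an empty components list.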

Pre-built entity matchers

Email

Detects email addresses.

Phone number

Detects phone numbers.

URL

Detects URLs.

Address

Detects addresses.

Name

Detects names.

Username

Detects usernames.

Profanity

Detects profanity.

Word

Detects words from a list.

Sensitive number

Detects sensitive numbers, such as credit card numbers.
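Since every matcher shares the same response signature, a client can gather the results of all matchers without hard-coding their response keys. The Python sketch below does this by looking for any top-level object that carries the required found, mode, and matches fields. Only the email and phone keys appear in the response example on this page, so the function deliberately makes no assumption about the keys used by other matchers. The name collect_matches is hypothetical.

```python
REQUIRED_FIELDS = {"found", "mode", "matches"}


def collect_matches(resp: dict) -> dict:
    """Map each matcher key to its list of matches, skipping matchers that found nothing.

    A top-level value counts as a matcher result when it is an object
    containing all required signature fields.
    """
    results = {}
    for key, value in resp.items():
        if isinstance(value, dict) and REQUIRED_FIELDS <= value.keys():
            if value["found"]:
                results[key] = list(value["matches"])
    return results


# Matcher objects copied from the masked-content response example.
sample = {
    "content_moderated": True,
    "email": {"found": True, "mode": "SUSPICIOUS",
              "matches": ["mr_robot[at]gmail|DOT|com"]},
    "phone": {"found": True, "mode": "NORMAL", "matches": ["12 34 65 78"]},
}
print(collect_matches(sample))
```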
