Entity matchers overview
Entity matchers find specific kinds of data in text, such as dates, numbers, and email addresses. This enables you to:
- Detect if a text contains certain data
- Extract structured data from unstructured text
- Mask out swear words, phone numbers, and other sensitive data
- Pseudonymize data for GDPR compliance
Masking
You can use entity matchers to mask out certain words or phrases. For example, you can mask out swear words, phone numbers, or other sensitive data. You can enable masking and set the replacement value when you add a matcher to your project.
When masking is enabled, the content field of the text moderation response will contain the modified text, and the original field will contain the original text. The content_moderated field indicates whether the content differs from the original text.
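As a minimal sketch of how these three fields relate, the snippet below reads a hand-written response dictionary. Only the field names content, original, and content_moderated come from the text above. The sample values, the {{MASKED}} replacement string, and the helper name summarize_masking are assumptions made for illustration, not part of the API.

```python
from typing import Tuple

# Hypothetical sample of a text moderation response with masking enabled.
# Only the three field names come from the documentation; the values are made up.
sample_response = {
    "original": "Call me at 555-0100",
    "content": "Call me at {{MASKED}}",
    "content_moderated": True,
}


def summarize_masking(response: dict) -> Tuple[str, bool]:
    """Return the (possibly masked) text and whether it differs from the original."""
    text = response["content"]
    changed = bool(response["content_moderated"])
    return text, changed


text, changed = summarize_masking(sample_response)
print(text)
print("masked:", changed)
```

In this sketch the caller shows content to end users and keeps original only for auditing, which is one plausible way to use the two fields.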
Detection levels
For most matchers you can set the detection level. This determines how aggressively the matcher flags values. Each level also detects everything the previous level detects.
Level | Description |
---|---|
NORMAL | Detect values that are spelled and formatted correctly. |
SUSPICIOUS | Also detect values that are misspelled or obfuscated. |
PARANOID | Also detect values when the matcher is somewhat unsure that they are valid. |
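To illustrate how the levels build on each other, here is a toy email finder in which each level adds a looser pattern on top of the previous ones. It is a sketch of the idea only, not how the service's matchers are implemented. The regular expressions, the DetectionLevel enum, and the find_emails function are all assumptions.

```python
import re
from enum import IntEnum
from typing import List


class DetectionLevel(IntEnum):
    """Level names from the table above; the numeric ordering is an assumption."""
    NORMAL = 1
    SUSPICIOUS = 2
    PARANOID = 3


# Toy patterns (assumptions): correctly formatted, bracket-obfuscated, spelled out.
STRICT_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
OBFUSCATED_EMAIL = re.compile(
    r"[\w.+-]+\s*(?:\[at\]|\(at\))\s*[\w-]+\s*(?:\[dot\]|\(dot\))\s*[\w-]+",
    re.IGNORECASE,
)
LOOSE_EMAIL = re.compile(r"[\w.+-]+\s+at\s+[\w-]+\s+dot\s+[\w-]+", re.IGNORECASE)


def find_emails(text: str, level: DetectionLevel) -> List[str]:
    """Collect matches from every pattern enabled at the given level."""
    patterns = [STRICT_EMAIL]
    if level >= DetectionLevel.SUSPICIOUS:
        patterns.append(OBFUSCATED_EMAIL)
    if level >= DetectionLevel.PARANOID:
        patterns.append(LOOSE_EMAIL)
    found: List[str] = []
    for pattern in patterns:
        found.extend(m.group(0) for m in pattern.finditer(text))
    return found


text = "Mail jane@example.com or john [at] example [dot] com or bob at example dot org"
for level in DetectionLevel:
    print(level.name, find_emails(text, level))
```

A higher level returns a superset of the lower level's matches, at the cost of more false positives from the looser patterns.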
Response signature
All matcher models have the same response signature:
- The detection level used for the match.
- Whether the matcher found a match.
- An array of all the matches found.
- An array of objects with the components of each match. For example, for a name matcher, the components would be the first and last name.
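The shape described above could be modeled roughly as follows. The field names are not given in this section, so detection_level, found, matches, and components are hypothetical placeholders, as are the first_name and last_name keys and the sample values. Check the API reference for the real names before relying on this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MatcherResult:
    """Hypothetical model of a matcher response; all field names are placeholders."""
    detection_level: str  # the detection level used for the match
    found: bool  # whether the matcher found a match
    matches: List[str] = field(default_factory=list)  # all matches found
    components: List[Dict[str, str]] = field(default_factory=list)  # parts of each match


def parse_matcher_result(raw: dict) -> MatcherResult:
    """Build a MatcherResult from a decoded JSON object, defaulting the arrays to empty."""
    return MatcherResult(
        detection_level=raw["detection_level"],
        found=bool(raw["found"]),
        matches=list(raw.get("matches", [])),
        components=list(raw.get("components", [])),
    )


# Made-up example of what a name matcher might return.
sample = {
    "detection_level": "NORMAL",
    "found": True,
    "matches": ["Ada Lovelace"],
    "components": [{"first_name": "Ada", "last_name": "Lovelace"}],
}

result = parse_matcher_result(sample)
print(result.found, result.matches, result.components)
```

Because every matcher shares the same signature, a single parser like this can in principle be reused for each matcher in a project.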