To analyze text, send a POST request containing the text to the text moderation endpoint. The response with the moderation results is returned immediately.

Example
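A minimal request sketch using fetch in JavaScript. The endpoint URL, the Bearer authentication scheme, and the value field name are assumptions here; check the API reference for the exact values.

const response = await fetch("https://moderationapi.com/api/v1/moderate/text", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY", // assumed auth scheme
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    // assumed field name for the text to analyze
    value: "You can contact me on mr_robot[at]gmail|DOT|com or call me on 12 34 65 78",
  }),
});

const data = await response.json();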

Response example
{
  "status": "success",
  "request": {
    "timestamp": 1612792574690,
    "quota_usage": 1
  },
  "content_moderated": true,
  "data_found": true,
  "flagged": true,
  "original": "You can contact me on mr_robot[at]gmail|DOT|com or call me on 12 34 65 78",
  "content": "You can contact me on {{ email hidden }} or call me on {{ number hidden }}",
  "email": {
    "found": true,
    "mode": "SUSPICIOUS",
    "matches": ["mr_robot[at]gmail|DOT|com"]
  },
  "phone": {
    "found": true,
    "mode": "NORMAL",
    "matches": ["12 34 65 78"]
  },
  "sentiment": {
    "label": "NEUTRAL",
    "labelIndex": 0,
    "score": 0.997857,
    "label_scores": {
      "NEUTRAL": 0.997857,
      "POSITIVE": 0.001105,
      "NEGATIVE": 0.000666
    }
  }
}

Acting on the response

Here are some examples of how to use the response in your application.

Utilize the flagged field

The easiest way to get started is to use the flagged field. The flagged field is a boolean that indicates if any of the models detected something non-neutral. You can adjust the thresholds for flagging content in the project settings. See flagging thresholds.

if (data.flagged) {
  // Do something
}

Utilize model scores

You can also utilize the model scores to have more granular control over the content moderation. For example, if you want to flag content that has a negative sentiment score of more than 0.9, you can do the following:

if (data.flagged && data.sentiment.label_scores.NEGATIVE > 0.9) {
  // Do something
}

Saving matched entities

You can also save the matched entities to your database. For example, if you want to save the email address that was detected in the content, you can do the following:

if (data.email.matches.length > 0) {
  // Save data.email.matches[0] to your database
}

Detecting look-alike characters

Spammers might try to get around your moderation by spoofing text with look-alike characters. For example, they might write 🅼🅾🅽🅴🆈 instead of money, or use even subtler replacements like hidden spaces or very similar-looking letters.

Moderation API detects and normalizes look-alike characters before analyzing content. You can find the normalized text in the content field of the response, and the original text in the original field.

Additionally, you can use the unicode_spoofing field to see if look-alike characters were detected.
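For example (a minimal sketch; it assumes unicode_spoofing exposes a found boolean like the email and phone fields do):

if (data.unicode_spoofing && data.unicode_spoofing.found) {
  // Look-alike characters were detected.
  // data.content holds the normalized text, data.original the raw input.
}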

Lastly, models like the spam model are trained to take look-alike characters into account. Specifically, the spam model should raise a flag for excessive use of look-alike characters. Other models, like the email model, will run on the normalized text to improve accuracy.

Storing modified content

Some models can be configured to modify the content. For example, the email model can mask email addresses with {{ email hidden }}. You can store the modified content in your database by using the content field.

if (data.content_moderated) {
  // Store data.content in your database
}

This can be useful if you need to anonymize/pseudonymize the content before storing it in your database, or if you do not wish your end users to see personal information.

Adding data to the request

You can add metadata to the content you are sending for moderation. Some specific metadata is used by the moderation pipeline and is documented below.

ContextId

The text moderation endpoint allows for contextId in the request body.

This could be the ID of a chatroom or a post thread.

If you are using review queues the contextId can also be used to filter the review queue to only show content from a specific context.

Enable context awareness to improve the accuracy of your moderation using the contextId.
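For example (a sketch reusing the request assumptions from earlier; the chatroom ID is hypothetical):

const body = {
  value: "See you in the lobby!", // assumed field name for the text
  contextId: "chatroom-42",       // hypothetical ID of the chatroom or thread
};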

AuthorId

The text moderation endpoint allows for authorId in the request body.

This would be the ID of the user who wrote the content.

This is useful if you are using review queues, as it enables you to perform user level moderation, and to filter the review queue to only show content from a specific user.

Enable context awareness to improve the accuracy of your moderation using the authorId.
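For example (a sketch with a hypothetical user ID, sent in the same request body as above):

const body = {
  value: "Great deal, check my profile!", // assumed field name for the text
  authorId: "user-123",                   // hypothetical ID of the author
};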

Other metadata

The text moderation endpoint allows for a metadata object in the request body.

This can be used to add any additional information to the request. For example, you can keep a reference ID to the original content in the metadata object.

Metadata is shown in the review queues and included in any webhooks.

If you add a link, it will be clickable from the review queue. This is useful if you link to the original content in the metadata, as you can then navigate to it directly from the review queue.
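For example (a sketch; the metadata keys are hypothetical and entirely up to you):

const body = {
  value: "Check out my new listing", // assumed field name for the text
  metadata: {
    referenceId: "post-789",                      // your own reference to the original content
    originalUrl: "https://example.com/posts/789", // links are clickable from the review queue
  },
};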

Context awareness

Enable Context awareness in your project settings, and then include authorId and/or contextId in the API request. This enables the models to pull in previous messages for their analysis, increasing their accuracy.

Context awareness can increase quota usage as it requires additional processing and analysis of previous messages. Each contextual message analyzed counts towards your quota usage. Consider this when implementing context awareness in your project, especially for high-volume applications.

LLM-based models like AI agents can use the contextId to see the previous messages in the same context, and the authorId to see the previous messages from the same author.

Specifically, this can prevent unwanted content that is spread across multiple messages.

msg 1 -> f
msg 2 -> u
msg 3 -> c
msg 4 -> k [FLAGGED with context awareness]
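
A sketch of how the messages above might be submitted, reusing the request assumptions from earlier; sharing the same contextId (and authorId) across the requests is what lets the models connect them:

const messages = ["f", "u", "c", "k"];

for (const value of messages) {
  await fetch("https://moderationapi.com/api/v1/moderate/text", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY", // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      value,                    // assumed field name for the text
      contextId: "chatroom-42", // same context ties the messages together
      authorId: "user-123",     // same author across all four messages
    }),
  });
}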

It can also help the models understand messages in the context of a conversation.

user 1 -> What's the worst thing you know?
user 2 -> European people [FLAGGED with context awareness]

Opt out of content store

The text moderation endpoint allows for a doNotStore boolean in the request body.

If you set doNotStore to true, the content will only pass through the moderation models and will not be stored.

Note that doing so will make parts of the moderation dashboards less useful.

Do not disable content store if you wish to train or optimize models based on your own data.
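For example (a sketch, same request body assumptions as above):

const body = {
  value: "Text that should not be stored", // assumed field name for the text
  doNotStore: true, // the content only passes through the models and is not persisted
};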
