The moderation API returns a structured response that helps you decide how to handle content. Here’s an example response:
{
"content": {
"id": "message-123",
"masked": true,
"modified": "This is a test, my email is {{ email hidden }}"
},
"author": {
"id": "auth_abc123",
"external_id": "user-123",
"status": "enabled",
"block": null,
"trust_level": {
"level": 2,
"manual": false
}
},
"evaluation": {
"flagged": true,
"flag_probability": 0.87,
"severity_score": 0.3,
"unicode_spoofed": false
},
"recommendation": {
"action": "review",
"reason_codes": ["severity_review"]
},
"policies": [
{
"id": "personal_information",
"type": "entity_matcher",
"probability": 0.95,
"flagged": true,
"flagged_fields": [],
"matches": [
{
"probability": 0.95,
"match": "[email protected]",
"span": [28, 44]
}
]
},
{
"id": "toxicity",
"type": "classifier",
"probability": 0.12,
"flagged": false,
"labels": [
{
"id": "toxic",
"probability": 0.12,
"flagged": false
}
]
}
],
"insights": [
{
"id": "sentiment",
"type": "insight",
"probability": 0.78,
"value": "neutral"
},
{
"id": "language",
"type": "insight",
"probability": 0.99,
"value": "en"
}
],
"meta": {
"status": "success",
"timestamp": 1735902168566,
"channel_key": "default",
"usage": 1,
"processing_time": "245ms"
}
}
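If you work in TypeScript, it can help to model the parts of the response you rely on. The interface below is a rough, partial sketch derived from the example above, not an official type definition; treat any field type you have not verified against the API reference as an assumption.

// Partial shape of a moderation response, sketched from the example above.
interface ModerationResponse {
  content: { id: string; masked: boolean; modified: string };
  author?: {
    id: string;
    external_id: string;
    status: "enabled" | "blocked" | "suspended";
    block: { until?: number | string } | null; // exact shape assumed
    trust_level: { level: number; manual: boolean };
  };
  evaluation: {
    flagged: boolean;
    flag_probability: number;
    severity_score: number;
    unicode_spoofed: boolean;
  };
  recommendation: {
    action: "allow" | "review" | "reject";
    reason_codes: string[];
  };
  policies: Array<{
    id: string;
    type: "classifier" | "entity_matcher";
    probability: number;
    flagged: boolean;
    flagged_fields?: string[];
    labels?: Array<{ id: string; probability: number; flagged: boolean }>;
    matches?: Array<{ probability: number; match: string; span: [number, number] }>;
  }>;
  insights: Array<{ id: string; type: "insight"; probability: number; value: string }>;
  meta: {
    status: "success" | "partial_success" | string;
    timestamp: number;
    channel_key: string;
    usage: number;
    processing_time: string;
  };
  errors?: Array<{ id: string; message: string }>; // present when policies fail
}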
How you act on the response is up to you:
- Block flagged content and return an error message to your user
- Store the content in your database and let a human review it using review queues
- Do something in between and only send content to review when the AI is not confident in its decision (see the sketch after this list)
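For the last option, one way to read "confidence" is the evaluation.flag_probability field: values near 0 or 1 mean the model is fairly sure, while mid-range values are worth a second look. A minimal sketch with arbitrary thresholds, assuming rejectContent, queueForReview, and saveContent are helpers you define:

const { flagged, flag_probability } = response.evaluation;

if (flagged && flag_probability >= 0.9) {
  // The model is confident this violates a policy - block it outright
  await rejectContent(content);
} else if (flagged || flag_probability >= 0.5) {
  // Borderline or low-confidence result - let a human decide
  await queueForReview(content);
} else {
  // Clearly fine - store it as usual
  await saveContent(content);
}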
Use the recommendation
The easiest way to handle moderation responses is to use the recommendation object. It provides a clear action based on your channel configuration, severity scores, and author status.
switch (response.recommendation.action) {
case "reject":
// Block the content, show an error to the user
throw new Error("Content not allowed");
case "review":
// Save the content but flag for manual review
await saveContent(content, { needsReview: true });
break;
case "allow":
// Content is approved, proceed normally
await saveContent(content);
break;
}
The reason_codes array tells you why a particular recommendation was made:
| Reason code | Description |
|---|---|
| severity_reject | Content severity score exceeded rejection threshold |
| severity_review | Content severity score exceeded review threshold |
| author_block | The author is blocked or suspended |
| dry_run | Dry run mode is enabled (always returns allow) |
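For example, you can translate reason codes into a user-facing message when rejecting content. The helper below is a hypothetical sketch, not part of the API:

// Hypothetical helper: turn reason codes into a message you can show the user.
function explainRejection(reasonCodes: string[]): string {
  if (reasonCodes.includes("author_block")) {
    return "Your account is currently not allowed to post.";
  }
  if (reasonCodes.includes("severity_reject")) {
    return "This message violates our content guidelines.";
  }
  return "This message could not be posted.";
}

if (response.recommendation.action === "reject") {
  throw new Error(explainRejection(response.recommendation.reason_codes));
}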
Check if flagged
For simple use cases, you can check the evaluation.flagged field. This boolean indicates if any of your enabled policies detected something that triggered a flag.
if (response.evaluation.flagged) {
// Content was flagged by at least one policy
}
Severity score
The severity_score gives you more granular control. Higher scores indicate more severe violations:
if (response.evaluation.flagged && response.evaluation.severity_score > 0.7) {
// High severity - reject immediately
} else if (response.evaluation.flagged) {
// Lower severity - send to review queue
}
We recommend using the recommendation.action field instead of implementing your own threshold logic. Configure thresholds in your channel settings for easier management.
Work with individual policies
The policies array contains results from each policy enabled in your channel, sorted by highest probability. Each policy includes:
| Field | Description |
|---|---|
| id | The policy identifier (e.g., toxicity, personal_information) |
| type | Either classifier or entity_matcher |
| probability | Model confidence level (0-1) |
| flagged | Whether this policy triggered based on your thresholds |
| flagged_fields | For object submissions, which fields triggered the flag (see the sketch below) |
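If you submit structured content with multiple fields (for example a title and a body), flagged_fields tells you which of them triggered the policy. A small sketch, assuming you submitted fields named title and body:

// Collect every field name that triggered at least one policy,
// e.g. ["title"] or ["title", "body"] for the assumed field names.
const flaggedFields = new Set(
  response.policies
    .filter(p => p.flagged)
    .flatMap(p => p.flagged_fields ?? [])
);

if (flaggedFields.has("title")) {
  // Ask the user to rewrite just the title
}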
Classifier policies
Classifier policies (like toxicity, spam, hate) analyze content and return a probability score:
const toxicityPolicy = response.policies.find(p => p.id === "toxicity");
if (toxicityPolicy?.flagged) {
console.log(`Toxicity detected with ${Math.round(toxicityPolicy.probability * 100)}% confidence`);
}
// Check specific labels within a classifier
if (toxicityPolicy?.labels) {
const severeLabel = toxicityPolicy.labels.find(l => l.id === "severe");
if (severeLabel?.flagged) {
// Handle severe toxicity differently
}
}
Entity matcher policies
Entity matcher policies (like personal_information, url) detect and extract specific entities:
const piiPolicy = response.policies.find(p => p.id === "personal_information");
if (piiPolicy?.matches?.length > 0) {
console.log("Found PII:");
piiPolicy.matches.forEach(match => {
console.log(` - "${match.match}" at position ${match.span[0]}-${match.span[1]}`);
});
}
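If you prefer to redact matches yourself rather than rely on the built-in masking covered next, the span offsets make that straightforward. A minimal sketch, assuming span is a [start, end) character range into the submitted text:

// Replace each match with a placeholder, working from the end of the string
// so earlier span offsets stay valid after each replacement.
let redacted = originalContent;
const sortedMatches = [...(piiPolicy?.matches ?? [])].sort((a, b) => b.span[0] - a.span[0]);

for (const m of sortedMatches) {
  redacted = redacted.slice(0, m.span[0]) + "[redacted]" + redacted.slice(m.span[1]);
}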
Handle masked content
If you have PII masking enabled, the API can automatically redact sensitive information. Check the content object:
if (response.content.masked) {
// Use the modified content with masked values
const safeContent = response.content.modified;
await saveToDatabase(safeContent);
} else {
// No masking applied, use original content
await saveToDatabase(originalContent);
}
This is useful for:
- Anonymizing content before storing in your database
- Preventing users from seeing personal information
- Compliance with data protection regulations
Check author status
If you’re using author management, the response includes author information:
const { author } = response;
if (author?.status === "blocked") {
// Author is permanently blocked
throw new Error("Your account has been blocked");
}
if (author?.status === "suspended") {
const until = new Date(author.block.until);
throw new Error(`Your account is suspended until ${until.toLocaleDateString()}`);
}
// Optionally adjust behavior based on trust level
if (author?.trust_level.level >= 3) {
// Trusted author - maybe skip certain checks
}
Use insights
The insights array provides additional analysis that doesn’t affect flagging:
const sentimentInsight = response.insights.find(i => i.id === "sentiment");
const languageInsight = response.insights.find(i => i.id === "language");
console.log(`Language: ${languageInsight?.value}`); // e.g., "en"
console.log(`Sentiment: ${sentimentInsight?.value}`); // "positive", "neutral", or "negative"
// Route negative sentiment to priority review
if (sentimentInsight?.value === "negative" && response.evaluation.flagged) {
await addToPriorityQueue(content);
}
Detect unicode spoofing
Spammers sometimes use look-alike characters to bypass moderation (e.g., mоney with a Cyrillic “о” instead of Latin “o”).
The API detects and normalizes these characters. Check the unicode_spoofed field:
if (response.evaluation.unicode_spoofed) {
console.log("Content contains look-alike characters");
// The policies ran on normalized text for accurate detection
}
Handle errors
Check meta.status for the overall request status:
if (response.meta.status === "partial_success") {
// Some policies failed - check the errors array
response.errors?.forEach(error => {
console.warn(`Policy ${error.id} failed: ${error.message}`);
});
}
The errors array contains details about any policies that encountered issues during processing.