The moderation API returns a structured response that helps you decide how to handle content. Here’s an example response:
{
  "content": {
    "id": "message-123",
    "masked": true,
    "modified": "This is a test, my email is {{ email hidden }}"
  },
  "author": {
    "id": "auth_abc123",
    "external_id": "user-123",
    "status": "enabled",
    "block": null,
    "trust_level": {
      "level": 2,
      "manual": false
    }
  },
  "evaluation": {
    "flagged": true,
    "flag_probability": 0.87,
    "severity_score": 0.3,
    "unicode_spoofed": false
  },
  "recommendation": {
    "action": "review",
    "reason_codes": ["severity_review"]
  },
  "policies": [
    {
      "id": "personal_information",
      "type": "entity_matcher",
      "probability": 0.95,
      "flagged": true,
      "flagged_fields": [],
      "matches": [
        {
          "probability": 0.95,
          "match": "[email protected]",
          "span": [28, 44]
        }
      ]
    },
    {
      "id": "toxicity",
      "type": "classifier",
      "probability": 0.12,
      "flagged": false,
      "labels": [
        {
          "id": "toxic",
          "probability": 0.12,
          "flagged": false
        }
      ]
    }
  ],
  "insights": [
    {
      "id": "sentiment",
      "type": "insight",
      "probability": 0.78,
      "value": "neutral"
    },
    {
      "id": "language",
      "type": "insight",
      "probability": 0.99,
      "value": "en"
    }
  ],
  "meta": {
    "status": "success",
    "timestamp": 1735902168566,
    "channel_key": "default",
    "usage": 1,
    "processing_time": "245ms"
  }
}
How you act on the response is up to you:
  • Block the content and return an error message to your user if it gets flagged
  • Store it in your database and let a human review it using review queues
  • Do something in between and only review content when the AI is not confident in its decision, as sketched below
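For the middle ground, you can combine the evaluation fields with your own storage logic. A minimal sketch of one approach (saveContent and CONFIDENCE_THRESHOLD are placeholder helpers and values, not part of the API):
async function handleModeration(content, response) {
  const { flagged, flag_probability } = response.evaluation;

  if (!flagged) {
    // Nothing detected - store the content as usual
    return saveContent(content);
  }

  if (flag_probability >= CONFIDENCE_THRESHOLD) {
    // The model is confident - block and surface an error to the user
    throw new Error("Content not allowed");
  }

  // Flagged with lower confidence - store it and queue it for human review
  return saveContent(content, { needsReview: true });
}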

Use the recommendation

The easiest way to handle moderation responses is to use the recommendation object. It provides a clear action based on your channel configuration, severity scores, and author status.
switch (response.recommendation.action) {
  case "reject":
    // Block the content, show an error to the user
    throw new Error("Content not allowed");
  case "review":
    // Save the content but flag for manual review
    await saveContent(content, { needsReview: true });
    break;
  case "allow":
    // Content is approved, proceed normally
    await saveContent(content);
    break;
}
The reason_codes array tells you why a particular recommendation was made:
Reason code        Description
severity_reject    Content severity score exceeded rejection threshold
severity_review    Content severity score exceeded review threshold
author_block       The author is blocked or suspended
dry_run            Dry run mode is enabled (always returns allow)
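For example, you can tailor the message shown to your user based on the reason code. A small sketch (notifyUser is a hypothetical helper):
const { action, reason_codes } = response.recommendation;

if (action === "reject" && reason_codes.includes("author_block")) {
  notifyUser("Your account is currently not allowed to post.");
} else if (action === "reject" && reason_codes.includes("severity_reject")) {
  notifyUser("This content violates our content guidelines.");
}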

Check if flagged

For simple use cases, you can check the evaluation.flagged field. This boolean indicates if any of your enabled policies detected something that triggered a flag.
if (response.evaluation.flagged) {
  // Content was flagged by at least one policy
}

Severity score

The severity_score gives you more granular control. Higher scores indicate more severe violations:
if (response.evaluation.flagged && response.evaluation.severity_score > 0.7) {
  // High severity - reject immediately
} else if (response.evaluation.flagged) {
  // Lower severity - send to review queue
}
We recommend using the recommendation.action field instead of implementing your own threshold logic. Configure thresholds in your channel settings for easier management.

Work with individual policies

The policies array contains results from each policy enabled in your channel, sorted by highest probability. Each policy includes:
Field             Description
id                The policy identifier (e.g., toxicity, personal_information)
type              Either classifier or entity_matcher
probability       Model confidence level (0-1)
flagged           Whether this policy triggered based on your thresholds
flagged_fields    For object submissions, which fields triggered the flag
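For example, to list every policy that flagged the content:
// Policies are sorted by highest probability, so the strongest signals come first
const flaggedPolicies = response.policies.filter(p => p.flagged);

flaggedPolicies.forEach(p => {
  console.log(`${p.id} flagged (${(p.probability * 100).toFixed(0)}% confidence)`);
});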

Classifier policies

Classifier policies (like toxicity, spam, hate) analyze content and return a probability score:
const toxicityPolicy = response.policies.find(p => p.id === "toxicity");

if (toxicityPolicy?.flagged) {
  console.log(`Toxicity detected with ${toxicityPolicy.probability * 100}% confidence`);
}

// Check specific labels within a classifier
if (toxicityPolicy?.labels) {
  const severeLabel = toxicityPolicy.labels.find(l => l.id === "severe");
  if (severeLabel?.flagged) {
    // Handle severe toxicity differently
  }
}

Entity matcher policies

Entity matcher policies (like personal_information, url) detect and extract specific entities:
const piiPolicy = response.policies.find(p => p.id === "personal_information");

if (piiPolicy?.matches?.length > 0) {
  console.log("Found PII:");
  piiPolicy.matches.forEach(match => {
    console.log(`  - "${match.match}" at position ${match.span[0]}-${match.span[1]}`);
  });
}
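If you prefer to redact matches yourself rather than rely on masked content (see below), you can use the span offsets. A minimal sketch, assuming span is a [start, end) character range into the submitted text, consistent with the example response above:
// Replace matches from the end of the string backwards so earlier offsets stay valid
function redactMatches(text, matches, placeholder = "{{ hidden }}") {
  return [...matches]
    .sort((a, b) => b.span[0] - a.span[0])
    .reduce(
      (redacted, m) => redacted.slice(0, m.span[0]) + placeholder + redacted.slice(m.span[1]),
      text
    );
}

const redacted = redactMatches(originalContent, piiPolicy?.matches ?? []);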

Handle masked content

If you have PII masking enabled, the API can automatically redact sensitive information. Check the content object:
if (response.content.masked) {
  // Use the modified content with masked values
  const safeContent = response.content.modified;
  await saveToDatabase(safeContent);
} else {
  // No masking applied, use original content
  await saveToDatabase(originalContent);
}
This is useful for:
  • Anonymizing content before storing in your database
  • Preventing users from seeing personal information
  • Compliance with data protection regulations

Check author status

If you’re using author management, the response includes author information:
const { author } = response;

if (author?.status === "blocked") {
  // Author is permanently blocked
  throw new Error("Your account has been blocked");
}

if (author?.status === "suspended") {
  const until = new Date(author.block.until);
  throw new Error(`Your account is suspended until ${until.toLocaleDateString()}`);
}

// Optionally adjust behavior based on trust level
if (author?.trust_level?.level >= 3) {
  // Trusted author - maybe skip certain checks
}

Use insights

The insights array provides additional analysis that doesn’t affect flagging:
const sentimentInsight = response.insights.find(i => i.id === "sentiment");
const languageInsight = response.insights.find(i => i.id === "language");

console.log(`Language: ${languageInsight?.value}`); // e.g., "en"
console.log(`Sentiment: ${sentimentInsight?.value}`); // "positive", "neutral", or "negative"

// Route negative sentiment to priority review
if (sentimentInsight?.value === "negative" && response.evaluation.flagged) {
  await addToPriorityQueue(content);
}

Detect unicode spoofing

Spammers sometimes use look-alike characters to bypass moderation (e.g., mоney with a Cyrillic “о” instead of Latin “o”). The API detects and normalizes these characters. Check the unicode_spoofed field:
if (response.evaluation.unicode_spoofed) {
  console.log("Content contains look-alike characters");
  // The policies ran on normalized text for accurate detection
}

Handle errors

Check meta.status for the overall request status:
if (response.meta.status === "partial_success") {
  // Some policies failed - check the errors array
  response.errors?.forEach(error => {
    console.warn(`Policy ${error.id} failed: ${error.message}`);
  });
}
The errors array contains details about any policies that encountered issues during processing.