These policies flag content that’s hostile to readers or targeted at protected groups. They run on both sides of a conversation — what users post and what your bots or assistants reply.

Policies

| id | Type | Supported | What it does |
| --- | --- | --- | --- |
| toxicity | classifier | text, audio | General-purpose toxicity detection: insults, harassment, hostile language. |
| toxicity_severe | classifier | text, audio | A stricter sub-classifier for severe toxicity. Useful when you want to distinguish “rude” from “abusive.” |
| hate | classifier | text, image, video, audio | Hate speech, discrimination, racism, and extremism, including image and video content. |
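
Because the three policies do not cover the same content types, it can help to check which ones apply before submitting a given kind of content. The sketch below copies the support matrix from the table above into a plain object; `SUPPORTED_TYPES` and `policiesFor` are hypothetical names used for illustration and are not part of any SDK.

```javascript
// Support matrix copied from the table above: policy id -> content types it accepts.
const SUPPORTED_TYPES = {
  toxicity: ["text", "audio"],
  toxicity_severe: ["text", "audio"],
  hate: ["text", "image", "video", "audio"],
};

// Hypothetical helper (not an SDK function): list the policy ids that can
// run on a given content type. Unknown types yield an empty list.
function policiesFor(contentType) {
  return Object.keys(SUPPORTED_TYPES).filter((id) =>
    SUPPORTED_TYPES[id].includes(contentType)
  );
}

console.log(policiesFor("text"));  // [ 'toxicity', 'toxicity_severe', 'hate' ]
console.log(policiesFor("image")); // [ 'hate' ]
```

For image or video content, only `hate` applies, so a toxicity check on that content would need a text or audio representation of it.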

Reading the result

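// `response` is the result returned by a moderation request.
const toxicity = response.policies.find(p => p.id === "toxicity");

if (toxicity?.flagged) {
  // Round the percentage: multiplying a float by 100 can print long decimal tails.
  console.log(`Toxicity: ${(toxicity.probability * 100).toFixed(1)}% confidence`);
}

// Look for the "severe" label on the toxicity result.
const severe = toxicity?.labels?.find(l => l.id === "severe");
if (severe?.flagged) {
  // Treat severe toxicity differently, e.g. auto-reject instead of review.
}

See Understanding API responses for the full response shape.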
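
The checks above can be folded into a single routing decision. The following is a minimal sketch, not an official pattern: `decideAction` and the action names `"reject"`, `"review"`, and `"allow"` are hypothetical, and the response shape is assumed to match the snippet above (a `policies` array whose entries carry `id`, `flagged`, `probability`, and optionally `labels`). Since the table lists `toxicity_severe` as a policy id while the snippet reads a `severe` label, the sketch checks both, which is an assumption rather than documented behavior.

```javascript
// Hypothetical routing helper. Assumes response.policies is an array of
// { id, flagged, probability, labels? } as in the snippet above.
function decideAction(response) {
  const policies = response?.policies ?? [];
  const byId = (id) => policies.find((p) => p.id === id);

  const toxicity = byId("toxicity");
  const hate = byId("hate");

  // Assumption: severe toxicity may arrive either as its own policy
  // ("toxicity_severe") or as a "severe" label on the toxicity policy.
  const severeFlagged =
    byId("toxicity_severe")?.flagged === true ||
    toxicity?.labels?.find((l) => l.id === "severe")?.flagged === true;

  if (severeFlagged) return "reject";                        // auto-reject
  if (toxicity?.flagged || hate?.flagged) return "review";   // human review
  return "allow";
}

// Made-up sample data for illustration only.
const sample = {
  policies: [
    { id: "toxicity", flagged: true, probability: 0.91, labels: [{ id: "severe", flagged: false }] },
    { id: "hate", flagged: false, probability: 0.04 },
  ],
};

console.log(decideAction(sample)); // review
```

Keeping the decision in one function makes it easy to apply the same rules to both user posts and bot replies.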