These policies flag content that’s hostile to readers or targeted at protected groups. They run on both sides of a conversation: what users post and what your bots or assistants reply.
Policies
| ID | Type | Supported content types | What it does |
|---|---|---|---|
| toxicity | classifier | text, audio | General-purpose toxicity detection: insults, harassment, hostile language. |
| toxicity_severe | classifier | text, audio | A stricter sub-classifier for severe toxicity. Useful when you want to distinguish “rude” from “abusive.” |
| hate | classifier | text, image, video, audio | Hate speech, discrimination, racism, and extremism, including in image and video content. |
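
Because the policies differ in which content types they accept, it can help to check support before routing content to a policy. The following is a minimal local sketch that only encodes the table above. The names `POLICY_SUPPORT`, `policies_for`, and `supports` are hypothetical helpers for illustration, not part of the Moderation API or any SDK, and the code makes no network calls.

```python
# Supported content types per policy id, copied from the table above.
# This mapping is a local illustration, not something returned by the API.
POLICY_SUPPORT = {
    "toxicity": {"text", "audio"},
    "toxicity_severe": {"text", "audio"},
    "hate": {"text", "image", "video", "audio"},
}


def policies_for(content_type: str) -> list[str]:
    """Return the policy ids (sorted) that list the given content type as supported."""
    return sorted(
        policy_id
        for policy_id, types in POLICY_SUPPORT.items()
        if content_type in types
    )


def supports(policy_id: str, content_type: str) -> bool:
    """Check whether a policy from the table supports a content type."""
    if policy_id not in POLICY_SUPPORT:
        raise KeyError(f"unknown policy id: {policy_id}")
    return content_type in POLICY_SUPPORT[policy_id]


if __name__ == "__main__":
    print(policies_for("image"))  # only the hate policy covers images
    print(policies_for("text"))   # all three policies cover text
```

A lookup like this makes it easy to see, for instance, that image and video content is only covered by `hate`, while text and audio can be evaluated by all three policies.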