`policies` array with one entry per enabled policy.
## Policy types
Every policy returns a `type` that tells you how to read its result:
| Type | What it does | Example fields |
|---|---|---|
| `classifier` | Scores content against one or more labels and returns a probability | `probability`, `labels[]` |
| `entity_matcher` | Extracts specific entities (URLs, emails, phone numbers, etc.) from content | `matches[]`, `signals` |
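A minimal sketch of branching on a policy's `type` when reading results. The response shape and field names below are assumptions based on the table above, not the documented schema:

```python
def read_policy(policy: dict) -> tuple:
    """Summarize one policy entry as a (kind, value) pair."""
    if policy["type"] == "classifier":
        # Classifiers return a probability you can threshold yourself.
        return ("score", policy["probability"])
    if policy["type"] == "entity_matcher":
        # Entity matchers return the extracted entities.
        return ("matches", policy["matches"])
    return ("unknown", None)

# Hypothetical policy entries for illustration.
policies = [
    {"name": "TOXICITY", "type": "classifier", "probability": 0.92, "labels": ["toxicity"]},
    {"name": "URL", "type": "entity_matcher", "matches": ["https://example.com"], "signals": {}},
]

summaries = [read_policy(p) for p in policies]
```

Branching on `type` first means the same loop handles any mix of enabled policies without special-casing each one by name.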
## Available policies
### Guidelines
Define custom rules in natural language. Each guideline is evaluated by an LLM and returned as its own policy.
### Privacy
Personal information detection, intent to share contact details, and PII masking.
### Toxicity & Hate
Toxicity, severe toxicity, and hate-based content including discrimination and extremism.
### NSFW
Sexual content, flirtation, profanity, violence, and self-harm across text, image, video, and audio.
### Illicit & Regulated
Drugs, alcohol, firearms, gambling, adult products, cannabis, crypto, and other regulated categories.
### Spam & Security
Spam, self-promotion, code abuse, phishing, and URL extraction.
### URL Risk
Real-time risk scoring for URLs — phishing, malware, brand impersonation, credential harvesting.
### Topics
Political and religious content for platforms that want to keep discussions on-topic.
Insights like sentiment and language aren't policies: they're returned in the separate `insights` array and don't affect flagging. See Understanding API responses.
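A sketch of where insights live relative to policies in a response. All field names here are illustrative assumptions, not the documented schema:

```python
# Hypothetical response fragment: `insights` sits alongside `policies`
# but never contributes to the flagging decision.
response = {
    "flagged": True,
    "policies": [
        {"name": "TOXICITY", "type": "classifier", "probability": 0.92},
    ],
    "insights": [
        {"name": "SENTIMENT", "value": "negative"},
        {"name": "LANGUAGE", "value": "en"},
    ],
}

# Only policy results feed flagging; insights are contextual metadata.
policy_names = [p["name"] for p in response["policies"]]
insight_names = [i["name"] for i in response["insights"]]
```

Keeping the two arrays separate lets you log or display insights without them ever tripping a moderation rule.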