These policies flag content that is hostile to readers or targets protected groups. They run on both sides of a conversation: what users post, and what your bots or assistants send in reply.
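
As a minimal sketch of what that means in practice, the example below screens both directions of a single turn. The `moderate` helper is a hypothetical stand-in for your actual client call, and its response type includes only the fields used on this page:

// Hypothetical client call; substitute the real SDK method.
declare function moderate(text: string): Promise<{
  policies: {
    id: string;
    flagged: boolean;
    probability: number;
    labels?: { id: string; flagged: boolean }[];
  }[];
}>;

async function screenTurn(userMessage: string, botReply: string) {
  // Screen the inbound user message before it reaches your bot...
  const inbound = await moderate(userMessage);
  // ...and the outbound reply before it reaches the user.
  const outbound = await moderate(botReply);
  return { inbound, outbound };
}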

Policies

| id | Type | Supported inputs | What it does |
| --- | --- | --- | --- |
| toxicity | classifier | text, audio | General-purpose toxicity detection: insults, harassment, hostile language. |
| toxicity_severe | classifier | text, audio | A stricter sub-classifier for severe toxicity. Useful when you want to distinguish "rude" from "abusive." |
| hate | classifier | text, image, video, audio | Hate speech, discrimination, racism, and extremism, including image and video content. |
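
The ids in the first column are what you pass when selecting policies and what you match on in the response. Here is a sketch of a request, assuming (this page does not confirm it) that policies are selected by id:

// Assumed request shape; consult the API reference for the real one.
const request = {
  input: { type: "text", text: "message to check" },
  // Policy ids from the table above. hate also accepts image, video, and audio inputs.
  policies: ["toxicity", "toxicity_severe", "hate"],
};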

Reading the result

// `response` is the result of a moderation call; the link below documents the full shape.
const toxicity = response.policies.find(p => p.id === "toxicity");

if (toxicity?.flagged) {
  // probability is in the 0-1 range, so scale and round it for display.
  console.log(`Toxicity flagged at ${(toxicity.probability * 100).toFixed(1)}% confidence`);
}

// Severe toxicity surfaces as a label on the toxicity policy.
const severe = toxicity?.labels?.find(l => l.id === "severe");
if (severe?.flagged) {
  // Treat severe toxicity differently, e.g. auto-reject instead of sending to review.
}
See Understanding API responses for the full response shape.
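
Putting the two checks together, here is a sketch of a triage rule; the actions are illustrative choices, not part of the API:

type PolicyResult = {
  id: string;
  flagged: boolean;
  probability: number;
  labels?: { id: string; flagged: boolean }[];
};

function triage(policies: PolicyResult[]): "allow" | "review" | "reject" {
  const toxicity = policies.find(p => p.id === "toxicity");
  if (!toxicity?.flagged) return "allow";
  // Auto-reject severe toxicity; route ordinary toxicity to human review.
  const severe = toxicity.labels?.find(l => l.id === "severe");
  return severe?.flagged ? "reject" : "review";
}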