False positives and negatives
When you’re relying on a content moderation system, a key concept to understand is the trade-off between false positives and false negatives:
- False positives occur when a system flags content as violating policy when it is actually compliant.
- False negatives occur when a system fails to flag content that does violate policy.
- Adjust thresholds (covered below) to tightly align with your standards and risk appetite.
- Supplement automated checks with human moderation for borderline cases to reduce both types of errors.
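One way to see the trade-off concretely is to count both error types against a small hand-labeled sample. This is an illustrative sketch, not part of the Moderation API:

```python
def error_counts(predictions, labels):
    """predictions/labels are lists of booleans: True = flagged / violating."""
    false_positives = sum(1 for p, l in zip(predictions, labels) if p and not l)
    false_negatives = sum(1 for p, l in zip(predictions, labels) if not p and l)
    return false_positives, false_negatives

# The model flags items 1 and 3, but only items 3 and 4 actually violate policy.
preds = [True, False, True, False]
truth = [False, False, True, True]
print(error_counts(preds, truth))  # (1, 1): one false positive, one false negative
```

Tracking both counts over time tells you which direction to move your thresholds.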
Adjusting flagging thresholds
The Moderation API lets you configure how strictly it flags content. By experimenting with threshold scores, you can tune the sensitivity:
- Lowering the threshold: This reduces false negatives (because more content is flagged), but can increase false positives.
- Raising the threshold: This reduces false positives (because the system flags fewer pieces of content), but can increase false negatives.
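The effect of moving the threshold can be sketched in a few lines. The scores are assumed to be 0–1 probabilities returned by a moderation model, and the threshold values here are hypothetical:

```python
def is_flagged(score: float, threshold: float) -> bool:
    """Flag content whenever the model's score meets the threshold."""
    return score >= threshold

scores = [0.2, 0.55, 0.8]

# A low threshold flags more content: fewer false negatives, more false positives.
print([is_flagged(s, 0.4) for s in scores])  # [False, True, True]

# A high threshold flags less content: fewer false positives, more false negatives.
print([is_flagged(s, 0.7) for s in scores])  # [False, False, True]
```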
Adding or removing models
Your workflow or platform could be improved by adding specialized models or removing unnecessary ones:
- Adding specialized models: If your use case deals with a specific domain (e.g., medical or legal), you might consider training or deploying a model tuned for that domain. This model could work in tandem with the general Moderation API to reduce domain-specific false flags.
- Removing unneeded models: If your current pipeline includes multiple checks from overlapping or redundant models, rationalizing them may reduce complexity and potential conflicts in results. It can also streamline moderation decisions.
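A hypothetical sketch of what rationalizing a pipeline looks like: decide explicitly which models you rely on, rather than flagging whenever any overlapping check fires. The model names here are illustrative:

```python
def combine_flags(results: dict, required: set) -> bool:
    """Flag content only based on the models the pipeline actually relies on."""
    return any(flagged for model, flagged in results.items() if model in required)

results = {"toxicity": True, "toxicity_legacy": True, "spam": False}

# After dropping the redundant legacy model the decision is unchanged,
# but the pipeline is simpler and the outcome easier to explain.
print(combine_flags(results, {"toxicity", "spam"}))  # True
```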
Writing custom guidelines
When pre-built policies don’t quite fit your platform, Guidelines let you describe what isn’t allowed in plain English. Each guideline is evaluated by an LLM with your project context in scope, so the rule is interpreted against what your platform is and who uses it. Guidelines work well for:
- Platform-specific rules that aren’t covered by the standard categories.
- Cases where intent matters more than the literal words used.
- Iterating quickly — you can change a guideline by editing one sentence rather than retraining a model.
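As an illustration, a guideline is just a short plain-English rule. The field names in this sketch are hypothetical, not the Moderation API’s actual schema (guidelines are normally created in the dashboard):

```python
# Hypothetical shape of a custom guideline: a name plus a plain-English rule.
guideline = {
    "name": "No off-platform payments",
    "rule": "Users must not ask to move payment off the platform, "
            "even indirectly (e.g. 'let's settle this over PayPal').",
}

print(guideline["name"])  # No off-platform payments
```

Because the rule is a sentence rather than a trained model, tightening or loosening it is a one-line edit.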
Enabling context awareness
Many moderation challenges arise when the system doesn’t understand the broader context behind the text:
- Certain terms might be acceptable in an educational or reclaiming context (e.g., quoting a slur to explain its original meaning).
- Cultural or community-specific language usage might not translate well with a general-purpose model.
- Provide additional metadata or preceding conversation snippets along with your text, to give the Moderation API or other classifiers a better understanding of what’s being said.
- Enable Context Awareness in your project settings and include `authorId` and `conversationId` in your requests so the system can reference previous messages. See Submitting content to Moderation API for more details.
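A context-aware request body might look like the sketch below. `authorId` and `conversationId` are the fields named above; the other field names are illustrative and should be confirmed against the API reference:

```python
import json

payload = {
    "value": "that's sick!",         # the text to analyze (field name assumed)
    "authorId": "user_123",          # stable ID for the message author
    "conversationId": "thread_456",  # groups messages so prior context is used
}

body = json.dumps(payload)
print(body)
```

With the same `conversationId` on each message in a thread, the classifier can weigh the current text against what came before it.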
Training custom models
If you find that standard models are not meeting your performance requirements, consider building a custom fine-tuned model. This can help if:
- You have a substantial dataset specific to your industry or type of content.
- You need higher precision for borderline or ambiguous cases.
- You want to reduce reliance on manual moderation for specialized content.
Get help from our team
If you need help deciding how to set up moderation, fine-tuning, or advanced configurations, reach out to our support team. We can help you:
- Determine the right thresholds for your workflow.
- Explore sample code to integrate moderation into your application.
- Identify potential pitfalls with domain-specific moderation.