Moderation API flags content based on the models you’ve added to your project. A piece of content is flagged when any non-neutral label of a model has a score above the configured flagging threshold.

By default, the threshold is set to 0.5 (i.e., 50% confidence). However, you can configure a separate threshold for each label in your project to fit your specific needs. Raising a threshold makes content less likely to be flagged, while lowering it makes flagging more sensitive to potential issues.
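As a rough sketch of that comparison, the snippet below flags content when any non-neutral label reaches the default 0.5 threshold. The LabelScore shape and the label names are assumptions made for illustration, not the API’s actual response format; only the flagged behavior and the 0.5 default come from this page.

```typescript
// Hypothetical shape of a single model's label scores; the exact
// response format is an assumption made for this sketch.
interface LabelScore {
  label: string; // e.g. "TOXICITY" or "NEUTRAL"
  score: number; // model confidence between 0 and 1
}

const DEFAULT_THRESHOLD = 0.5; // the 50% default described above

// Content is flagged when any non-neutral label reaches the threshold.
function isFlagged(labels: LabelScore[], threshold = DEFAULT_THRESHOLD): boolean {
  return labels
    .filter((l) => l.label !== "NEUTRAL")
    .some((l) => l.score >= threshold);
}

// A 0.62 toxicity score is flagged under the default threshold.
console.log(
  isFlagged([
    { label: "NEUTRAL", score: 0.38 },
    { label: "TOXICITY", score: 0.62 },
  ])
); // true
```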

Changing the thresholds

Choosing different flagging thresholds allows you to control how strict the moderation should be for various types of content.

• If you’d like to be more lenient, you can increase the threshold (e.g., to 0.75). The model will only flag content if it is at least 75% confident that a non-neutral label applies.
• To be more cautious, you can lower the threshold (e.g., to 0.25). Even a 25% confidence level for a non-neutral label would be enough for the content to be flagged.
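To make the difference concrete, the sketch below evaluates the same set of invented non-neutral scores against a cautious, a default, and a lenient threshold.

```typescript
// The same (invented) non-neutral scores evaluated under a cautious,
// a default, and a lenient threshold.
const nonNeutralScores: Record<string, number> = { TOXICITY: 0.6, PROFANITY: 0.1 };

for (const threshold of [0.25, 0.5, 0.75]) {
  const flagged = Object.values(nonNeutralScores).some((score) => score >= threshold);
  console.log(`threshold ${threshold}: flagged = ${flagged}`);
}
// threshold 0.25: flagged = true   (cautious)
// threshold 0.5:  flagged = true   (default)
// threshold 0.75: flagged = false  (lenient)
```

Note that the underlying scores never change; only the outcome of the comparison does.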


Adjusting the flagging threshold won’t change the actual confidence scores given by the model—they remain the same. Instead, altering the threshold only impacts whether the final flagged field in the API response is set to true or false.

Thresholds can be independently configured for each model and label. This flexibility allows you to be more cautious with certain subject matter while remaining lenient with others. If any model involved in the analysis has a non-neutral label that exceeds its respective threshold, the flagged field will be set to true.
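A minimal sketch of how such per-model, per-label thresholds could be combined into a single flagged value follows; the data structures, model names, and threshold values are assumptions for illustration.

```typescript
// Hypothetical per-model output and per-label threshold configuration.
// The structures and example values are assumptions for this sketch.
type ModelResult = {
  model: string;
  scores: Record<string, number>; // non-neutral label -> confidence
};
type ThresholdConfig = Record<string, Record<string, number>>; // model -> label -> threshold

const thresholds: ThresholdConfig = {
  toxicity: { TOXICITY: 0.25 }, // cautious with toxic content
  spam: { SPAM: 0.75 },         // lenient with spam
};

function isItemFlagged(results: ModelResult[], config: ThresholdConfig): boolean {
  // Flag if any model has any non-neutral label at or above its own threshold,
  // falling back to the 0.5 default when no threshold is configured.
  return results.some((r) =>
    Object.entries(r.scores).some(
      ([label, score]) => score >= (config[r.model]?.[label] ?? 0.5)
    )
  );
}

const results: ModelResult[] = [
  { model: "toxicity", scores: { TOXICITY: 0.3 } }, // exceeds its 0.25 threshold
  { model: "spam", scores: { SPAM: 0.6 } },         // below its 0.75 threshold
];
console.log(isItemFlagged(results, thresholds)); // true, because of the toxicity label
```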

What is the flagged field used for?

When using review queues, the flagged field is often used to filter content and prioritize items that may require your attention or moderation. By default, only flagged content enters the review queue. You can also use the flagged field in your application code to make immediate decisions about how to handle content (e.g., pushing it for review or blocking it outright). This mechanism enables you to adjust thresholds on the fly without making extensive code changes.
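For the second use, the hypothetical handler below branches on the flagged field. The moderate callback and response shape are assumptions for this sketch; only the flagged field is taken from this page.

```typescript
// Hypothetical moderation response; only the `flagged` field comes from this page.
interface ModerationResponse {
  flagged: boolean;
}

// Hypothetical handler: route content based on the flagged field so that
// threshold changes take effect without further code changes.
async function handleUserComment(
  comment: { id: string; text: string },
  moderate: (text: string) => Promise<ModerationResponse>
): Promise<"published" | "held-for-review"> {
  const result = await moderate(comment.text);

  if (result.flagged) {
    // e.g. hold the comment back until a moderator reviews it
    return "held-for-review";
  }
  return "published";
}
```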

Flagging for entity matchers

Some models, such as profanity detectors or email matchers, return a list of flagged terms rather than probability scores. In these cases, an item is flagged whenever the model finds any matching terms. No explicit threshold configuration is needed for these “recognizer” models: they simply flag content when their triggers are detected.
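A short sketch of this behavior, assuming the matcher reports its results in a matches array (the field name is an assumption):

```typescript
// Hypothetical output of a term-matching model such as a profanity detector
// or email matcher: a list of matched terms instead of probability scores.
interface MatcherResult {
  matches: string[]; // field name assumed for this sketch
}

// No threshold is involved: any match at all flags the content.
function isFlaggedByMatcher(result: MatcherResult): boolean {
  return result.matches.length > 0;
}

console.log(isFlaggedByMatcher({ matches: [] }));                   // false
console.log(isFlaggedByMatcher({ matches: ["user@example.com"] })); // true
```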
