Flagging thresholds
Learn how to configure flagging thresholds for your models.
The Moderation API will flag content based on the models you've added to your project. Flagging happens when any non-neutral label of a model has a score above the flagging threshold.
The default threshold is 0.5 (50% confidence), but you can configure the threshold for each label in your project.
Changing the thresholds
By changing the flagging threshold, you can control how strict the model is. For example, if you want to accept more content, you can increase the threshold to 0.75. This means that the model will only flag content if it is 75% confident about a non-neutral label.
On the other hand, if you want to be more strict, you can lower the threshold to 0.25. This means that the model will flag content even if it’s only 25% confident about a non-neutral label.
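As an illustration, here is a minimal sketch of how a threshold decision works for a single label. The names (labelTriggers, ThresholdConfig) and the boundary behavior (>= vs >) are assumptions for illustration, not part of the API:

```ts
// Hypothetical per-label threshold map; labels without an entry
// fall back to the documented default of 0.5 (50% confidence).
type ThresholdConfig = Record<string, number>;

function labelTriggers(
  label: string,
  score: number,
  thresholds: ThresholdConfig,
  defaultThreshold = 0.5,
): boolean {
  const threshold = thresholds[label] ?? defaultThreshold;
  // Using >= here; whether the exact boundary counts is an assumption.
  return score >= threshold;
}

// A score of 0.6 is flagged under the default threshold,
// but not after raising the threshold to 0.75.
console.log(labelTriggers("toxicity", 0.6, {}));                 // true
console.log(labelTriggers("toxicity", 0.6, { toxicity: 0.75 })); // false
// Lowering the threshold to 0.25 flags even low-confidence scores.
console.log(labelTriggers("toxicity", 0.3, { toxicity: 0.25 })); // true
```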
Please note that changing the flagging threshold does not affect the scores or output of the individual models. It only affects the flagged field of the API response.
Thresholds can be configured independently for each model and label, so you can be stricter with some types of content and more lenient with others.
If any of the models exceeds its respective threshold, the flagged field will be set to true.
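Putting the pieces together, the flagged value can be thought of as a reduction over all models and their non-neutral labels. The response shape below is a hypothetical simplification:

```ts
// Hypothetical, simplified view of per-model scores in a response.
interface ModelResult {
  model: string;
  labels: { label: string; score: number }[]; // non-neutral labels only
}

// flagged becomes true as soon as any label of any model
// crosses its configured (or default 0.5) threshold.
function computeFlagged(
  results: ModelResult[],
  thresholds: Record<string, number>,
): boolean {
  return results.some((result) =>
    result.labels.some(
      ({ label, score }) => score >= (thresholds[label] ?? 0.5),
    ),
  );
}
```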
What is the flagged field used for?
When using review queues, you can filter the content based on the flagged field. By default, only flagged content enters the queue.
We also recommend using the flagged field in your application code to make decisions about the content. That way, you can change the flagging threshold without having to change your application code.
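For example, your application can branch on the flagged field alone, so that threshold changes made in your project take effect without a code deploy. The endpoint URL, request body, and response shape below are assumptions for illustration, not the documented API:

```ts
// Sketch: call a moderation endpoint and act only on the flagged field.
// URL, headers, and body shape are placeholders.
async function moderateComment(text: string): Promise<"publish" | "hold"> {
  const response = await fetch("https://example.com/moderate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.MODERATION_API_KEY}`,
    },
    body: JSON.stringify({ value: text }),
  });
  const result: { flagged: boolean } = await response.json();

  // Branch on flagged only; individual scores and thresholds remain
  // an implementation detail you can tune in your project settings.
  return result.flagged ? "hold" : "publish";
}
```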
Flagging for recognizer models
Some models do not return scores but a list of matches. For example, the profanity model returns a list of swear words that were found in the text. In this case, the item is flagged if the list contains at least one match.
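In code, that rule is a simple non-empty check. The matches field name is a hypothetical stand-in for whatever list the recognizer returns:

```ts
// Recognizer models return matches instead of scores; flagging is
// simply "did the model find anything at all".
function recognizerFlagged(matches: string[]): boolean {
  return matches.length > 0;
}

console.log(recognizerFlagged([]));          // false
console.log(recognizerFlagged(["badword"])); // true
```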