Toxicity model

Model key: toxicity

The toxicity model analyzes the whole text to detect general features such as profanity, swearing, racism, and threats. Unlike our profanity filter, the toxicity analyzer also detects cases where the profanity is less pronounced.

Toxicity Response Example:

{
  "label": "TOXICITY",
  "score": 0.9563754,
  "label_scores": {
    "TOXICITY": 0.9563754,
    "PROFANITY": 0.89909166,
    "INSULT": 0.68668073,
    "SEVERE_TOXICITY": 0.45895407,
    "DISCRIMINATION": 0.27914262,
    "THREAT": 0.09009721,
    "NEUTRAL": 0.0436246
  }
}
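
As a rough illustration of how a client might read this response, the sketch below parses the example JSON and lists every label whose score reaches a cutoff. The helper name `labels_above` and the 0.5 threshold are assumptions made for this example, not part of the API; only the field names `label`, `score`, and `label_scores` come from the response shown above.

```python
import json

# Sample response copied from the example above.
response_text = """
{
  "label": "TOXICITY",
  "score": 0.9563754,
  "label_scores": {
    "TOXICITY": 0.9563754,
    "PROFANITY": 0.89909166,
    "INSULT": 0.68668073,
    "SEVERE_TOXICITY": 0.45895407,
    "DISCRIMINATION": 0.27914262,
    "THREAT": 0.09009721,
    "NEUTRAL": 0.0436246
  }
}
"""


def labels_above(result, threshold=0.5):
    """Return (label, score) pairs at or above threshold, highest score first.

    Hypothetical helper: the threshold is an illustrative choice, not a
    value recommended by the API.
    """
    scores = result.get("label_scores", {})
    hits = [(label, score) for label, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


result = json.loads(response_text)
flagged = labels_above(result, threshold=0.5)

# Top-level label and score, then each label that reached the cutoff.
print(result["label"], result["score"])
for label, score in flagged:
    print(f"{label}: {score:.2f}")
```

The top-level `label` and `score` mirror the highest entry in `label_scores` in this example, so reading `label_scores` directly is mainly useful when several categories matter at once.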

Labels

Label	Description
TOXICITY	The general toxicity. If any other labels have a high score, this one is likely to score high as well.
PROFANITY	Containing swearing, curse words, and other obscene language.
DISCRIMINATION	Racism and other discrimination based on race, religion, gender, etc.
INSULT	Insulting, inflammatory, or negative language.
SEVERE_TOXICITY	Very toxic, containing severe profanity, racism, etc.
THREAT	Threatening, bullying, or aggressive language.
NEUTRAL	Nothing toxic was detected.
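
One possible way to act on these labels is to treat the more serious categories differently from general toxicity. The sketch below is only an assumption about how an application might do this: the function name `moderation_action`, the three outcomes, and both thresholds are hypothetical and should be tuned against your own data.

```python
def moderation_action(label_scores, block_at=0.8, review_at=0.5):
    """Map label scores to "block", "review", or "allow".

    Hypothetical policy, not defined by the API:
    - block when SEVERE_TOXICITY or THREAT is at or above block_at
    - review when any label other than NEUTRAL is at or above review_at
    - allow otherwise
    """
    severe = max(
        label_scores.get("SEVERE_TOXICITY", 0.0),
        label_scores.get("THREAT", 0.0),
    )
    if severe >= block_at:
        return "block"

    # NEUTRAL is excluded because a high NEUTRAL score means nothing toxic was found.
    toxic_scores = [s for label, s in label_scores.items() if label != "NEUTRAL"]
    if toxic_scores and max(toxic_scores) >= review_at:
        return "review"
    return "allow"


example_scores = {
    "TOXICITY": 0.9563754,
    "PROFANITY": 0.89909166,
    "INSULT": 0.68668073,
    "SEVERE_TOXICITY": 0.45895407,
    "DISCRIMINATION": 0.27914262,
    "THREAT": 0.09009721,
    "NEUTRAL": 0.0436246,
}
print(moderation_action(example_scores))
```

Keeping the thresholds as parameters makes it easy to be stricter in some contexts (for example, public comments) than in others.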

Supported languages

This model works in the following languages:

English en
Spanish es
French fr
German de
Italian it
Portuguese pt
Russian ru
Japanese ja
Indonesian id
Chinese zh
Dutch nl
Polish pl
Swedish sv

The model might also work on languages we haven't tested. Feel free to try it on languages that are not listed above and send us your feedback.
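
If you want to know whether a text is in one of the tested languages before relying on the scores, a small client-side check can help. The sketch below is an assumption about how that might look: the set contains the codes listed above, while the helper name `is_tested_language` and the handling of region suffixes such as `pt-BR` are illustrative choices, not API behavior.

```python
# Language codes taken from the list above.
TESTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "ru",
    "ja", "id", "zh", "nl", "pl", "sv",
}


def is_tested_language(code):
    """Return True if the primary subtag of a language code is in the tested list.

    Hypothetical helper: accepts codes such as "en", "pt-BR", or "zh_CN" and
    compares only the part before the first "-" or "_".
    """
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    return primary in TESTED_LANGUAGES


for code in ["en", "pt-BR", "zh_CN", "ko"]:
    status = "tested" if is_tested_language(code) else "untested, results may vary"
    print(f"{code}: {status}")
```

An untested language does not need to be rejected; the check is only a hint that the scores deserve extra scrutiny.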

Limitations

This model does not have any API limitations.
