AI agents are one of the easiest ways to customize your moderation policies. You can create an agent in minutes and use it to moderate content in your projects.

To create an AI agent, you simply describe your content policies, and the agent will classify content based on your instructions.

How it works

1

Define purpose

Give your agent a name, describe its job, and choose which LLM to use.

2

Add your rules

Add content policies that the agent should follow. You can select from a list of pre-defined rules, or create your own.

3

Use your AI agent

Add the agent to classify your content in your projects.
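Once the agent is set up, classifying content in your project follows a simple request/response pattern. The sketch below is illustrative only: the field names, agent ID, and response shape are assumptions for the example, not the real API.

```python
import json

# Hypothetical request payload for classifying a piece of content
# with an AI agent. "agent_id" would be the agent created in the
# Model Studio; the field names here are assumptions.
payload = {
    "agent_id": "agent_123",
    "content": "Example user comment to classify",
}

# A response might flag the content against your policies, e.g.:
sample_response = json.loads("""
{
  "flagged": true,
  "matched_policies": ["harassment"]
}
""")

if sample_response["flagged"]:
    print("Blocked by policies:", sample_response["matched_policies"])
```

In a real integration you would send the payload to the moderation endpoint and branch on the response, for example hiding or queueing flagged content for review.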

Need help getting started?

If you need help creating AI agents, please send us a message any time. We’re happy to help you set up your first agent.

Get started

Head over to the Model Studio in your dashboard and press the “Add new agent” button.

Here you’ll give your agent a name and describe its job. These fields aren’t used for moderation; they simply help you identify the agent later.

Next you’ll need to choose which LLM should power your agent. We recommend Llama Guard 3 because it’s optimized for moderation, but you can also use GPT-4o-mini.

Choose between Llama Guard and GPT

We currently offer two LLMs for AI agents:

Llama Guard 3

An LLM optimized for moderation. Recommended for most use cases.

GPT-4o-mini

A lightweight OpenAI model. Recommended if you need to moderate content that isn’t covered by Llama Guard 3.

Note: if you’d like to use a different LLM, please send us a message and we’ll see what we can do.

Benefits of Llama Guard 3 compared to GPT-4o-mini:

  • Faster for real-time applications
  • More accurate within the 14 safety categories
  • Hosted on GDPR compliant GPU servers in Germany
  • Can be on-premise for enterprise customers
  • Can be fine-tuned for enterprise customers
  • Returns probability scores instead of binary flags
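The probability scores mentioned above let you tune sensitivity per category instead of accepting a fixed true/false verdict. A minimal sketch (the category names, scores, and threshold below are made-up illustrations, not actual model output):

```python
# Hypothetical per-category probability scores, as a score-based
# moderation model might return them (illustrative values only).
scores = {"violence": 0.91, "hate": 0.12, "self_harm": 0.03}

# With scores you choose your own cutoff; a binary classifier
# would only give you a flag with no room to tune.
THRESHOLD = 0.5
flagged = [category for category, p in scores.items() if p >= THRESHOLD]
print(flagged)  # ['violence']
```

Lowering the threshold for a sensitive category trades more false positives for fewer false negatives, which is a choice binary flags don’t offer.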

When to use GPT-4o-mini over Llama Guard 3:

  • For custom safety categories that fall far outside the scope of the MLCommons safety categories
  • If you prefer false positives over false negatives
  • When speed is not crucial

You can read more about Llama Guard 3 here and GPT-4o-mini here.
