How to create an AI Agent
Learn how to create AI agents for content moderation.
AI Agents are one of the easiest ways to customize your moderation policies. You can create an agent in minutes, and use it to moderate your content in your projects.
To create an AI agent, you simply describe your content policies, and the agent will classify content according to your instructions.
How it works
Define purpose
Give your agent a name, describe its job, and choose which LLM to use.
Add your rules
Add content policies that the agent should follow. You can select from a list of pre-defined rules, or create your own.
Use your AI agent
Add the agent to classify your content in your projects.
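Once the agent is live, using it typically comes down to sending your content to the agent and acting on the result. The sketch below only illustrates that flow: the endpoint URL, the `agentId` field, and the response shape are assumptions, not the actual API, so check the integration docs in your dashboard for the real request format.

```ts
// Minimal sketch of classifying content with an AI agent.
// NOTE: the endpoint URL, the agentId field, and the response shape are
// hypothetical placeholders for illustration, not the actual API.

interface AgentResult {
  flagged: boolean;       // did the content break any of the agent's rules?
  matchedRules: string[]; // which rules were triggered (assumed field)
}

async function classifyWithAgent(agentId: string, content: string): Promise<AgentResult> {
  const response = await fetch("https://api.example.com/v1/agents/classify", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`, // your project API key
    },
    body: JSON.stringify({ agentId, content }),
  });
  return (await response.json()) as AgentResult;
}

// Example usage: block content that the agent flags.
const result = await classifyWithAgent(
  "my-first-agent",
  "Email me at jane@example.com for cheap followers!"
);
if (result.flagged) {
  console.log("Blocked by rules:", result.matchedRules);
}
```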
Need help getting started?
If you need help creating AI agents, please send us a message any time. We’re happy to help you set up your first agent.
Get started
Head over to the Model Studio in your dashboard and press the “Add new agent” button.
Here you’ll give your agent a name and describe what its job is. These fields are not used for moderation, but they help you identify the agent later.
Next you’ll choose which LLM should power your agent. We recommend Llama Guard 3 because it’s optimized for moderation, but you can also use GPT-4o-mini.
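Conceptually, what you save in this step is a small configuration record: a name and description for your own reference, the model, and the rules you add next. The field names below are an assumed sketch for illustration, not the dashboard’s actual schema.

```ts
// Illustrative sketch only; the field names are assumptions, not the real schema.
interface AgentConfig {
  name: string;        // for identification only, not used for moderation
  description: string; // for identification only, not used for moderation
  model: "llama-guard-3" | "gpt-4o-mini"; // the LLM that powers the agent
  rules: string[];     // the content policies the agent enforces
}

const chatModerator: AgentConfig = {
  name: "Community chat moderator",
  description: "Keeps the public chat free of spam and personal contact details.",
  model: "llama-guard-3",
  rules: [
    "Flag messages that share personal contact information",
    "Flag unsolicited commercial promotion",
  ],
};
```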
Choose between Llama Guard 3 and GPT-4o-mini
We currently offer two LLMs for AI agents:
Llama Guard 3
An LLM optimized for content moderation. Recommended for most use cases.
GPT-4o-mini
A general-purpose model from OpenAI. Recommended if you need to moderate content that Llama Guard 3 doesn’t cover.
Note: if you would like to use a different LLM, please send us a message and we’ll see what we can do.
Benefits of Llama Guard 3 compared to GPT-4o-mini:
- Faster for real-time applications
- More accurate within the 14 safety categories
- Hosted on GDPR-compliant GPU servers in Germany
- Can be deployed on-premises for enterprise customers
- Can be fine-tuned for enterprise customers
- Returns probability scores instead of binary flags (see the sketch after these lists)
When to use GPT-4o-mini over Llama Guard 3:
- For custom safety categories that fall far outside the scope of the MLCommons safety categories
- If you prefer false positives over false negatives
- When speed is not crucial
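To make the probability-score benefit concrete, here is a rough comparison of the two response styles. Both shapes are assumptions made for the sake of the example, not the service’s actual schema.

```ts
// Assumed response shapes, for illustration only.

// Probability scores (Llama Guard 3 style): one score per safety category,
// so you can choose your own threshold per project.
const scoredResult = {
  categories: { hate: 0.91, self_harm: 0.04, violence: 0.12 },
};

// Binary flags (GPT-4o-mini style): a single yes/no decision.
const flaggedResult = {
  flagged: true,
  matchedCategories: ["hate"],
};

// With scores you can tune sensitivity, e.g. only act above 0.8:
const shouldFlag = Object.values(scoredResult.categories).some((score) => score > 0.8);
console.log(shouldFlag, flaggedResult.flagged); // true true
```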
You can read more about Llama Guard 3 here and GPT-4o-mini here.