Analyze audio

How it works

Audio files are automatically transcribed to text with speech recognition, and the transcript is then analyzed by all enabled text-based policies. Any policy that works on text (toxicity, hate, PII, wordlists, guidelines, etc.) therefore also works on audio with no additional configuration.

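// Assumes `moderationApi` is an already-initialized API client.
const result = await moderationApi.content.submit({
  content: {
    type: "audio",
    // Must be an http or https URL (see Limits).
    url: "https://example.com/audio.mp3",
  },
});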

Supported audio formats

Any format FFmpeg can decode is supported. All audio is internally converted to 16 kHz mono WAV before transcription.

Format	Extensions
MP3	`.mp3`
WAV	`.wav`
AAC	`.aac`, `.m4a`
OGG	`.ogg`, `.oga`
Opus	`.opus`
FLAC	`.flac`
WebM	`.webm`
AMR	`.amr`
WMA	`.wma`
MP4	`.mp4`, `.m4a`, `.mov`
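
As a rough client-side hint, the extensions in the table above can be checked before submitting a URL. The following is a minimal sketch, not part of the API: the helper name `hasListedAudioExtension` and the constant are hypothetical, and the set only mirrors the table.

```typescript
// Extensions copied from the table above. The constant and helper names are
// hypothetical and exist only for this illustration.
const LISTED_AUDIO_EXTENSIONS: ReadonlySet<string> = new Set([
  ".mp3", ".wav", ".aac", ".m4a", ".ogg", ".oga", ".opus",
  ".flac", ".webm", ".amr", ".wma", ".mp4", ".mov",
]);

function hasListedAudioExtension(urlOrPath: string): boolean {
  // Drop any query string or fragment before inspecting the file name.
  const path = urlOrPath.replace(/[?#].*$/, "");
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  if (dot < 0) {
    return false; // no extension at all
  }
  return LISTED_AUDIO_EXTENSIONS.has(lastSegment.slice(dot).toLowerCase());
}
```

Because any format FFmpeg can decode is accepted, a `false` result here does not necessarily mean the file will be rejected; treat it as a hint rather than a gate.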

Limits

Constraint	Value
Max file size	50 MB
Max audio duration	10 minutes
Processing timeout	30 seconds
URL schemes	`http`, `https` only
Private/internal IPs	Blocked (SSRF protection)
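
The limits above could be pre-checked on the client before a request is sent. This is a sketch under stated assumptions: the `AudioCandidate` type and `checkAudioLimits` function are hypothetical, "50 MB" is read as 50,000,000 bytes (the stricter interpretation), and the caller is assumed to already know the file size and duration. The private-IP block and the processing timeout are enforced by the service and are not modeled here.

```typescript
// Values copied from the Limits table. "50 MB" is assumed to mean
// 50 * 1000 * 1000 bytes; the service may count differently.
const MAX_FILE_SIZE_BYTES = 50 * 1000 * 1000;
const MAX_DURATION_SECONDS = 10 * 60;

// Hypothetical shape: size and duration are optional because the caller
// may not know them ahead of time.
interface AudioCandidate {
  url: string;
  sizeBytes?: number;
  durationSeconds?: number;
}

// Returns a list of human-readable problems; an empty list means no
// documented limit is known to be violated.
function checkAudioLimits(candidate: AudioCandidate): string[] {
  const problems: string[] = [];

  let protocol: string | null = null;
  try {
    protocol = new URL(candidate.url).protocol; // e.g. "https:"
  } catch {
    problems.push("URL could not be parsed");
  }
  if (protocol !== null && protocol !== "http:" && protocol !== "https:") {
    problems.push(`URL scheme ${protocol} is not http or https`);
  }
  if (candidate.sizeBytes !== undefined && candidate.sizeBytes > MAX_FILE_SIZE_BYTES) {
    problems.push("file is larger than 50 MB");
  }
  if (
    candidate.durationSeconds !== undefined &&
    candidate.durationSeconds > MAX_DURATION_SECONDS
  ) {
    problems.push("audio is longer than 10 minutes");
  }
  return problems;
}
```

For example, `checkAudioLimits({ url: "ftp://example.com/a.mp3" })` reports a single problem about the URL scheme, while a 1 MB, one-minute file served over `https` reports none.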

Transcription quality

You can configure transcription quality per channel in the dashboard under Content > Audio > Transcription quality.

Setting	Label	Use case	Relative speed
SPEED (default)	Fast	Real-time moderation, high volume	Fastest
BALANCED	Balanced	General purpose, good accuracy	~2x slower
ACCURACY	Accurate	Noisy audio, critical content review	~3x slower

Usage and billing

Each audio moderation request costs 2 units: 1 for transcription + 1 for policy analysis.
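
The arithmetic can be sketched as a small helper for budgeting. The function name `estimateAudioUnits` is hypothetical; the per-request cost is taken directly from the sentence above.

```typescript
// 1 unit for transcription + 1 unit for policy analysis, per the text above.
const UNITS_PER_AUDIO_REQUEST = 1 + 1;

// Hypothetical helper: total units consumed by a number of audio requests.
function estimateAudioUnits(requestCount: number): number {
  if (!Number.isInteger(requestCount) || requestCount < 0) {
    throw new RangeError("requestCount must be a non-negative integer");
  }
  return requestCount * UNITS_PER_AUDIO_REQUEST;
}
```

Under this model, 1,000 audio requests would consume 2,000 units.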
