Real-time voice moderation stream (WebSocket)
curl --request GET \
  --url wss://voice.moderationapi.com/v1/stream \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --header 'Sec-WebSocket-Protocol: moderationapi.v1' \
  --data '
{
  "event": "start",
  "streamSid": "<string>",
  "mediaFormat": {
    "encoding": "audio/x-mulaw",
    "sampleRate": 8000
  },
  "tracks": [
    {
      "authorId": "<string>"
    }
  ],
  "conversationId": "<string>",
  "channel": "<string>",
  "emitPartials": false,
  "metadata": {}
}
'
{
  "v": 1,
  "event": "session.started",
  "sessionId": "<string>",
  "tracks": [
    "<string>"
  ],
  "conversationId": "<string>"
}
Moderate live voice and call audio over a WebSocket. You send a start frame, stream media frames as audio arrives, then stop; the server transcribes speech and returns a moderation verdict for each finalized utterance. For the full walkthrough and code examples, see Real-time voice.

Headers

Authorization
string
required

Bearer <api_key>

Sec-WebSocket-Protocol
enum<string>
required

Requested subprotocol.

Available options:
moderationapi.v1
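
A minimal sketch of assembling the handshake values described above, in Python. The helper name `build_handshake` and the shape of the returned dict are illustrative assumptions, not part of the API; only the URL, the `Bearer <api_key>` scheme, and the `moderationapi.v1` subprotocol come from this page.

```python
STREAM_URL = "wss://voice.moderationapi.com/v1/stream"
SUBPROTOCOL = "moderationapi.v1"  # the only option listed for Sec-WebSocket-Protocol


def build_handshake(api_key: str) -> dict:
    """Return the URL, headers, and subprotocol list for the WebSocket handshake.

    Hypothetical helper: the dict layout is a convenience for this example.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "url": STREAM_URL,
        # Authorization uses the Bearer scheme documented above.
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Sent by the client library as the Sec-WebSocket-Protocol header.
        "subprotocols": [SUBPROTOCOL],
    }
```

Many WebSocket client libraries accept the subprotocol as a dedicated argument rather than as a raw header, which is why it is kept separate from `headers` here.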

Body

application/json

Frames sent by the client over the socket (not an HTTP body).

First frame the client sends. Declares the conversation, audio format, and tracks.

event
enum<string>
required

Available options:
start

streamSid
string
required

Your identifier for this stream.

mediaFormat
object
required

tracks
object[]
required

One or both tracks. Stream only the track(s) you have.

conversationId
string

Your external conversation ID. Omit it to have one generated and returned in session.started.

channel
string

Optional. Selects which channel's policy configuration applies.

emitPartials
boolean
default:false

Set to true to also receive interim, non-final transcripts.

metadata
object

Arbitrary JSON attached to the conversation. Stored as-is and not interpreted by moderation.
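
The start frame fields above can be sketched as a small Python builder. This is an illustration under stated assumptions: the function name `build_start_frame` is hypothetical, the `mediaFormat` values and the `authorId` key are taken from the request sample on this page, and track objects may accept fields not shown here.

```python
import json
from typing import Any, Optional


def build_start_frame(
    stream_sid: str,
    author_ids: list[str],
    conversation_id: Optional[str] = None,
    channel: Optional[str] = None,
    emit_partials: bool = False,
    metadata: Optional[dict[str, Any]] = None,
) -> str:
    """Serialize the first client frame as a JSON string (hypothetical helper)."""
    # "One or both tracks": assume at most two entries.
    if not 1 <= len(author_ids) <= 2:
        raise ValueError("tracks must contain one or two entries")
    frame: dict[str, Any] = {
        "event": "start",
        "streamSid": stream_sid,
        # Values copied from the request sample; other formats are not covered here.
        "mediaFormat": {"encoding": "audio/x-mulaw", "sampleRate": 8000},
        "tracks": [{"authorId": author_id} for author_id in author_ids],
        "emitPartials": emit_partials,
    }
    # Optional fields are left out entirely when not provided, so the
    # server can generate a conversationId and apply its defaults.
    if conversation_id is not None:
        frame["conversationId"] = conversation_id
    if channel is not None:
        frame["channel"] = channel
    if metadata is not None:
        frame["metadata"] = metadata
    return json.dumps(frame)
```

Leaving out `conversation_id` mirrors the documented behavior: the generated ID is then returned in session.started.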

Response

101 - application/json

Switching Protocols. The server then streams event frames over the socket; the key one is utterance.final, sent for each finalized utterance.

Sent after the start frame is accepted.

v
enum<integer>
required

Available options:
1

event
enum<string>
required

Available options:
session.started

sessionId
string
required

tracks
string[]
required

conversationId
string
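
A minimal sketch of validating and unpacking the session.started frame in Python, using only the fields listed above. The `SessionStarted` dataclass and `parse_session_started` function are hypothetical names for this example; other server events, such as utterance.final, are not handled because their fields are not described in this section.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionStarted:
    session_id: str
    tracks: list[str]
    conversation_id: Optional[str]  # not marked required in the schema


def parse_session_started(raw: str) -> SessionStarted:
    """Parse a session.started frame, rejecting other versions or events."""
    frame = json.loads(raw)
    if frame.get("v") != 1:
        raise ValueError(f"unsupported frame version: {frame.get('v')!r}")
    if frame.get("event") != "session.started":
        raise ValueError(f"unexpected event: {frame.get('event')!r}")
    return SessionStarted(
        session_id=frame["sessionId"],
        tracks=list(frame["tracks"]),
        conversation_id=frame.get("conversationId"),
    )
```

If the start frame omitted conversationId, the value read here is the one to store for later lookups.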