Real-time voice stream
Open a WebSocket to moderate live voice audio in real time and receive a verdict per spoken utterance.
Send a start frame, stream media frames as audio arrives, then send a stop frame; the server transcribes speech and returns a moderation verdict for each finalized utterance.
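The frame order described above can be sketched as a small generator. This is only an illustration of the sequence (start, then media, then stop). The JSON field names `type`, `stream_id`, and `payload`, and the base64 encoding of audio, are assumptions made for the example and are not taken from this reference.

```python
import base64
import json
from typing import Iterable, Iterator


def client_frames(stream_id: str, chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the text frames for one session in protocol order."""
    # 1. The start frame always goes first.
    #    "type" and "stream_id" are hypothetical field names.
    yield json.dumps({"type": "start", "stream_id": stream_id})

    # 2. One media frame per audio chunk, sent as the audio arrives.
    #    Base64 under a "payload" key is an assumption, not the documented format.
    for chunk in chunks:
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"type": "media", "payload": encoded})

    # 3. The stop frame ends the stream.
    yield json.dumps({"type": "stop"})
```

Each yielded string would be sent as one WebSocket text message; a generator keeps the ordering rule in one place regardless of which WebSocket client is used.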
For the full walkthrough and code examples, see Real-time voice.
Headers
Bearer <api_key>
Requested subprotocol.
moderationapi.v1
Body
Frames sent by the client over the socket (not an HTTP body).
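Before any frame is sent, the opening handshake carries the bearer token and the requested subprotocol. A minimal sketch of those two headers follows. `Sec-WebSocket-Protocol` is the standard WebSocket header for subprotocol negotiation; sending the token in an `Authorization` header is an assumption based on the `Bearer <api_key>` value shown in this reference, and `handshake_headers` is a hypothetical helper name.

```python
def handshake_headers(api_key: str) -> dict[str, str]:
    """Build the handshake headers for the voice stream WebSocket."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        # Assumed header name; the reference only shows the value format.
        "Authorization": f"Bearer {api_key}",
        # Standard WebSocket header used to request a subprotocol.
        "Sec-WebSocket-Protocol": "moderationapi.v1",
    }
```

Most WebSocket client libraries accept the subprotocol through a dedicated argument rather than a raw header, so in practice only the authorization entry may need to be passed explicitly.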
First frame the client sends. Declares the conversation, audio format, and tracks.
start
Your identifier for this stream.
One or both tracks. Stream only the track(s) you have.
Your external conversation id. Omit to have one generated and returned in session.started.
Optional. Selects which channel's policy configuration applies.
Set true to also receive interim, non-final transcripts.
Arbitrary JSON attached to the conversation. Stored as-is and not interpreted by moderation.
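The start frame fields described above could be assembled as follows. This reference does not show the field names, so every key used here (`type`, `stream_id`, `tracks`, `conversation_id`, `channel`, `interim_transcripts`, `metadata`) is a hypothetical placeholder, as are the track labels in the usage line. The audio format is left out because its shape is not described here. The sketch only illustrates which values are required and which are omitted when unset.

```python
import json
from typing import Any, Optional, Sequence


def build_start_frame(
    stream_id: str,
    tracks: Sequence[str],
    conversation_id: Optional[str] = None,
    channel: Optional[str] = None,
    interim_transcripts: bool = False,
    metadata: Optional[dict[str, Any]] = None,
) -> str:
    """Serialize a start frame, leaving out optional fields that are unset."""
    # The reference allows one or both tracks, never zero or more than two.
    if not 1 <= len(tracks) <= 2:
        raise ValueError("provide one or two tracks")

    frame: dict[str, Any] = {
        "type": "start",  # hypothetical key for the frame type
        "stream_id": stream_id,
        "tracks": list(tracks),
    }
    # Omitting the conversation id lets the server generate one
    # and return it in session.started.
    if conversation_id is not None:
        frame["conversation_id"] = conversation_id
    if channel is not None:
        frame["channel"] = channel
    if interim_transcripts:
        frame["interim_transcripts"] = True
    # Metadata is stored as-is by the server, so it is passed through untouched.
    if metadata is not None:
        frame["metadata"] = metadata
    return json.dumps(frame)
```

For example, `build_start_frame("stream-1", ["track_a"])` produces a frame with only the required values, so the conversation id would come back from the server in session.started.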
Response
101 Switching Protocols. The server then streams event frames over the socket; the most important one is utterance.final, which carries the moderation verdict for a finalized utterance.
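On the receiving side, a client typically routes each event frame by its name. The sketch below handles the two events named in this reference, session.started and utterance.final, and ignores anything else. The `type` and `conversation_id` keys are assumptions about the event shape, and `handle_event` is a hypothetical helper.

```python
import json
from typing import Any


def handle_event(raw: str, state: dict[str, Any]) -> None:
    """Update local state from one event frame received over the socket."""
    event = json.loads(raw)
    kind = event.get("type")  # hypothetical key holding the event name

    if kind == "session.started":
        # Carries the conversation id, including a generated one
        # when the start frame omitted it.
        state["conversation_id"] = event.get("conversation_id")
    elif kind == "utterance.final":
        # One moderation verdict per finalized utterance; keep the whole event.
        state.setdefault("verdicts", []).append(event)
    # Any other event (for example interim transcripts) is ignored here.
```

Keeping the full utterance.final event avoids guessing at the verdict's field names; application code can pick out what it needs once the real payload is known.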