Every LITE Mode session follows three phases: starting, managing, and ending.

1. Starting a Session

  1. Generate a session token configured for LITE mode on your backend
  2. Start the session using that token
  3. The avatar streams into the specified WebRTC room after initialization
Your backend should handle both token generation and session start, then pass the resulting WebRTC credentials to your frontend.
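The three steps above can be sketched on the backend roughly as follows. This is a minimal illustration, not the LiveAvatar API: the paths `/session-token` and `/session-start`, the header names, and the response fields (`session_token`, `room_url`, `room_token`) are all hypothetical placeholders, and `post` stands in for whatever HTTP client you use.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: performs an HTTP POST and returns the parsed JSON body.
# Arguments are (path, headers, json_body).
PostFn = Callable[[str, Dict[str, str], Dict[str, Any]], Dict[str, Any]]


def start_lite_session(post: PostFn, api_key: str) -> Dict[str, str]:
    """Create a LITE-mode token, start the session, and return room credentials."""
    # Step 1: generate a session token configured for LITE mode.
    # Path, header, and field names are assumptions made for this sketch.
    token_response = post(
        "/session-token",
        {"X-Api-Key": api_key},
        {"mode": "LITE"},
    )
    session_token = token_response["session_token"]

    # Step 2: start the session using that token.
    start_response = post(
        "/session-start",
        {"Authorization": f"Bearer {session_token}"},
        {},
    )

    # Step 3: return the WebRTC room credentials that the frontend needs
    # in order to join the room the avatar streams into.
    return {
        "room_url": start_response["room_url"],
        "room_token": start_response["room_token"],
    }
```

Injecting `post` keeps the sketch independent of any particular HTTP library and makes the flow easy to exercise with a fake transport.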

2. Managing the Session

LITE Mode provides a WebSocket connection for controlling the avatar. The typical flow:
  1. User speaks — audio is sent to the room
  2. Your agent processes — your STT/LLM/TTS pipeline handles the input
  3. Agent constructs response audio — your TTS generates the speech
  4. Agent streams audio via WebSocket — send audio chunks to LiveAvatar
  5. LiveAvatar renders video — avatar video frames are sent to the room
Your “agent” can be anything from a simple backend service to a complex multi-model pipeline.
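Step 4 of the flow above, streaming the TTS output in chunks, might look like the sketch below. The message shape (`type` and a base64 `audio` field), the event name `speak`, and the default chunk size are assumptions made for illustration; the actual wire format and audio encoding are not specified here. `send` stands in for your WebSocket client's send method.

```python
import base64
import json
from typing import Callable, Iterator


def chunk_audio(audio: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def stream_response_audio(
    send: Callable[[str], None],
    audio: bytes,
    chunk_size: int = 4096,
) -> int:
    """Send TTS audio chunk by chunk and return the number of messages sent."""
    sent = 0
    for chunk in chunk_audio(audio, chunk_size):
        # Hypothetical message shape: binary audio is base64-encoded so it
        # can travel inside a JSON text frame.
        message = {
            "type": "speak",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
        send(json.dumps(message))
        sent += 1
    return sent
```

Sending in chunks lets the agent begin streaming before the full response has been synthesized, provided your TTS produces audio incrementally.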

WebSocket commands

Through the WebSocket, you can:
  • Command the avatar to speak (by sending audio)
  • Interrupt avatar responses
  • Modify avatar poses (listening, idle)
  • Keep sessions alive
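One way to keep these commands organized in agent code is a small message builder. The event names below (`speak`, `interrupt`, `set_pose`, `keep_alive`) and the JSON layout are hypothetical and chosen only to mirror the four capabilities listed above; substitute the real event names from the WebSocket reference.

```python
import json
from enum import Enum
from typing import Any, Dict, Optional


class Command(Enum):
    # Hypothetical event names, one per capability listed above.
    SPEAK = "speak"
    INTERRUPT = "interrupt"
    SET_POSE = "set_pose"
    KEEP_ALIVE = "keep_alive"


# The two poses named in this document.
POSES = ("listening", "idle")


def build_command(command: Command, payload: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a command as JSON; the command type always wins over payload keys."""
    message: Dict[str, Any] = dict(payload or {})
    message["type"] = command.value
    return json.dumps(message)


def set_pose(pose: str) -> str:
    """Build a pose-change message, rejecting poses this sketch does not know."""
    if pose not in POSES:
        raise ValueError(f"unknown pose: {pose!r}")
    return build_command(Command.SET_POSE, {"pose": pose})
```

For example, `build_command(Command.INTERRUPT)` produces the interrupt message, and `set_pose("listening")` produces a pose change that can be sent while the user is speaking.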

3. Ending the Session

When a session ends:
  1. The avatar is removed from the LiveKit room
  2. The room is torn down (if created by LiveAvatar)
  3. The WebSocket connection closes
You are responsible for cleaning up your own WebSocket resources and LiveKit data.
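A minimal sketch of client-side cleanup, assuming you hold your own WebSocket client and room connection: run every cleanup step even if an earlier one fails, so that a socket the server already closed does not prevent the room from being released. The step names and callables are hypothetical.

```python
from typing import Callable, List, Tuple

# A named cleanup step, for example ("websocket", ws.close).
Cleanup = Tuple[str, Callable[[], None]]


def run_cleanups(cleanups: List[Cleanup]) -> List[Tuple[str, Exception]]:
    """Run every cleanup step in order and return the ones that failed."""
    failures: List[Tuple[str, Exception]] = []
    for name, step in cleanups:
        try:
            step()
        except Exception as exc:
            # Record the failure and continue so later resources are still released.
            failures.append((name, exc))
    return failures
```

The returned list can be logged after the session ends; an empty list means every step completed.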
Note: make sure the session token is explicitly configured for "LITE" mode. The token's configuration determines which mode the session initializes in.