FULL Mode provides four customizable layers for your LiveAvatar session.

Avatar (Visual Layer)

Defines what the avatar looks like. Choose from a wide selection of avatars, each with unique styles, appearances, and expressions. Each avatar has a unique avatar_id. Browse available avatars through the List Public Avatars or List User Avatars endpoints.
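As a minimal sketch of working with these endpoints, the snippet below filters a hypothetical List Public Avatars response for an avatar_id. The response shape and field names (other than avatar_id) are illustrative assumptions, not the documented schema.

```python
from typing import Optional

# Hypothetical response payload from the List Public Avatars endpoint;
# the "avatars" array shape and "style" field are assumptions.
sample_response = {
    "avatars": [
        {"avatar_id": "anna_public_01", "style": "professional"},
        {"avatar_id": "leo_public_02", "style": "casual"},
    ]
}

def pick_avatar_id(response: dict, preferred_style: str) -> Optional[str]:
    """Return the first avatar_id matching the preferred style, if any."""
    for avatar in response.get("avatars", []):
        if avatar.get("style") == preferred_style:
            return avatar["avatar_id"]
    return None

print(pick_avatar_id(sample_response, "casual"))  # leo_public_02
```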

Voice (Audio Layer)

Defines what the avatar sounds like. Voices vary by gender, age, tone, and accent. The voice_settings parameter enables fine-grained audio control for speed, style, and stability. See Configuring Voice Settings for provider-specific options.
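The speed, style, and stability knobs mentioned above might be assembled like this. The field names speed, style, and stability come from the text; the value ranges and the surrounding payload shape (voice_id, nesting) are illustrative assumptions, since options vary by provider.

```python
# Hedged sketch of a voice configuration; numeric ranges are assumed.
voice_settings = {
    "speed": 1.1,        # playback rate multiplier (assumed scale)
    "style": "friendly", # delivery style (provider-specific)
    "stability": 0.8,    # higher = more consistent delivery (assumed scale)
}

session_voice = {
    "voice_id": "example_voice_id",  # hypothetical identifier
    "voice_settings": voice_settings,
}
```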

Context (Cognitive Layer)

Defines how the avatar thinks and responds. The context layer controls:
  • Available information and knowledge
  • Response constraints and guardrails
  • Personality traits
  • Opening text (spoken at session start)
  • Instructions for response generation
If no context is supplied, the avatar operates in restricted mode: it will not generate responses to user input. User transcripts are still emitted, but the avatar can only repeat pre-set phrases.
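The bullets above can be mirrored in a context payload like the one below. The key names here are hypothetical stand-ins for the documented schema; the restricted-mode check simply encodes the rule stated above.

```python
# Hypothetical context payload mirroring the cognitive-layer bullets;
# every key name is an assumption for illustration.
context = {
    "opening_text": "Hi! How can I help you today?",  # spoken at session start
    "instructions": "Answer concisely and stay on topic.",
    "personality": "warm and professional",
    "guardrails": ["decline off-topic requests"],
}

def is_restricted(session_config: dict) -> bool:
    """A session with no context runs in restricted mode (no generated replies)."""
    return not session_config.get("context")

print(is_restricted({}))                      # True
print(is_restricted({"context": context}))    # False
```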

Interactivity Type (Conversational Layer)

Controls how user input is registered and when the avatar responds.

Conversational (default)

The system manages conversation flow automatically based on speech pauses and user interruptions.

Push-to-Talk

You control exactly when user input is registered by signaling start/stop events. VAD and ASR are disabled until you signal. See Push-to-Talk for details.
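The start/stop gating described above can be sketched as a small state machine: audio fed in while the gate is closed is discarded, modeling VAD and ASR being disabled until you signal. The class and method names are illustrative, not the SDK's API.

```python
# Minimal push-to-talk sketch: input is only registered between explicit
# start and stop signals. Names are assumptions for illustration.
class PushToTalkGate:
    def __init__(self) -> None:
        self.active = False
        self.registered: list[str] = []

    def start(self) -> None:
        """Signal the start event (e.g. user presses the talk button)."""
        self.active = True

    def stop(self) -> None:
        """Signal the stop event (e.g. user releases the button)."""
        self.active = False

    def feed(self, chunk: str) -> None:
        """Audio arriving while the gate is closed is ignored."""
        if self.active:
            self.registered.append(chunk)

gate = PushToTalkGate()
gate.feed("ignored before start")
gate.start()
gate.feed("hello avatar")
gate.stop()
gate.feed("ignored after stop")
print(gate.registered)  # ['hello avatar']
```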

Video Settings

These settings apply to both FULL and LITE modes.

Quality

The quality parameter controls output resolution:
  • very_high: 1080p
  • high (default): 720p
  • medium: 480p
  • low: 360p
Higher resolution increases streaming latency. Use high or medium for most real-time applications.

Encoding

The encoding parameter controls the video codec: VP8 or H264.
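To tie the two video settings together, here is a small validator built from the values listed above. The quality names, resolutions, and codec names come from this section; the function itself and the config shape it returns are an assumption, not the documented API.

```python
# Quality tiers and codecs as listed in the docs above.
RESOLUTIONS = {"very_high": "1080p", "high": "720p", "medium": "480p", "low": "360p"}
CODECS = {"VP8", "H264"}

def video_settings(quality: str = "high", encoding: str = "VP8") -> dict:
    """Validate and assemble a hypothetical video-settings payload."""
    if quality not in RESOLUTIONS:
        raise ValueError(f"unknown quality: {quality}")
    if encoding not in CODECS:
        raise ValueError(f"unknown encoding: {encoding}")
    return {"quality": quality, "encoding": encoding}

print(video_settings())                    # {'quality': 'high', 'encoding': 'VP8'}
print(video_settings("medium", "H264"))    # {'quality': 'medium', 'encoding': 'H264'}
```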