“The underlying model that brings Grok to life is a voice-to-voice model which understands the expressive range of human speech... The model is able to do this because of how it internally, within a single model, processes speech (including paralinguistic cues) and generates expressive speech output.” https://blog.livekit.io/xai-livekit-partnership-grok-voice-agent-api/