Voice Support Agent
End-to-end voice customer support pipeline. LFM2-Audio-1.5B handles both speech-to-text and text-to-speech in a single model, eliminating separate ASR and TTS services. A specialist LFM2-350M classifier routes customer intent in 15 ms. The entire pipeline runs on one GPU with no cloud dependencies.
The Problem
Conventional voice support pipelines chain three to four separate models: an ASR service for transcription, an NLU model for intent, a reasoning model for response generation, and a TTS service for speech output. Each stage adds latency, cost, and operational complexity.
How LFM Compares
Cloud voice pipelines (Whisper + Dialogflow + GPT-4 + ElevenLabs) deliver quality but add 800 ms or more of total latency at significant per-interaction cost. On-device keyword spotters are fast but cannot handle open-ended customer queries.
What LFM Unlocks
A unified audio model that handles both speech-to-text and text-to-speech, paired with a specialist 350M intent classifier. Two models replace four services. Faster, cheaper, and simpler to operate.
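The two-model flow above can be sketched as a single turn handler: audio in, transcript, intent label, audio out. The model classes below are illustrative stubs, not the real LFM2 loading code; the actual inference API for LFM2-Audio-1.5B and the 350M classifier depends on your serving stack, so only the pipeline shape is shown.

```python
from dataclasses import dataclass


@dataclass
class IntentResult:
    label: str
    confidence: float


class AudioModel:
    """Stub for LFM2-Audio-1.5B: one model covering both ASR and TTS.
    The real model would be loaded once and kept resident on the GPU."""

    def transcribe(self, audio: bytes) -> str:
        # Placeholder transcript; the real call runs speech-to-text.
        return "where is my order"

    def synthesize(self, text: str) -> bytes:
        # Placeholder waveform; the real call runs text-to-speech.
        return text.encode("utf-8")


class IntentClassifier:
    """Stub for the LFM2-350M specialist intent router."""

    LABELS = ("order_status", "refund", "billing", "other")

    def classify(self, text: str) -> IntentResult:
        # Toy keyword rule standing in for the fine-tuned classifier.
        label = "order_status" if "order" in text else "other"
        return IntentResult(label=label, confidence=0.97)


def handle_turn(audio_in: bytes, audio: AudioModel,
                router: IntentClassifier) -> tuple[str, bytes]:
    """One support turn: speech-to-text -> intent routing -> text-to-speech."""
    transcript = audio.transcribe(audio_in)   # LFM2-Audio (ASR side)
    intent = router.classify(transcript)      # LFM2-350M intent router
    reply = f"Routing you to {intent.label} support."
    return intent.label, audio.synthesize(reply)  # LFM2-Audio (TTS side)


label, audio_out = handle_turn(b"\x00", AudioModel(), IntentClassifier())
```

Because both audio directions go through the same resident model, the handler touches exactly two models per turn instead of four services.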
This demo is fine-tuned on sample data; results improve when fine-tuned on your own customer interactions.