📞 Voice Support Agent

End-to-end voice customer support pipeline. LFM2-Audio-1.5B handles both speech-to-text and text-to-speech in a single model, eliminating separate ASR and TTS services. A specialist LFM2-350M classifier routes customer intent in 15ms. The entire pipeline runs on one GPU with no cloud dependencies.

- Unified STT + TTS — LFM2-Audio-1.5B handles both speech recognition and speech synthesis. No separate Whisper and TTS services to manage
- 15ms intent classification — Specialist LFM2-350M routes customer intent to the right team before the audio model finishes speaking (see the sketch after this list)
- Two models, one GPU — The entire voice pipeline runs on a single GPU. No cloud STT/TTS APIs, no per-minute pricing, no data leaving your infrastructure
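
A minimal sketch of the routing step, using the standard `transformers` text-generation API. Everything demo-specific here is an assumption for illustration: the intent label set, the prompt format, and the use of the base `LiquidAI/LFM2-350M` checkpoint in place of the fine-tuned classifier this demo ships.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint shown for illustration; the demo uses a fine-tuned variant.
CLASSIFIER = "LiquidAI/LFM2-350M"
# Hypothetical label set -- substitute your own support queues.
INTENTS = ["billing", "technical", "account", "cancellation", "general"]

tokenizer = AutoTokenizer.from_pretrained(CLASSIFIER)
model = AutoModelForCausalLM.from_pretrained(
    CLASSIFIER, torch_dtype=torch.bfloat16, device_map="auto"
)

def route_intent(transcript: str) -> str:
    """Map a customer utterance to one of the support intents."""
    prompt = (
        "Classify the customer request into one of: "
        + ", ".join(INTENTS)
        + f"\nRequest: {transcript}\nIntent:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=4, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    label = tokenizer.decode(
        out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip().lower()
    # Fall back to a catch-all queue if the model emits an unknown label.
    return label if label in INTENTS else "general"

print(route_intent("I was charged twice this month"))  # expected: "billing"
```

Because the classifier's forward pass is on the order of 15ms, the routed intent is available to downstream systems well before the audio model finishes synthesizing its reply.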

The Problem

Voice support pipelines chain 3-4 separate models: an ASR service for transcription, an NLU model for intent, a reasoning model for response, and a TTS service for speech output. Each adds latency, cost, and operational complexity.

How LFM Compares

Cloud voice pipelines (Whisper + Dialogflow + GPT-4 + ElevenLabs) deliver quality but add 800ms+ total latency at significant per-interaction cost. On-device keyword spotters are fast but cannot handle open-ended customer queries.

What LFM Unlocks

A unified audio model handles both speech-to-text and text-to-speech, paired with a specialist 350M-parameter intent classifier. Two models replace four services: faster, cheaper, and simpler to operate.
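
The full loop can be sketched as follows. This shows the structure only: the `transcribe`, `respond`, and `synthesize` callables are hypothetical seams to be bound to the real LFM2-Audio-1.5B and response-generation code, whose inference APIs this README does not specify.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoicePipeline:
    """One-GPU voice loop: both audio directions on LFM2-Audio-1.5B,
    intent routing on the LFM2-350M classifier."""
    transcribe: Callable[[bytes], str]    # LFM2-Audio-1.5B, speech -> text
    route_intent: Callable[[str], str]    # LFM2-350M classifier (see above)
    respond: Callable[[str, str], str]    # reply text from transcript + intent
    synthesize: Callable[[str], bytes]    # LFM2-Audio-1.5B, text -> speech

    def handle(self, audio_in: bytes) -> tuple[str, bytes]:
        transcript = self.transcribe(audio_in)
        intent = self.route_intent(transcript)   # ready before TTS completes
        reply = self.respond(transcript, intent)
        return intent, self.synthesize(reply)
```

Injecting the model calls as plain callables keeps the sketch independent of any particular inference API, and makes the point concrete: two models fill all four roles that the chained pipeline splits across separate services.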

This demo is fine-tuned on sample data. Results improve with your data.