The lightest-weight speech synthesis models designed to run in real time on edge MCUs. They start speaking with ultra-low latency, need no cloud, and produce natural voices that don’t sound robotic.
4M: parameters in the smallest model
<30s: audio needed to create a custom voice
<30mW: power consumption
What is ABR Streaming Text-to-Speech?
ABR Streaming Text-to-Speech is a voice generator built for edge-device applications that converts text prompts into natural-sounding speech.
It synthesizes speech with ultra-low latency while an LLM is still composing its response, giving device interactions a truly conversational feel. All models run on edge hardware that is out of reach for other TTS solutions, with real-time part-of-speech tagging that gives the voice correct inflection and rhythm rather than the monotone cadence of legacy embedded TTS. Our TTS models also support prosody and emotional control for even more expressive speech.
ABR also offers a custom voice pipeline that generates a bespoke TTS model tuned to a specific voice from less than 30 seconds of audio. Use it for branded voices, personalised assistants, and accessible devices with familiar speech, all running on-device.
Why ABR Text-to-Speech
Every other TTS solution either lives in the cloud or demands hardware your product doesn't have. ABR's state space model (SSM) architecture was built from first principles to synthesize expressive speech on the smallest chips in production today.
ABR's SSM architecture begins generating audio with ultra-low latency as text arrives, without waiting for the complete input. This makes it uniquely suited to vocalizing LLM responses as they stream out, creating a genuinely conversational feel.
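Because synthesis can begin before the full reply exists, the host application only needs to forward text as the LLM emits it. The Python sketch below illustrates one such client-side pattern under stated assumptions: `phrase_chunks`, `stream_to_speech`, and the `speak_chunk` callback are hypothetical names, not part of ABR's SDK, and the punctuation-based phrase grouping is an illustrative choice rather than a documented requirement of the model.

```python
from typing import Callable, Iterable, Iterator

# Characters treated as phrase boundaries (an assumption for this sketch).
BOUNDARIES = {".", ",", "!", "?", ";", ":"}


def phrase_chunks(tokens: Iterable[str], max_chars: int = 40) -> Iterator[str]:
    """Group streamed LLM tokens into short phrases without waiting for the full reply."""
    buf = ""
    for tok in tokens:
        buf += tok
        stripped = buf.rstrip()
        # Flush at a punctuation boundary, or once the buffer reaches max_chars.
        if (stripped and stripped[-1] in BOUNDARIES) or len(buf) >= max_chars:
            yield buf.strip()
            buf = ""
    # Flush whatever text remains when the token stream ends.
    if buf.strip():
        yield buf.strip()


def stream_to_speech(tokens: Iterable[str], speak_chunk: Callable[[str], None]) -> None:
    """Hand each phrase to a hypothetical streaming synthesizer as soon as it is ready."""
    for phrase in phrase_chunks(tokens):
        speak_chunk(phrase)


if __name__ == "__main__":
    # Stand-in for an LLM token stream; list.append stands in for the synthesizer call.
    spoken: list[str] = []
    stream_to_speech(iter(["Hello", ",", " how", " can", " I", " help", "?"]), spoken.append)
    print(spoken)
```

Flushing at punctuation keeps each handoff aligned with natural phrase breaks, while `max_chars` bounds how long speech waits on a run-on sentence. If the real SDK accepts raw tokens directly, the grouping step can simply be dropped.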
All ABR TTS models run on edge hardware that is out of reach for other TTS solutions, fitting into constrained environments while maintaining real-time synthesis and natural voice quality.
Real-time part-of-speech tagging gives the synthesized voice correct inflection, emphasis, and rhythm, not the monotone cadence of traditional edge TTS. Our SDK also supports inline SSML (Speech Synthesis Markup Language) prosody tags for even more realistic, context-appropriate expression.
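As an illustration of inline prosody markup, the sketch below builds a standard W3C SSML 1.1 document with a `<prosody>` element, escaping the text so characters such as `&` stay valid XML. The `rate`, `pitch`, and `volume` attributes come from the SSML specification; which of them ABR's SDK honours, and how the markup is passed to it, are not stated here, so treat `prosody_ssml` as a hypothetical helper.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, rate: str = "medium", pitch: str = "medium", volume: str = "medium") -> str:
    """Hypothetical helper: wrap text in an SSML 1.1 <prosody> element inside a <speak> root."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    attrs = f"rate={quoteattr(rate)} pitch={quoteattr(pitch)} volume={quoteattr(volume)}"
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        # escape() turns &, <, > in the spoken text into XML entities.
        f"<prosody {attrs}>{escape(text)}</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    # A slower, slightly higher-pitched delivery; "+10%" is a relative pitch change in SSML.
    print(prosody_ssml("Your order is ready & waiting.", rate="slow", pitch="+10%"))
```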
All synthesis happens locally. No audio leaves the device. No costly cloud voice AI API fees. No degradation if the network drops. Ideal for medical, industrial, and consumer products where privacy and reliability cannot be compromised.
ABR's custom voice pipeline generates a bespoke TTS model from just a short audio sample. Build branded product voices, personalised assistants, or accessible devices with a familiar voice, all running on-device.
The production-ready SDK supports multiple languages and runs on a broad set of leading CPU and NPU platforms for edge devices: one SDK, multiple silicon options, and consistent voice quality across all of them.
Competitive Landscape
ABR TTS is the only embedded solution that outperforms cloud vendors on latency and on-device efficiency while delivering natural, expressive voice quality.
Cloud TTS vendors produce high-quality audio but require a persistent connection, incur expensive voice AI token costs, and add 200–800 ms of network latency on top of synthesis time. ABR synthesizes speech entirely on-device: no network round-trip and no token costs.
Traditional on-device TTS solutions sound recognisably robotic: they lack natural prosody, correct emphasis, and the rhythm of human speech. ABR's real-time part-of-speech tagging gives the synthesized voice correct inflection and cadence, so it sounds natural rather than machine-generated. Prosody and emotional expression controls then make the speech even more expressive and let you tune it for your application. Legacy embedded TTS also cannot match ABR's model size and power efficiency.
Model Tiers
Both models are available now for evaluation: TTS 4M for the most constrained hardware and TTS 6M for higher fidelity on platforms with more headroom.
Runs on TSP1 at <30mW
Flexible licensing to meet product needs. Contact us for pricing aligned to your production scale and hardware target.
Why “Nith”? ABR names its model families after rivers. Like a river, a state space model processes sequential data as a continuous flow, always moving forward and maintaining its state efficiently over time. Nith is ABR’s TTS model family, named for a river that is a major contributor to the watershed where we are headquartered in Waterloo, Ontario, Canada.
Get started
We’ll demonstrate live synthesis on your platform, walk through model quality vs. size tradeoffs, and help you get customized production voices embedded into your products.