State Space Speech · Text-to-Speech

Natural voice.
Runs on the edge.

The lightest-weight speech synthesis models, designed to run in real time on edge MCUs: they start speaking with ultra-low latency, need no cloud, and produce natural voices that don’t sound robotic.

4M

Smallest model params

<30s

Create custom voices from seconds of audio

<30mW

Power

What is ABR Streaming Text-to-Speech?

Voice synthesis that starts speaking immediately — not after the last word

ABR Streaming Text-to-Speech is a voice generator developed for edge device applications that converts text prompts to natural human speech.

It synthesizes speech with ultra-low latency as LLMs compose their responses, enabling a truly conversational feel to device interactions. All models run on edge hardware completely out of reach for any other TTS solution — with real-time part-of-speech tagging that gives the voice correct inflection and rhythm, not the monotone cadence of legacy embedded TTS. Additionally, our TTS models support prosody and emotional control for even more expressive speech.

ABR also offers a custom voice pipeline that generates a bespoke TTS model tuned to a specific voice from less than 30 seconds of audio. This enables branded voices, personalised assistants, and accessible devices with familiar speech, all running on-device.

Why ABR Text-to-Speech

The voice layer that fits where others don't.

Every other TTS solution either lives in the cloud or demands hardware your product doesn't have. ABR's SSM architecture was built from first principles to synthesize expressive speech on the smallest chips in production today.

Starts Speaking Immediately

ABR's SSM architecture begins generating audio with ultra-low latency as text arrives — no need to wait for complete input. Uniquely suited for vocalizing LLM responses as they stream out, creating a genuinely conversational feel.
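As an illustration of how streamed LLM output might be fed to a streaming synthesizer, the sketch below buffers incoming text fragments and releases them at punctuation boundaries. It is not ABR's SDK: the function name `speakable_chunks` and the boundary set are hypothetical, and a real integration would pass each chunk to whatever synthesis call the SDK exposes.

```python
from typing import Iterable, Iterator

# Hypothetical boundary set: punctuation after which a chunk is released.
BOUNDARIES = ".,;:!?"


def speakable_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield text up to the latest punctuation mark as fragments arrive."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Position of the last boundary character seen so far (-1 if none).
        cut = max(buffer.rfind(ch) for ch in BOUNDARIES)
        if cut >= 0:
            chunk, buffer = buffer[: cut + 1].strip(), buffer[cut + 1 :]
            if chunk:
                yield chunk
    # Flush whatever remains once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    # Simulated LLM token stream; each printed chunk would go to the synthesizer.
    llm_stream = ["Sure", ", the door", " is locked", ". Any", "thing else", "?"]
    for chunk in speakable_chunks(llm_stream):
        print(chunk)
```

Splitting on punctuation is only one possible policy. It would break decimals such as "3.14", so a production chunker would need smarter boundary detection.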

Tiny Enough for Edge MCUs

All ABR TTS models run on edge hardware completely out of reach for any other TTS solution — fitting in constrained environments while maintaining real-time synthesis and natural voice quality.

Natural, Expressive Prosody

Real-time part-of-speech tagging gives the synthesized voice correct inflection, emphasis, and rhythm — not the monotone cadence of traditional edge TTS. Our SDK also supports inline SSML prosody tags for even more realistic and contextual expression.
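To show what inline prosody markup can look like, here is a minimal sketch that builds a standard W3C SSML `prosody` element using Python's standard library. The helper name `prosody_ssml` is hypothetical, and the exact set of tags and attribute values the ABR SDK accepts is an assumption; consult the SDK documentation for the supported subset.

```python
import xml.etree.ElementTree as ET


def prosody_ssml(text: str, pitch: str = "medium", rate: str = "medium",
                 volume: str = "medium") -> str:
    """Wrap text in <speak><prosody ...> and return it as a string.

    Attribute names follow W3C SSML 1.1. A strictly conforming document
    would also carry version and namespace attributes on <speak>,
    omitted here for brevity.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(
        speak, "prosody", {"pitch": pitch, "rate": rate, "volume": volume}
    )
    # ElementTree escapes special characters such as "&" on serialization.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(prosody_ssml("Battery low & charging.", pitch="+10%", rate="slow", volume="loud"))
```

Building the markup with an XML library instead of string concatenation keeps the output well-formed even when the text contains characters like `&` or `<`.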

No Cloud, No Subscription

All synthesis happens locally. No audio leaves the device. No expensive cloud voice AI API costs. No degradation if the network drops. Perfect for medical, industrial, and consumer products where privacy and reliability cannot be compromised.

Custom Voice from Under 30 Seconds of Audio

ABR's custom voice pipeline generates a bespoke TTS model from just a short audio sample. Build branded product voices, personalised assistants, or accessible devices with a familiar voice — all running on-device.

Multiple Languages, Cross-Platform

The production-ready SDK supports multiple languages and runs on a broad set of leading edge CPU/NPU platforms. One SDK, multiple silicon options, and consistent voice quality across all of them.

Competitive Landscape

The best-performing embedded TTS — with cloud-beating latency.

ABR TTS is the only embedded solution that outperforms cloud vendors on latency and on-device efficiency while delivering natural, expressive voice quality.

vs. Cloud TTS

Cloud TTS vendors produce high-quality audio but require a persistent connection, charge expensive voice AI token costs, and add 200–800ms of network latency on top of synthesis time. ABR synthesizes speech entirely on-device, with no network round-trip and no token costs.

vs. Legacy embedded TTS

Traditional on-device TTS solutions are recognisably robotic: they lack natural prosody, correct emphasis, and the rhythm of human speech. ABR's real-time part-of-speech tagging gives synthesized voice correct inflection and cadence, making it sound natural rather than machine-generated. We then enhance this with prosody and emotional expression controls to enable even more expressive speech that is tuned for your application. ABR's models are also smaller and more power-efficient than legacy embedded TTS, so they run on hardware those solutions cannot target.

Model Tiers

Two models, every target

Both models are available now for immediate evaluation — TTS 4M for the most constrained hardware, TTS 6M for higher fidelity on platforms with more headroom.

nith-3m-live

  • Platform: A broad set of leading edge CPU/NPU platforms, TSP1
  • Voice fidelity: Basic
  • Streaming: Yes
  • Prosody: Real-time part-of-speech tagging
  • Languages: Multiple

Runs on TSP1 at <30mW

nith-5m-live

  • Platform: A broad set of leading edge CPU/NPU platforms
  • Voice fidelity: High
  • Streaming: Yes
  • Prosody: Real-time POS tagging and SSML
  • SSML: Supported — prosody, pitch, volume, emotion
  • Languages: Multiple

Contact us for your specific language needs.

Flexible licensing to meet product needs. Contact us for pricing aligned to your production scale and hardware target.

Why “Nith”? ABR names its model families after rivers. State space models process sequential data as a continuous flow, much like rivers — always moving forward, maintaining state efficiently over time. Nith is ABR’s TTS model family, named for a river that is a major contributor to the watershed where we are headquartered in Waterloo, Ontario, Canada.

Get started

Hear ABR TTS running on your target hardware

We’ll demonstrate live synthesis on your platform, walk through model quality vs. size tradeoffs, and help you get customized production voices embedded into your products.