The lightest-weight, most accurate speech recognition models that are truly streaming: text arrives just 120ms from the start of audio input, while competing systems measure latency from the end of a spoken utterance that may take several seconds.
120ms: Text from start of audio, not end
10M: Smallest model params
2×: Smaller than transformers
What is ABR Streaming Speech-to-Text?
ABR Streaming Speech-to-Text is speech recognition technology for edge devices that converts voice audio to text. It is accurate enough for human understanding and fast enough for the lowest-latency prompting of LLMs.
Unlike block-based systems that wait for a complete utterance before processing, ABR’s SSM architecture begins decoding the moment audio arrives — delivering streaming text in just 120ms from the start of audio input. Most competing ASR systems report latency from the end of an audio block, which can be several seconds after the user begins speaking. That gap is where ABR wins.
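To make the two latency conventions concrete, here is a minimal sketch of the arithmetic. Only the 120ms figure comes from this page. The function names, the 3-second utterance, and the 300ms post-utterance delay are hypothetical values chosen for illustration, not measurements of any product.

```python
# Illustrative arithmetic only: the utterance length and the block-based
# post-utterance delay below are assumed values, not measured figures.

def first_text_streaming_ms(start_latency_ms: float = 120.0) -> float:
    """Time from the start of audio until the first text appears when
    latency is measured from the start of audio input."""
    return start_latency_ms


def first_text_block_ms(utterance_ms: float, end_latency_ms: float) -> float:
    """Time from the start of audio until the first text appears when the
    system waits for the utterance to end and latency is reported from
    that end point."""
    return utterance_ms + end_latency_ms


if __name__ == "__main__":
    utterance_ms = 3000.0    # assumed: the user speaks for 3 seconds
    end_latency_ms = 300.0   # assumed: delay reported after the utterance ends

    streaming = first_text_streaming_ms()
    block = first_text_block_ms(utterance_ms, end_latency_ms)
    print(f"streaming: first text at {streaming:.0f} ms after audio starts")
    print(f"block-based: first text at {block:.0f} ms after audio starts")
```

Under these assumptions, a reported post-utterance latency hides the length of the utterance itself, which is why the two figures are not directly comparable.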
The problem
These are the pain points ABR was built to solve — consistent themes across every customer conversation.
Cloud ASR requires a persistent, low-latency connection. Any dropout causes failures — unacceptable for products used in the field, in vehicles, or in facilities with unreliable connectivity.
Cloud round-trips add hundreds of milliseconds of unavoidable delay. Embedded block-based ASR waits for the utterance to end before processing. Neither feels natural. Users notice immediately.
Lightweight embedded ASR models have historically had word error rates too high for production use. Customers need accuracy that matches cloud quality — on hardware that fits in a wearable or MCU.
Always-available voice on battery-powered devices demands ultra-low power consumption. Transformer-based models draw far too much power to be practical on edge hardware with real battery constraints.
Sending audio to the cloud means user voice data leaving the device — a dealbreaker for medical, enterprise, and consumer products where privacy is either regulated or a core brand promise.
Why ABR Speech-to-Text
ABR's state space models deliver groundbreaking efficiency without sacrificing best-in-class accuracy — designed from first principles for the edge.
Audio never leaves the hardware. No cloud round-trip, no network dependency, no degradation if the network drops or bandwidth is low. Ideal for products used in the field, in vehicles, or in facilities with unreliable connectivity.
Tokens are decoded within 120ms of audio input, not after the user stops speaking. ABR's SSM architecture eliminates the block-processing bottleneck that limits other embedded ASR systems.
Equal or better WER than transformer models at half the size. ABR's SSM technology scales across model sizes without ever sacrificing real-time performance.
ABR's SSM models run on MCUs that cannot support transformer-based systems — half the memory footprint at equal or better word error rates, across the full range of model sizes.
All processing happens locally. No audio data leaving the device — ever. No expensive cloud voice AI API costs. Perfect for medical, industrial, and consumer products where privacy is non-negotiable.
The production-ready SDK supports multiple languages and runs on a broad set of leading edge CPU/NPU platforms. One SDK, multiple silicon options, with consistent voice quality across all platforms.
Competitive Landscape
ABR ASR is the only embedded solution that simultaneously outperforms cloud vendors on latency and on-device efficiency, while delivering best-in-class accuracy and real-time operation.
Cloud providers measure latency from the end of an audio utterance, then add network round-trip time — often 500ms to several seconds total. ABR delivers text 120ms from the start of audio, entirely on-device. No network dependency. No expensive cloud voice AI API costs. No audio leaving the hardware.
Existing embedded alternatives either lack true real-time streaming, sacrifice accuracy to fit on constrained hardware, or require more compute than edge MCUs can provide. ABR's state space models achieve best-in-class word error rates at half the parameter count of comparable transformer models, or less, as verified on the open benchmark leaderboard.
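For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The sketch below computes it with a word-level edit distance. It illustrates the standard definition only and is not ABR's evaluation code; the function name and sample sentences are hypothetical, and real benchmarks usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair: one substituted word out of four.
    print(word_error_rate("turn on the lights", "turn on the light"))
```

A lower value is better, and a result of 0.0 means the hypothesis matches the reference word for word.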
Model Tiers
A variety of models tuned for different hardware budgets and accuracy requirements. All truly streaming. All available now.
Runs on TSP1 at <30mW
Flexible licensing to meet product needs. Contact us for pricing aligned to your production scale and hardware target.
Get started
ABR is a leader in AI chip innovation, with patented technologies and groundbreaking research that set us apart from the competition.