Products

SpeechFoundry Voice Recognition

SpeechFoundry™ is the brand name of our commercial voice software. Everything you need for processing voice is here: it is a true end-to-end solution. With superior functionality, unmatched accuracy, and high-performance voice recognition, SpeechFoundry works for any task and any environment:

State-of-the-art user experience. SpeechFoundry offers high accuracy, natural-language understanding, dialog modeling, wake-up word, barge-in support and many more features that make your product fun to talk to. See the User Experience section for a detailed list of SpeechFoundry features.
From super-small to super-fast, SpeechFoundry runs in many environments: We have deployed solutions for Raspberry Pi Zero, 2B, 3B, iOS and Android embedded platforms, and client SDKs for Linux, iOS, Android and Flash connected architectures.
You don’t need hybrid – SpeechFoundry runs the same voice engine in both embedded and connected environments. In fact, it is the only commercial voice solution worldwide with this distinctive advantage. See here how this makes your efforts faster, more flexible, more scalable and cheaper!
Choose the SpeechFoundry modular components to build the voice applications that you need: voice control to operate the functions of your device or app by voice, voice search/voice assistant to provide information from your database or online content in response to a spoken request, and voice transcription to convert your audio and video files into easily searchable text.
SpeechFoundry now supports nine major world languages, among them English, Japanese and Mandarin Chinese. We also offer a transparent and reasonable development plan to add the languages that you need.
Every voice product needs to be customized to fit your requirements, environments and limitations. Create solutions yourself in our browser-based development environment, or have our experienced professional service team customize a solution for you. Whichever you choose, we promise we’ll help you build something that works.
Add further components to the voice experience, such as Text-to-Speech (TTS), Dialog Management, Context Management or Named Entity Recognition (NER) for huge lists of proper nouns.

User Experience

High accuracy in noisy environments: SpeechFoundry maintains high accuracy even in noisy environments with a low signal-to-noise ratio (SNR), such as moving cars or restaurants.
Natural language understanding (NLU): People can talk as they would to another person; there is no need to restrict your users to unnatural, artificially short “machine” commands.
Large vocabulary support: Use huge language models with 1,000,000 distinct words and more in both embedded and connected environments.
Wake-up word: An always-listening mode lets users activate their system with a keyword, like “Hello Hal 9000.” This removes the need for a “press to talk” button.
Barge-in: Users can interrupt lengthy machine dialog prompts by starting to speak at any time.
Fuzzy and partial match: Identify long proper nouns by saying just part of the name. This works nicely, e.g., for full names in phonebooks, or for song titles or long shop/restaurant names.
Footprint: Optimize accuracy and speed/latency with customized solutions for resource-limited architectures.
Dialog and Context: Model human communication by adding multi-step conversations and retaining contextual knowledge about previous conversations.
Speaker adaptation: Our models get used to the distinct sound and pronunciation of the main user over time, resulting in even higher accuracy.
Speaker identification: Voices are as unique as fingerprints. Use speaker ID to distinguish specific users of your system.
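
To give a feel for what fuzzy and partial matching means in practice, here is a minimal sketch in Python. It is not SpeechFoundry's implementation and does not use any SpeechFoundry API: it works on already-recognized text rather than on audio, and the function names (`match_score`, `find_matches`) and the 0.8 threshold are assumptions made up for this illustration. Each spoken word is compared against every word of a candidate name using the standard library's `difflib.SequenceMatcher`, so saying only part of a name, or a slightly different spelling, can still find the entry.

```python
from difflib import SequenceMatcher


def match_score(query: str, name: str) -> float:
    """Average, over the spoken words, of the best similarity to any word in `name`.

    Hypothetical helper for illustration only; it compares text, not audio.
    """
    query_words = query.lower().split()
    name_words = name.lower().split()
    if not query_words or not name_words:
        return 0.0
    total = 0.0
    for spoken in query_words:
        # Best similarity (0.0 to 1.0) between this spoken word and any name word.
        total += max(SequenceMatcher(None, spoken, word).ratio() for word in name_words)
    return total / len(query_words)


def find_matches(query: str, names: list[str], threshold: float = 0.8) -> list[str]:
    """Return the names scoring at or above `threshold`, best match first."""
    scored = [(match_score(query, name), name) for name in names]
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep list order
    return [name for score, name in scored if score >= threshold]


if __name__ == "__main__":
    titles = ["Bohemian Rhapsody", "Another One Bites the Dust", "Johann Sebastian Bach"]
    print(find_matches("rhapsody", titles))        # partial match on one word
    print(find_matches("sebastien bach", titles))  # fuzzy match on a near spelling
```

A production system would typically match on pronunciation rather than spelling and would need indexing to handle very large lists; the sketch only shows the scoring idea.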

Voice Control

Imagine having a butler who runs to all the devices in your house and presses buttons and touches screens for you. All you need to do is tell your voice butler what you want him to do. Say “lights on” or “it’s too hot in here” to switch on the lights in the living room or turn the air conditioning up a notch. Voice control requires a firm grasp of the variety of expressions that people use to get things done in a given situation (Natural Language Understanding); it also requires customizing the solution for specific acoustic environments like houses or cars. Let our experienced professionals help you customize your voice butler.
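
As a rough illustration of why different phrasings have to map to the same device function, the toy sketch below routes recognized text to an action with a few hand-written patterns. Everything in it is an assumption for illustration: the pattern table, the action names and the `interpret` function are hypothetical, and real natural language understanding covers far more variation than a handful of regular expressions.

```python
import re

# Hypothetical pattern table: each entry maps several phrasings to one
# (device, action) pair. A real system would learn or model these variations.
INTENT_PATTERNS = [
    (re.compile(r"\blights? on\b|\bturn on the lights?\b"), ("lights", "on")),
    (re.compile(r"\btoo hot\b|\bcool (it )?down\b"), ("air_conditioning", "increase_cooling")),
]


def interpret(utterance: str):
    """Return the (device, action) pair for a recognized utterance, or None."""
    text = utterance.lower()
    for pattern, action in INTENT_PATTERNS:
        if pattern.search(text):
            return action
    return None  # nothing matched: the application should ask the user to rephrase


if __name__ == "__main__":
    for spoken in ["Lights on", "It's too hot in here", "Play some music"]:
        print(spoken, "->", interpret(spoken))
```

The direct command (“lights on”) and the indirect remark (“it’s too hot in here”) both resolve to a concrete action, while an unrelated request returns nothing.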

Voice Assistant/Voice Search

Voice search essentially means searching by voice for information that you would otherwise need to type or input letter by letter, e.g. the current weather in New York or the location of the closest Starbucks coffeehouse when in Shinjuku, Tokyo. Voice search deals with the recognition of thousands, or many more, proper nouns in large databases, and focuses on accurately distinguishing the elements of those databases.

Voice Transcription

High-accuracy transcription of huge amounts of voice data with fast turnaround and optional manual verification and editing. Simply upload your audio or video files and wait for an email with the transcription. Log in to search the transcription and jump to the relevant parts of the audio.
