As an organization that interacts with customers through speech applications, the quality of your speech recognition technology can make or break your CX.
In an ideal world, communicating with technology via speech would be as easy and natural as conversing with a human. This would make it so simple to access information and services remotely. It would also offer more independence to those who have no other option but voice user interfaces, such as young children who aren’t literate yet and people living with visual, motor or mobility impairments.
While some speech recognition technologies have made great strides in achieving these ideals, others are still falling far below expectations. This raises the question, why do some speech recognition technologies work well, while others fail?
The reality is: human speech is complex and constantly changing.
The challenges faced by modern speech recognition tools
An Automatic Speech Recognition (ASR) engine’s job is to take speech and identify it as something meaningful. Some ASRs have transcription capabilities, which allow them to turn that meaning into something useful, like text.
Getting this right is actually an incredibly challenging process. Firstly, ASRs must keep pace with the fact that language is constantly changing. In 2021, for instance, Merriam-Webster added 520 new words and definitions to its American English dictionary.
Also, ASRs must be able to separate speech from background and environmental noise. This could be the sound of traffic, a busy shopping mall, or even the interference that occurs due to the quality of the microphone used.
Unfortunately, many ASRs are simply not capable of handling these variables efficiently.
How to solve these problems
All this considered, companies need to choose their ASR engines carefully when building or modernizing speech-enabled customer experiences.
There are many different types of ASR engines on the market. Ideally, you want one that:
Supports all dialects within a given language
Offers advanced artificial intelligence and machine learning capabilities for maximum accuracy
Is able to continually learn from real-world usage and expand the language model to serve a more diverse base of users
LumenVox ASR with Transcription: Next-generation speech recognition
Status-quo speech recognition engines don’t have the machine learning capabilities to handle all the variation in natural human speech, certainly not with the accuracy users expect. This is where LumenVox’s new ASR engine changes the game.
The technology that sets the LumenVox ASR engine apart is its end-to-end Deep Neural Network (DNN) architecture and state-of-the-art natural language processing and understanding capabilities. This creates an ASR engine that serves a much more diverse base of users.
Whereas other ASR engines treat different dialects as separate languages, LumenVox’s new ASR Engine with Transcription supports multiple dialects with one language model. This considers many different pronunciations in a single language, as opposed to having to train according to each individual user. The end-to-end recognizer matches audio to the written word—regardless of accent or other factors that impact pronunciation.
Additionally, no matter where the call or audio is coming from, the LumenVox Speech Recognizer separates speech from background noise using Voice Activity Detection (VAD). This takes a range of qualities into consideration, including energy level (volume), frequency (pitch) and changes in duration, to accurately detect the actual speech.
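To illustrate the idea behind energy-based voice activity detection (this is a minimal teaching sketch, not LumenVox's implementation, and the threshold values are assumptions), frames whose RMS energy exceeds a threshold become speech candidates, and very short bursts are discarded as noise:

```python
import math

def frame_energy(samples):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def detect_speech(frames, energy_threshold=0.02, min_speech_frames=3):
    """Flag each frame as speech when its energy exceeds the threshold,
    then keep only runs long enough to be real speech (not clicks)."""
    flags = [frame_energy(f) > energy_threshold for f in frames]
    result = [False] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            # Measure the length of this run of high-energy frames
            j = i
            while j < len(flags) and flags[j]:
                j += 1
            # Duration check: suppress runs shorter than min_speech_frames
            if j - i >= min_speech_frames:
                for k in range(i, j):
                    result[k] = True
            i = j
        else:
            i += 1
    return result
```

A production VAD also weighs frequency content and adapts its thresholds to the noise floor, but the energy-plus-duration core is the same idea described above.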
All this means that your speech solution can cater to a more diverse user base, in a broader range of scenarios, with market-leading accuracy.
Improve your speech application success rate with tuning
To get maximum value from your speech applications, LumenVox also offers an advanced tuning tool that does all the heavy lifting for you, making it far easier to manage tuning in-house (and avoid expensive professional service fees).
LumenVox’s Speech Tuner performs transcriptions, instant parameter and grammar-tuning, and version upgrade-testing of any speech application, in less time and with less effort. This way, you can continually enhance speech recognition accuracy and build competitive advantage.
While there is room for improvement in the speech recognition technology landscape, the demand for voice-enabled solutions continues to grow. A study by National Public Media found that 52% of voice-assistant users say they use voice tech several times a day or nearly every day, compared to 46% before the pandemic.
If your company gets speech recognition right, you will be in a strong position to capitalize on this market growth.
The automation provided by Speech Recognition can save your business significant time and resources, with a tangible impact on your profitability. It can also revolutionize your customer experience by enabling self-service, enhancing the value offered in your contact center, and augmenting the usability of your speech applications.
All these attributes drive revenue growth. But if yours is the kind of organization that views success as a process rather than an end state, why let the advantages end there? With Speech Tuning, you can continually optimize your Speech Recognition capabilities.
Born out of the belief that no matter how good technology is, it can always get better, Speech Tuning is the process of continually improving applications, including Automatic Speech Recognition-based systems, after they have been deployed. While this may sound like a chore, rather view Speech Tuning as an excellent opportunity to ensure the efficacy of your applications, maintain your competitive edge and amplify your Speech Recognition ROI.
The reality is: everything around us is constantly evolving. This includes your customers, the world they live in, their language and, most importantly, their expectations. You simply can’t stand still and expect to remain relevant.
If you’re not familiar with the term, Speech Tuning is a technology-driven approach to refining the performance of your speech-enabled applications, based on data gathered from real-world use. The goal is to perpetually enhance recognition accuracy, with a direct impact on call completion rates, containment rates, user experience scores and other metrics that matter to your business.
Fast, Accurate and Powerful Tuning with the LumenVox Speech Tuner
This tool offers value on multiple levels. First of all, it is designed to drive a swift and seamless tuning process. This allows applications to be tuned in less time with less effort, which lowers the total cost of ownership (TCO) of your speech applications.
There are also benefits for your users. The LumenVox Speech Tuner allows you to find and fix issues that you might otherwise have overlooked. This improves CX and strengthens your brand credibility.
How Speech Tuning for Automatic Speech Recognition Works
Speech Tuning assesses how users interact with the system and tests the effect of proposed changes. The process takes time, but when it comes to speech, every millisecond counts: even minute improvements in application performance produce an impactful ROI within a brief period of time.
The LumenVox Speech Tuner performs transcriptions, instant parameter and grammar-tuning, and version upgrade-testing of any speech application. This reduces your workload during post-deployment application revisions. It also allows you to bring tuning in-house and thus avoid costly professional service fees.
LumenVox Speech Tuner is up and running, maximizing ROI, in just three easy steps:
1. Data Import
First, you import call log data into the Speech Tuner database. All information stored by the call log is available in the Speech Tuner. The Call Indexer service automatically scans remote speech applications for fresh logged calls, ensuring key data is just a click away.
2. Speech Transcription
Then, transcribers type the text of the caller’s speech directly into the Transcriber. Once the audio is transcribed, the Speech Tuner compares audio transcripts with the Speech Engine results to determine accuracy, greatly reducing errors associated with manual evaluations. The transcripts are evaluated using the actual decode grammar, producing measurements such as word error rates (WER), in- and out-of-grammar rates and semantic error rates.
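Word error rate is the standard edit-distance metric over words: substitutions, deletions, and insertions between the human transcript and the engine's result, divided by the transcript length. A minimal sketch of the calculation (not the Speech Tuner's internals):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, if a caller said "transfer funds to savings" and the engine heard "transfer fund to savings", one substitution out of four reference words gives a WER of 25%.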
3. Immediate Testing
Selecting an interaction in the Call Log automatically loads the associated audio and grammar into the Tester. The grammar can be edited, Speech Engine parameters set, and individual recognition tests generated. The Speech Tuner natively supports industry standard SRGS grammars. Once a set of possible changes is identified, users can batch test audio to evaluate performance, using those changes.
Ready to Reduce the TCO of Your Speech Applications?
The LumenVox Speech Tuner accelerates ROI by decreasing the time spent in tuning cycles. The more efficient your tuning process is, the more you’ll be able to decrease the Total Cost of Ownership (TCO) of your speech applications. The numbers are significant, with LumenVox clients documenting hundreds of thousands of dollars in savings per year, all as a result of fast, accurate and powerful speech tuning.
Interested in migrating off of your legacy speech applications? Contact us!
LumenVox is excited to announce the release of LumenVox Version 17.0.200. In this release, we have:
Added support for new short-utterance transcription (Natural Language) functionality to process audio with a maximum length of approximately 30 seconds.
Added a new Out of Service configuration option for the ASR (Automatic Speech Recognizer) service. This allows system administrators to enter maintenance mode from the Dashboard: currently pending requests are allowed to complete, while any new requests are rejected (to be potentially handled by other ASR servers in the cluster).
Added a new feature to the ASR load-balancing mechanism to actively route ASR requests based on the language specified.
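The idea of language-aware load balancing can be sketched as follows. This is an illustrative assumption, not the actual LumenVox mechanism; the server names, language tags, and round-robin policy are all hypothetical:

```python
from collections import defaultdict
from itertools import cycle

class LanguageRouter:
    """Round-robin ASR requests across server pools grouped by the
    language each server supports. All names here are hypothetical."""

    def __init__(self, servers):
        # servers: iterable of (server_name, language) pairs
        pools = defaultdict(list)
        for name, lang in servers:
            pools[lang].append(name)
        # One independent round-robin cycle per language pool
        self._cycles = {lang: cycle(names) for lang, names in pools.items()}

    def route(self, language):
        """Return the next server that supports the requested language."""
        if language not in self._cycles:
            raise ValueError(f"no ASR server supports {language!r}")
        return next(self._cycles[language])
```

Routing by language keeps a request from landing on a server that would have to reject it, which is the benefit the release note describes.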
Useful for situations where you do not want to be constrained by a specific grammar, or challenged by implementing a more complex and costly Statistical Language Model, the LumenVox Short-Utterance Transcription functionality utilizes a built-in, general Statistical Language Model that has been tuned for everyday use to provide a text representation of supplied audio.
Supporting LumenVox’s commitment to making speech applications more secure and easier to administer, additional enhancements were made to our diagnostic tools and dashboard, including more robust grammar handling within the LumenVox Speech Tuner.
For a comprehensive list of improvements and features released with LumenVox Version 17.0.200, please click here.
If you’d like to watch a previously recorded webinar about the release, including participant Q&A, please click here.
Guest Post by Maria Simonton, Director of Product Marketing, Interactive Northwest Inc. (INI). As mentioned in our previous blog, tuning analysis greatly enhances the performance of speech-enabled applications. LumenVox has invited Interactive Northwest Inc. (INI), a LumenVox Skills Certified Partner, to share their insights on the process of speech tuning and the improvements associated with ongoing tuning over the life of the solution.
I bought a fancy new car about five years ago; all the bells and whistles, every luxury upgrade that money could buy. At 3,000 miles, I got an oil and filter change and put air in the tires. I haven’t taken it to the mechanic since. It’s still running—maybe not as well as it used to—but as long as it gets me from point A to point B, why should I do anything else? Okay, that’s not a true story. Yet I see this happen all the time with the Cadillac of self-service investments: speech recognition applications. Despite the initial time, effort, and dollars spent, companies often take a set-it-and-forget-it mentality with IVR. Tuning a speech application shouldn’t only happen after the pilot phase; it should occur at regular intervals to ensure the application is running just as smoothly as that vehicle in the driveway.
As we learned in the last Blog post, there are many benefits to tuning, so I won’t reiterate them here. Rather, let’s talk about the process, how it works, and the type of improvements you can expect to see.
The first step is identifying when to initiate a tuning cycle. This can be at pre-defined intervals (annually, for example), following an application enhancement, or even timed in conjunction with an outside event that affects usage and traffic flow to your application. Perhaps you’re an insurance provider, and the government has just mandated coverage and benefit changes. Policyholders may call your application with questions and directives that are new and unexpected. Tuning helps reveal what callers are asking for, and how those needs change over time. Once you’ve decided to engage in tuning, it’s time to enable utterance capture on your speech recognition server. I usually recommend a minimum two-week period for this, but the interval could be longer or shorter based on call volumes. Before utterance capture is enabled, be sure to play a “your call may be monitored or recorded” message up front to keep the legal department happy, and always double-check that utterance capture is working by listening to a WAV file or two. It’s not unusual for there to be permissions issues writing files to a directory on the server.
While utterance capture is underway, you’ll want to make a list of the dialogs to tune. Prioritize those that get the most use, have high error-out rates, or represent critical “gates” in the call flow. If, for example, failure at a certain prompt prevents a caller from going down an important path (such as making a payment), that “gate” should be tuned for optimal performance. Call event reports will come in handy for identifying any problem areas in the application. In general, I suggest minimizing the number of yes/no or digit dialogs that receive tuning attention because they typically use shared grammars, and any findings at one prompt can be applied to the others.
Armed with utterances, logs, and a tuning plan, it’s now time to load up that data into the LumenVox Speech Tuner. Luckily, the tool provides a very user-friendly interface for transcription and analysis. But once you’re listening to caller recordings, what are you actually looking for? Well, tuning is a fairly subjective process that requires a skilled ear and critical thinking, and it’s sometimes difficult to distinguish trends from outliers. That said, I’m typically on the lookout for: (a) out-of-grammar utterances of a significant sample size, (b) red herring “sound-a-likes” that confuse the speech engine, (c) prompts that mislead the caller into giving unexpected responses, (d) rejection of valid utterances due to confidence scores, and (e) “talk-off” issues where only partial utterances are captured. Addressing such problems may require grammar updates, new phrase recordings, configuration changes, or a combination of all three. A trained voice user interface expert can assist in making and implementing the tuning recommendations based on the data revealed by the Speech Tuner tool.
So, with analysis complete and the application changes in place, what type of measurable improvements can you expect to see? The answer here is always: it varies. You may experience self-service task completion rates that jump from 50% to 75%, but you may not. The fruits of a tuning effort are typically weighed over time and over multiple iterations. False-Accepts (the phrases that the recognizer accepted, but shouldn’t have) and False-Rejects (the phrases that it rejected, but should have accepted) should decrease. Confidence scores for Correct-Accepts should increase. Follow-up tuning cycles will expose these trends, but it’s almost never possible to assign your expectations a hard-and-fast number. Continuing to analyze call event reports will help illuminate where the gains have been made and where to focus your efforts in the next tuning cycle.
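False-accept and false-reject rates are straightforward to compute from transcribed tuning data. A minimal sketch, assuming each interaction has been reduced to a confidence score and a flag for whether the recognition was actually correct:

```python
def accept_reject_rates(results, threshold):
    """Given (confidence, was_correct) pairs from transcribed tuning data,
    compute the false-accept rate (wrong result accepted above threshold)
    and false-reject rate (correct result rejected below threshold)."""
    fa = sum(1 for conf, ok in results if conf >= threshold and not ok)
    fr = sum(1 for conf, ok in results if conf < threshold and ok)
    n = len(results)
    return fa / n, fr / n
```

Sweeping the threshold over such data is how a tuner picks a confidence setting that balances the two error types; raising it trades false accepts for false rejects, and vice versa.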
Performing these steps at regular intervals ensures that you’re getting the maximum mileage out of your investment and protects against a user interface breakdown that could have been avoided with routine maintenance. Not to mention the fact that customer satisfaction will improve when the user experience does!
Please contact your account manager or LVSales@LumenVox.com for more information on the LumenVox Speech Tuner. For more information on the speech tuning process, or to engage Interactive Northwest (INI) in an application tuning cycle, visit the Contact INI page to speak with a qualified speech mechanic. Vroom vroom!
Interactive Northwest, Inc. (INI) develops innovative interactive voice response (IVR), computer telephony integration (CTI), and self-service applications for high-volume contact centers in markets such as government, healthcare, finance, utilities and service industries. A strong commitment to platform expertise, seamless systems integration, and project management excellence uniquely position INI to provide value to its customers. As a long-standing partner in the Avaya DevConnect program and developer of call center speech applications, INI has a deep history in deploying applications on Avaya platforms — making it a reliable partner capable of delivering results that promote the success and profitability of its customers. www.interactivenw.com
LumenVox version 12.2, scheduled for release on Tuesday, Sept. 2, has a large number of exciting new changes. In particular, the Tuner is getting a major series of improvements, and some cool new changes have been added throughout.
From almost top to bottom, we have looked at how we can improve the usefulness of the LumenVox Speech Tuner. One of the first things we realized is that many users have trouble figuring out what they need to tune the most.
Analyzing by Menu
Loading data into the Tuner can be overwhelming, so we added a new concept to the Tuner called a menu. A menu is designed to allow you to filter data so you can tune a specific menu in an IVR or speech application.
The way this works is that the Tuner analyzes the grammar files that were in use for each speech interaction. A main menu in a banking application might use one set of grammars, while a “transfer funds” menu might use a different set.
Because the Tuner knows which grammars are active for which speech interactions, it can make logical inferences about which interactions should be grouped together. That grouping is the menu system. A new dropdown allows you to select from the various menus the Tuner recognizes and just pick the one you’d like to focus on.
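The grouping idea can be sketched as follows: interactions that were decoded against the same set of active grammars fall into the same inferred menu. The grammar file names and interaction IDs below are hypothetical, purely for illustration:

```python
from collections import defaultdict

def group_into_menus(interactions):
    """Group speech interactions by their set of active grammars,
    mimicking the menu inference: same grammar set -> same menu.
    interactions: iterable of (interaction_id, [grammar names])."""
    menus = defaultdict(list)
    for interaction_id, grammars in interactions:
        # frozenset makes the grammar combination usable as a dict key,
        # ignoring the order in which grammars were loaded
        menus[frozenset(grammars)].append(interaction_id)
    return dict(menus)
```

Each resulting group corresponds to one entry in the menu dropdown, so tuning effort can be focused on a single dialog state at a time.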
Tuner Wizards
New in 12.2 are Tuner Wizards, a series of automated tools that guide you through the process of identifying problems and focusing on the relevant data. You can fire up the new Tuning Wizard, pick a menu (or all of the data), and choose from a list of options to focus on. That list includes:
Confidence Threshold Tuning
Decode Speed Tuning
Decode Failure Tuning
The Tuning Wizard will let you know whether your data exhibits any problems related to these issues and then will help you identify which interactions contribute to the particular type of issues you’re facing. It’s a great way to focus your time so that you only pay attention to the items most relevant to you.
Grammar Editor Changes
The Grammar Editor is a long-standing feature in the Tuner, giving developers an easy way to build, edit, and test their grammars. Several new features enhance its capabilities even further:
Multiple grammar parses. Previously, the Grammar Editor could only parse a sentence against a single grammar at a time. A new option allows developers to parse any combination of loaded grammars, making it easier to test how combinations of grammars will affect grammar coverage.
Pronunciation Checker. A new module called the Pronunciation Checker shows where pronunciations for grammar items come from: are they in our built-in dictionary? A user-defined lexicon? Or are they being produced by our statistical pronunciation rules? Words which don’t have good pronunciation definitions often lead to errors in recognition, so this is a useful module for troubleshooting performance.
Random Sentence Generator. This module generates 10 random sentences at a time that are allowed by the grammar. Using it, you can check grammar coverage to make sure that the words and phrases you expect to be in grammar are, while simultaneously ensuring that phrases you don’t expect to be in grammar are not.
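A random sentence generator of this kind can be sketched with a toy grammar. This uses a simple dict-based context-free grammar rather than SRGS, and the rules below are invented for illustration; the principle of recursively expanding nonterminals with random alternatives is the same:

```python
import random

# Toy grammar (hypothetical, not SRGS): nonterminals map to lists of
# alternative right-hand sides; anything not in the dict is a terminal.
GRAMMAR = {
    "S": [["i want to", "ACTION"], ["ACTION", "please"]],
    "ACTION": [["check my balance"], ["transfer funds"], ["pay a bill"]],
}

def random_sentence(symbol="S", grammar=GRAMMAR, rng=random):
    """Expand nonterminals recursively, picking alternatives at random."""
    if symbol not in grammar:
        return symbol  # terminal: emit as-is
    expansion = rng.choice(grammar[symbol])
    return " ".join(random_sentence(tok, grammar, rng) for tok in expansion)
```

Generating a batch of such sentences and eyeballing them is exactly the coverage check the module supports: expected phrases should appear, and nothing surprising should.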
Speech tuning is often perceived as an add-on effort to deploying speech applications. Our years of experience have demonstrated that it is a vital part of the process and can contribute significantly to cost savings for any business.
Speech tuning is the process of improving speech applications after they have been deployed. It assesses how users interact with the system and tests the effect of proposed changes. Though the process can be time-consuming, even minute improvements in application performance produce an impactful Return On Investment (ROI) within a short amount of time.
The LumenVox Speech Tuner can be used to accelerate this ROI by decreasing the time spent in tuning cycles, which also decreases the Total Cost of Ownership (TCO) of a speech application.
The numbers are significant, with our clients documenting hundreds of thousands of dollars in savings per year, all as a result of speech tuning.
In an updated whitepaper, we describe the ROI of speech tuning, showing exactly how to calculate it.
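At its simplest, the calculation divides net gain by cost. A minimal sketch of the arithmetic, with purely illustrative figures that are not drawn from the whitepaper or from LumenVox client data:

```python
def tuning_roi(annual_savings, tuning_cost):
    """Simple ROI as a percentage: net gain divided by cost.
    All figures passed in are illustrative, not real client data."""
    return (annual_savings - tuning_cost) / tuning_cost * 100
```

For instance, a hypothetical $100,000 in annual savings against a $20,000 tuning effort would yield a 400% return; the whitepaper walks through how to estimate the savings side from containment and call-completion gains.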