
Using Speech Recognition to Improve CX and Reduce Costs

Whether through their PC, smartphone, or digital assistant devices, customers today have embraced speech recognition when seeking simple, quick, and straightforward ways to accomplish a task or access information. They expect no less when reaching out to your contact center.

Speech technologies, including speech recognition, text-to-speech, and multifactor authentication (voice biometrics plus questions), when used in conjunction with your Genesys Customer Experience Platform assets such as Genesys Voice Platform (GVP in PureEngage), enable you to speed your customers to completing their desired transaction or get them to the right agent quickly and easily. You can save agent handling time and improve customer experience (CX), all at lower cost compared to purely agent-handled calls.

How can I use these technologies in my contact center?

Speech recognition can greatly improve CX and your callers’ ability to self-serve in a voice channel by enabling more complex and personalized interactions. It can be used to identify or authenticate the caller using voice biometrics and multifactor authentication – saving agent handling time and frustration (what agent enjoys asking authentication questions?) by conducting the authentication in self-service before transferring the caller to an agent or to an appropriate self-service application. By combining voice biometrics with existing questions, you gain significant improvements in fraud prevention and avoid social engineering.

Speech recognition can be used to help determine the reason for a call, along with other data you have on hand about the customer, so that you can quickly transfer the caller to the right agent with the right skills and availability. Faster and more accurate transfers improve customer satisfaction (think NPS scores) and increase employee satisfaction while reducing costs by reducing agent handling time. Automated interactions typically cost 25% or less of agent-handled interactions on a per-minute basis. If agent-handled interactions cost between $1 and $3 per minute, a five-minute call costs between $5 and $15. Typical automated call handling costs between $0.25 and $0.50 per minute, or between $1.25 and $2.50 per call. By shifting even 15-20% of your call handling time from agents to automation, you can reduce headcount requirements and costs significantly.
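The savings arithmetic above can be sketched as a quick back-of-the-envelope model. All rates and volumes below are illustrative assumptions, not quoted prices:

```python
# Back-of-the-envelope model for shifting call minutes from agents
# to automation. Per-minute rates and call volumes are illustrative.

def monthly_savings(total_minutes, automated_share,
                    agent_rate=2.0, automation_rate=0.40):
    """Estimate monthly savings from automating a share of call minutes.

    agent_rate / automation_rate are cost per minute in dollars.
    """
    automated_minutes = total_minutes * automated_share
    agent_cost_avoided = automated_minutes * agent_rate
    automation_cost = automated_minutes * automation_rate
    return agent_cost_avoided - automation_cost

# Example: 100,000 call minutes per month, 15% shifted to automation,
# agents at $2/min vs automation at $0.40/min.
print(monthly_savings(100_000, 0.15))  # 24000.0
```

Even at the conservative end of the ranges in the text, a modest shift of minutes into automation compounds quickly at contact-center scale.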

After business hours, or during peak busy periods when agent availability is an issue, your callers can completely self-serve for many common interactions using speech recognition and text-to-speech, requesting and receiving information specific to them. Generally, any transaction in your contact center that agents handle today by asking the caller for some specific information, typing it into a computer screen, and then reading the answer back to the caller can easily be automated with speech recognition. Requests like "what's my account balance," "when was my order shipped," or "can I get the address of the closest store" can all be automated. This not only keeps your customers out of the hold queue, but also helps reduce your overall agent headcount demand and gives your customers access to the information they need even when your contact center is closed.

Our speech recognition technology can also be used in outbound calling to contact customers to confirm orders, provide shipping information, or deliver other important, timely information. LumenVox Call Progress Analysis can improve the message delivery success of your outbound calling applications by helping you determine whether a call has been machine-answered or live-answered, and then delivering the correct payload for machine answers or engaging live callers with speech recognition and text-to-speech.

Many of our customers find that the same technologies can be used to reduce costs and improve CX for their employees also. Applications like Automated Password Reset can empower your customers and employees to reset their passwords without the involvement of a help desk agent. Save them from having to remember security questions or PINs that are easily lost or shared. Reliably authenticate your users anywhere and anytime with a simple spoken passphrase, a selfie or security questions.

LumenVox brings 17 years of experience in speech recognition automation to your contact center. We provide the core ASR, TTS, and voice biometric technologies to speech-enable your customer interactions. As a Genesys AppFoundry Partner, we provide a variety of speech recognition technologies and applications, such as Password Reset, all fully tested and certified to speech-enable your Genesys environment.

Sound Interesting?

We have a quick “5 Simple Questions” process to help you identify specific use cases in your contact center. If you’re curious about how speech recognition can be used in your contact center reach out to LumenVox and we’ll work with your Genesys representative to ask those questions and show you how you can take advantage of speech related technologies to improve your customer experience, make your agents happier and reduce costs.

Improved Intelligence of the Interaction

Guest Post by Brian Pia, CEO, Think Tank Partners


In a recent post, The ROI of Speech, we discussed ways in which speech recognition technology has changed how companies interact with their customers. Perhaps the most significant benefit realized through the implementation of a speech-enabled service solution is the enhanced level of intelligence delivered during the customer interaction – the improved intelligence of the interaction. Customers are no longer bound to pushing keys to force-fit their call reason into the company’s pre-determined options.

Since speech-enabled solutions provide a highly conversational interaction with customers, organizations are empowered to expand the level of intelligence their self-service solutions offer. Benefits from implementing such a solution come from two perspectives: 1) reduced costs and accelerated ROI, and 2) enhanced customer experience.
For the purposes of this post, we’ll focus on how the customer experience is enhanced by implementing a robust speech self-service solution. We’ll specifically address the questions posed in the ROI of Speech post:

  • Can I engage customers in a manner that allows me to dynamically generate personalized treatment that results in higher rates of self-service or cross/up-sell opportunities?
  • When customers don’t want to play in the IVR, can I gather enough information to avoid costly misroutes?
  • Can I take what I know about the customer and provide proactive information that might resolve their need before they move into the transactional path or transfer to an agent?

Understanding how each of these factors tie into the overall speech self-service strategy will help to position the organization for success and yield an intelligent experience that customers will engage in time and time again.

Dynamic Engagement

Given the dynamic nature of conversational speech, companies can leverage speech technology to build very robust interactions with consumers. Let’s assume a customer calls to inquire about their checking account. Based on this customer’s profile we know that they are a high-net worth customer and would be eligible for numerous up-sell offers. Using conversational dialogue, we can begin to ask the consumer targeted questions in conjunction with what we know about their relationship with the bank. This depth of conversation would be controlled by the consumer and all information collected would ultimately be used to improve the intelligence of the customer record. Over time, the organization would have a targeted view of this consumer using a combination of profile and behavioral information.

As organizations begin to consider whether a speech-enabled solution is right for them they should take inventory of their current personalization strategies and lay out numerous use cases that could be supported through a more robust speech solution.

Intelligent Routing

The most successful companies across all industry verticals recognize that holding consumers hostage within the IVR system is the most egregious error they can make. The internet is filled with horror stories of consumers being trapped in automation purgatory. In fact, being trapped in the IVR is one of the most common reasons consumers hate to use automation. To combat this, companies tried, often unsuccessfully, to build “second chance” menus to capture caller intent and get them to the right location. Of course, consumers who despise automation rarely play at this level.

Fortunately, speech recognition technology provides a viable solution for both the consumer and the organization.

For companies, the conversational approach of the speech solution provides a sense of forward progress to the consumer. This approach promotes engagement and therefore reduces the rate of costly internal transfers, as well as improves the perception of the company.

Consumers benefit from easy transfers without the need for sitting through verbose second chance menus or cycles of repeated commands.

Proactive Information

While proactive information can be successfully pushed in a DTMF solution, the use of speech recognition technology can expand the interaction, thereby delivering a much more targeted push. The depth of engagement will be far deeper with speech technology. Consumers can provide complex responses, and the company can offer multiple data points in a single question. The dynamic interaction will reduce the cognitive load on the consumer, as the flow of information will be more fluid and natural. This approach is highly successful in keeping calls in the self-service channel and avoiding the costlier agent channel across many industry verticals, particularly when high rates of repeat callers are common, for example credit card and bank account balance inquiries.

The power of speech technology continues to change the face of the self-service world and customer experience as a whole for improved intelligence of the interaction. Understanding the use cases for the technology requires a solid understanding of current capabilities and consumer behavior and expectations. While the questions presented above represent a significant portion of developing a business case for a speech technology solution, numerous other factors must be addressed to build a comprehensive roadmap.

In our next installment, we will discuss how speech can open opportunities for new functionality and scope of coverage across the entire self-service solution.

The Mechanics of Tuning: How to Get the Most Mileage from a Speech Application

Guest Post by Maria Simonton, Director of Product Marketing, Interactive Northwest Inc. (INI). As mentioned in our previous blog, tuning analysis greatly enhances the performance of speech-enabled applications. LumenVox has invited Interactive Northwest Inc. (INI), a LumenVox Skills Certified Partner, to share their insights on the process of speech tuning and the improvements associated with ongoing tuning over the life of the solution.


I bought a fancy new car about five years ago; all the bells and whistles, every luxury upgrade that money could buy. At 3,000 miles, I got an oil and filter change and put air in the tires. I haven’t taken it to the mechanic since. It’s still running—maybe not as well as it used to—but as long as it gets me from point A to point B, why should I do anything else?
Okay, that’s not a true story. Yet I see this happen all the time with the Cadillac of self-service investments: speech recognition applications. Despite the initial time, effort, and dollars spent, companies often take a set-it-and-forget-it mentality with IVR. Tuning a speech application shouldn’t only happen after the pilot phase; it should occur at regular intervals to ensure the application is running just as smoothly as that vehicle in the driveway.

As we learned in the last blog post, there are many benefits to tuning, so I won’t reiterate them here. Rather, let’s talk about the process, how it works, and the type of improvements you can expect to see.

The first step is identifying when to initiate a tuning cycle. This can be at pre-defined intervals (annually, for example), following an application enhancement, or even timed in conjunction with an outside event that affects usage and traffic flow to your application. Perhaps you’re an insurance provider, and the government has just mandated coverage and benefit changes. Policyholders may call your application with questions and directives that are new and unexpected. Tuning helps reveal what callers are asking for, and how those needs change over time.
Once you’ve decided to engage in tuning, it’s time to enable utterance capture on your speech recognition server. I usually recommend a minimum two-week period for this, but the interval could be longer or shorter based on call volumes. Before utterance capture is enabled, be sure to play a “your call may be monitored or recorded” message up front to keep the legal department happy, and always double-check that utterance capture is working by listening to a WAV file or two. It’s not unusual for there to be permissions issues writing files to a directory on the server.

While utterance capture is underway, you’ll want to make a list of the dialogs to tune. Prioritize those that get the most use, have high error-out rates, or represent critical “gates” in the call flow. If, for example, failure at a certain prompt prevents a caller from going down an important path (such as making a payment), that “gate” should be tuned for optimal performance. Call event reports will come in handy for identifying any problem areas in the application. In general, I suggest minimizing the number of yes/no or digit dialogs that receive tuning attention because they typically use shared grammars, and any findings at one prompt can be applied to the others.
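The prioritization above can be reduced to a simple ranking over call-event report data. The field names here are illustrative assumptions about what such a report might export:

```python
# Rank dialogs for tuning attention by weighting traffic volume by
# error-out rate. Report fields ('dialog', 'calls', 'errors') are
# illustrative assumptions about a call-event report export.

def prioritize_dialogs(report_rows):
    """report_rows: iterable of dicts with 'dialog', 'calls', 'errors'.
    Returns dialog names sorted by estimated tuning impact, highest first."""
    def impact(row):
        rate = row["errors"] / row["calls"] if row["calls"] else 0.0
        return row["calls"] * rate  # volume-weighted error rate
    return [r["dialog"] for r in sorted(report_rows, key=impact, reverse=True)]

rows = [
    {"dialog": "main_menu",   "calls": 5000, "errors": 250},
    {"dialog": "payment_amt", "calls": 1200, "errors": 360},
    {"dialog": "yes_no_conf", "calls": 4000, "errors": 40},
]
print(prioritize_dialogs(rows))  # ['payment_amt', 'main_menu', 'yes_no_conf']
```

A critical "gate" like a payment-amount prompt floats to the top even with lower traffic, because its error rate dominates.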

Armed with utterances, logs, and a tuning plan, it’s now time to load up that data into the LumenVox Speech Tuner. Luckily, the tool provides a very user-friendly interface for transcription and analysis. But once you’re listening to caller recordings, what are you actually looking for? Well, tuning is a fairly subjective process that requires a skilled ear and critical thinking, and it’s sometimes difficult to distinguish trends from outliers. That said, I’m typically on the lookout for: (a) out-of-grammar utterances of a significant sample size, (b) red herring “sound-a-likes” that confuse the speech engine, (c) prompts that mislead the caller into giving unexpected responses, (d) rejection of valid utterances due to confidence scores, and (e) “talk-off” issues where only partial utterances are captured. Addressing such problems may require grammar updates, new phrase recordings, configuration changes, or a combination of all three. A trained voice user interface expert can assist in making and implementing the tuning recommendations based on the data revealed by the Speech Tuner tool.

So, with analysis complete and the application changes in place, what type of measurable improvements can you expect to see? The answer here is always: it varies. You may experience self-service task completion rates that jump from 50% to 75%, but you may not. The fruits of a tuning effort are typically weighed over time and over multiple iterations. False-Accepts (the phrases that the recognizer accepted, but shouldn’t have) and False-Rejects (the phrases that it rejected, but should have accepted) should decrease. Confidence scores for Correct-Accepts should increase. Follow-up tuning cycles will expose these trends, but it’s almost never possible to assign your expectations a hard-and-fast number. Continuing to analyze call event reports will help illuminate where the gains have been made and where to focus your efforts in the next tuning cycle.
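The accept/reject bookkeeping described above can be tallied once a transcriber has labeled each captured utterance. This is a generic sketch of that tally, not the Speech Tuner's own report format; the record fields are assumptions:

```python
# Tally tuning metrics from transcribed utterances. Each record pairs
# what the recognizer did ('accepted') with what the transcriber says
# it should have done ('in_grammar'). Field names are illustrative.

def tuning_metrics(records):
    counts = {"correct_accept": 0, "correct_reject": 0,
              "false_accept": 0, "false_reject": 0}
    for rec in records:
        if rec["accepted"] and rec["in_grammar"]:
            counts["correct_accept"] += 1
        elif not rec["accepted"] and not rec["in_grammar"]:
            counts["correct_reject"] += 1
        elif rec["accepted"]:
            counts["false_accept"] += 1   # accepted, but shouldn't have
        else:
            counts["false_reject"] += 1   # rejected, but should have accepted
    return counts

records = [
    {"accepted": True,  "in_grammar": True},
    {"accepted": True,  "in_grammar": False},
    {"accepted": False, "in_grammar": True},
    {"accepted": False, "in_grammar": False},
]
print(tuning_metrics(records))
# {'correct_accept': 1, 'correct_reject': 1, 'false_accept': 1, 'false_reject': 1}
```

Tracking these four counts across tuning cycles is how the trends mentioned above (False-Accepts and False-Rejects falling, Correct-Accepts rising) become visible.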

Performing these steps in regular intervals will ensure that you’re getting the maximum mileage out of your investment, and protects against a user interface breakdown that could have been avoided with routine maintenance. Not to mention the fact that customer satisfaction will improve when the user experience does!

Please contact your account manager or LVSales@LumenVox.com for more information on the LumenVox Speech Tuner. For more information on the speech tuning process, or to engage Interactive Northwest (INI) in an application tuning cycle, visit the Contact INI page to speak with a qualified speech mechanic. Vroom vroom!

Interactive Northwest, Inc. (INI) develops innovative interactive voice response (IVR), computer telephony integration (CTI), and self-service applications for high-volume contact centers in markets such as government, healthcare, finance, utilities and service industries. A strong commitment to platform expertise, seamless systems integration, and project management excellence uniquely position INI to provide value to its customers. As a long-standing partner in the Avaya DevConnect program and developer of call center speech applications, INI has a deep history in deploying applications on Avaya platforms — making it a reliable partner capable of delivering results that promote the success and profitability of its customers. www.interactivenw.com

The ROI of Speech

Guest Post by Brian Pia, CEO, Think Tank Partners. With all of the speech technology options available today, companies can be overwhelmed by the difficult decisions to be made. Increasingly, end users are looking for help from consultative advisors focused on the strategic aspects of Customer Experience that include the speech-enabled user interface. As we get ready to kick-off our participation at SpeechTEK, LumenVox has invited Think Tank Partners, an independent consultant, to share their valuable insights on how the implementation of speech recognition technology can financially and qualitatively benefit companies.



Speech recognition technology has improved leaps and bounds over the past decade. Today, we use speech technology in all aspects of our lives. Given the high rate of adoption across all markets and the introduction of new devices such as speech-enabled virtual assistants, companies that were once skeptical of the technology have found themselves re-examining its possibilities. While this is an exciting time in the speech industry, companies considering speech recognition, whether directed dialog or natural language, continue to be challenged in justifying the investment. Executive leadership is often asked to prove the value of the investment through the impact of reduced costs, improved operations or enhanced customer experience. Most often, that value is defined by the rate of payback of the expenditure; the return on investment (ROI).

In theory, developing an ROI analysis is quite easy; simply put, it is the cost of the solution divided by the expected monthly benefit of the solution in dollars, which yields the payback period in months.
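In code, that division is trivial; the figures below are purely illustrative:

```python
# Simple payback-period calculation: solution cost divided by expected
# monthly benefit gives months to break even. Figures are illustrative.

def payback_months(solution_cost, monthly_benefit):
    if monthly_benefit <= 0:
        raise ValueError("monthly benefit must be positive")
    return solution_cost / monthly_benefit

# Example: a $120,000 speech solution saving $10,000/month pays back in a year.
print(payback_months(120_000, 10_000))  # 12.0
```

The hard part, as the rest of this post argues, is not the division but choosing inputs that capture the full benefit.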

But what does that mean for a speech implementation?

Why is it so hard to identify an accurate payback period when converting to a speech-enabled solution?

The answers lie in the inputs that determine the expected benefits. For most executives, the focus is quite narrow. The mindset generally leans towards deriving the benefit from the replacement of a DTMF IVR to a speech-enabled solution. Projections around increases in the number of identified or authenticated callers and transactional containment tend to drive the analysis. This approach is somewhat valid, but it is highly limiting and, when used in isolation, often yields disappointing results. In the end, the ROI analysis tends to fall short in capturing the full impact, leaving executives and investment committees wondering if a speech solution is a wise investment given what they determine to be the long-range benefit.

To overcome these limitations and build a compelling ROI, traditional thinking must be thrown out the window. Instead of focusing on replacement, it is vital to expand the understanding of how a speech-enabled self-service solution will elevate the overall customer experience. With speech, doors open and the dynamic of the customer interaction changes forever. Whoever is responsible for conducting the ROI analysis must understand the complete customer experience, how all contact center technologies and processes will be impacted, and the overall company vision from a branding and marketing perspective. In particular, executives conducting an ROI analysis must consider the following to truly capture the numerous benefits that speech recognition provides.

Expanded Scope of the Solution

DTMF systems are inherently limited. These systems are bound by a menu hierarchy that, by design, has limitations imposed to effectively guide the caller to their ultimate destination. Since these design limitations significantly impact the reach of the self-service solution, the financial benefits derived will also be constrained. When moving to a speech-enabled solution, those design limitations disappear, providing greater flexibility in expanding the solution’s footprint.

By expanding the scope of the solution footprint, the base of customers you are able to service will expand. Imagine moving from an environment where each of your lines of business has its own entry point or a dedicated IVR to a solution that provides a single point of entry with a consolidated customer experience for all your callers. The benefits of such an approach are easily identified in terms of higher containment, reduced costs of solution maintenance and expanded capabilities. Those benefits play a powerful role in building a strong business case for a speech-enabled customer experience solution.

Expanded Functionality

Speech-enabled solutions change the face of the customer experience. In addition to expanding the overall service footprint, speech recognition enables companies to serve their customers in ways that were never possible in DTMF systems. As organizations consider speech technology investments, it is important to step back and take inventory of their self-service application portfolio. This assessment must consider current functionality as well as new functionality that can be transacted due to the improved capabilities of speech. Customer inputs that were once too complex to enter with DTMF can be accomplished readily with speech recognition, allowing your organization to provide greater service options to customers and thereby realize higher rates of cost savings than the former DTMF system allowed. Additionally, a new speech solution can provide significant opportunities to partially automate transactions, such as claims initiation, shaving several minutes off the call once the customer is transferred to the customer service representative.

Improved Intelligence of the Interaction

Since speech-enabled solutions provide a highly conversational interaction with customers, organizations are empowered to expand the level of intelligence their self-service solutions offer. When building the ROI analysis executives should consider how the new speech solution will change the dynamics of the conversation with the customer. Questions to consider are:

  • Can I engage customers in a manner that allows me to dynamically generate personalized treatment that results in higher rates of self-service or cross/up-sell opportunities?
  • When customers don’t want to play in the IVR, can I gather enough information to avoid costly misroutes?
  • Can I take what I know about the customer and provide proactive information that might resolve their need before they move into the transactional path or transfer to an agent?

As your organization starts to build a business case to justify an investment in speech technology, looking at the areas discussed above will help to enhance the value of the solution. By broadening the reach of the new solution beyond the replacement mentality you will quickly start to see that speech technology is not only vital to the overall customer experience, but provides a solid return on investment, even for those organizations who think their annual call volume is too low to justify a move to speech recognition.

Think Tank Partners is a leader in developing customer experience transformation strategies and designs, focused on establishing world-class, speech-enabled, conversational interactions across all consumer touchpoints. Think Tank Partners combines deep expertise in human factors and human-computer interaction with experience in business intelligence and analytics, providing strategic consulting that integrates corporate branding, consumer segmentation and business and market strategy to align business and technology roadmaps with end users’ broader strategic vision. www.thinktank-partners.com

LumenVox Speech in the Cloud

Increasingly, applications are finding their way from traditional premise-based installations into a variety of hosted, cloud-based and even hybrid architectures. This is occurring for a number of reasons. One is to eliminate single points of failure in systems and data centers, so that if one machine or data center fails, total functionality is not lost. Another is the ability to more cost-effectively load balance and consolidate system resources. Some users choose the cloud because it eliminates the need to manage a stack of on-premise systems, and others because it lets them scale quickly and easily.

The same thinking is being applied within the Speech industry, where more and more applications are migrating to a cloud-based infrastructure.

Since our beginnings in 2001, LumenVox has designed our products to have a completely modular and distributed architecture, allowing the various speech resources to be installed on a number of different machines. This enables users to seamlessly migrate their applications to a cloud-based environment with minimal changes to configuration. In addition, virtualized environments, including Virtual Machines (VMs) are now commonly being used in many applications, allowing users to create new instances of servers based on failure conditions, or in response to increased usage. The larger, more demanding applications we see deployed today often rely on this type of robust architecture to supply their production requirements. LumenVox software is ideally suited for these types of environments and has supported them for some time.

At the beginning of 2012, we embarked on a significant effort to migrate our legacy product licensing to a new mechanism, the Flexible Licensing System (Flex), that enables modern cloud-based and virtualized configurations. Earlier this year, LumenVox successfully completed the migration of all of our subscription-based customers to the new Flex mechanism, and the transition was widely heralded by our customers as fast, easy and flexible.

With the Flexible Licensing mechanism, customers can oversubscribe their licensing and peak above what they have purchased. Just like cloud computing, which grows and contracts on demand, one can easily scale up additional LumenVox ports instantly, and only when needed. Ultimately, we believe it is important to allow customers to mix and match a variety of licensing schemes to suit their needs at any given moment.

The Importance of Brazilian Portuguese

As our relationships with our partners mature, we find ourselves influenced to create and develop for their needs. We do not build something in hopes that our partners will buy it. We listen carefully to those around us and develop to their needs after the proper qualification is secured. Recently there has been a flurry of activity in Central and South America, and especially in Brazil. One of our larger partners, who has fully integrated our software into their platform, requested that we develop a Brazilian Portuguese acoustic model. Given our willingness to please and our understanding of the market potential of such a venture, we agreed and added this to our ever-growing list of languages. We used a novel approach to developing this model, one we have been researching for quite some time, and we think it will satisfy most speech recognition applications.

The Brazilian economy is in a state that presents a favorable opportunity to increase automation and maximize efficiencies. To answer this call, LumenVox has expanded its ASR offering with a Brazilian Portuguese language model, bringing the number of ASR languages to 8 and the number of TTS languages to 23. Today LumenVox covers all of the Americas, from Cape Horn to Cape Columbia and everywhere in between!

We will be doing a lot more with the Asian TTS languages in the future, once we figure out how to deal with some of the double byte issues in our Media Server. We just entered into QA with our new version so we should be able to share some details with you on this in just a few weeks.
