Conversive Announces New Speech Solution

Through its work in Web chat, Conversive has learned that many questions can be ably handled with one answer. Unlike previous ‘natural conversation’ speech recognition technologies, the company says, its speech solution readily recognizes the wide range of ways in which customers ask a single question.

Conversive’s speech solution works well both in standalone applications and in inbound call streams, where it diverts calls to self-service. One of its key appeals is its ability to resolve a customer’s issue while they are on hold. Instead of listening to music or a solemn voice indicating how many minutes remain before a contact center agent can take the call, a customer hears a voice asking: “How may I help you?” The Conversive Speech Recognition solution turns the customer’s spoken response into text. Then, the Conversive Automated Conversation Engine matches the text string to the most appropriate response. The answer is played to the customer in a recorded voice.
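To make the "many phrasings, one answer" idea concrete, here is a minimal Python sketch of the matching step only. It is not Conversive's engine: the answer table, the word-overlap (Jaccard) scoring and the `min_score` threshold are all assumptions chosen for illustration, and the input is assumed to be text already produced by a speech recognizer.

```python
import re

# Hypothetical answer table: each canned answer lists a few example phrasings.
ANSWERS = {
    "Our branches are open 9am to 5pm, Monday to Friday.": [
        "what are your opening hours",
        "when are you open",
        "what time do you close",
    ],
    "You can reset your password from the login page.": [
        "i forgot my password",
        "how do i reset my password",
        "i cannot log in",
    ],
}


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def best_answer(transcript, min_score=0.3):
    """Return the answer whose example phrasing best overlaps the transcript.

    Scoring is plain Jaccard word overlap (an assumption, not the vendor's
    method). Returns None when nothing scores at least `min_score`, which a
    real system might treat as "hand over to an agent".
    """
    heard = tokens(transcript)
    best_score, best = 0.0, None
    for answer, phrasings in ANSWERS.items():
        for phrasing in phrasings:
            example = tokens(phrasing)
            union = heard | example
            score = len(heard & example) / len(union) if union else 0.0
            if score > best_score:
                best_score, best = score, answer
    return best if best_score >= min_score else None
```

Calling `best_answer("When are you open on Monday?")` selects the opening-hours answer even though that exact wording is not in the table, while an unrelated request falls below the threshold and returns `None`.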

Message Intros New On-Demand Intelligent Voice Search Service

Message Technologies has announced the availability of MTI Natural Chat. The new hosted on-demand service allows enterprises to add Intelligent Voice Search and natural language understanding to their customer service offerings.

MTI Natural Chat is the result of MTI’s partnership with a number of speech, Web and Artificial Intelligence-based technology companies. It is a completely hosted offering for Voice Search and natural language understanding. MTI Natural Chat allows any new or existing IVR application to incorporate a truly intelligent natural language option anywhere within the IVR call flow. This incorporation offers a number of benefits, such as a significant increase in call completions, reduced per-call costs, and higher caller satisfaction levels.

Nuance Unveils One-Shot Destination Entry Technology on Microsoft’s Auto 3.1 Platform

Nuance Communications today announced One-Shot Destination Entry technology on Microsoft’s Auto platform at Embedded World.

Nuance’s One-Shot Destination Entry is based on the latest version of Nuance VoCon 3200, the speech recognition engine that now supports new search algorithms to allow one-shot, multi-slot entry in just one spoken command. Instead of walking through a multi-step dialog and responding to independent prompts for city name, street name and house number, the user can simply speak the address in one shot, stating for example, “196 Sunset Boulevard, San Francisco, California.”
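As a rough illustration of one-shot, multi-slot entry, the Python sketch below fills four slots from a single utterance that has already been transcribed. It says nothing about how VoCon 3200 works internally: the comma-separated address layout, the slot names and the `fill_slots` helper are assumptions made for the example.

```python
import re

# Assumed layout: "<house number> <street>, <city>, <state>".
ADDRESS = re.compile(
    r"^\s*(?P<house_number>\d+)\s+(?P<street>[^,]+?)\s*,"
    r"\s*(?P<city>[^,]+?)\s*,"
    r"\s*(?P<state>[^,]+?)\s*\.?\s*$"
)


def fill_slots(utterance):
    """Fill every address slot from one utterance, or return None.

    A None result is where a dialog would fall back to asking for the
    city, street and house number one prompt at a time.
    """
    match = ADDRESS.match(utterance)
    return match.groupdict() if match else None
```

With the article's example, `fill_slots("196 Sunset Boulevard, San Francisco, California.")` yields the house number, street, city and state in one step instead of three separate prompts.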

It’s like software understands, um, language

EU researchers have taken speech recognition to a whole new level by creating software that can understand spontaneous language. It will, like, make human-machine interaction, um, work a lot more, er, smoothly.

Automated speech recognition has revolutionised customer relations for banks, allowing them to respond quickly and with fewer staff to more low-level queries. It has helped to enable online banking and the development of more advanced private and public services because machines can handle routine matters, leaving people to take care of more serious issues.

But this technology has its limits. The most common, very basic, voice system asks a series of questions or offers a series of options, slowly and fitfully narrowing down your problem or supplying the solution. It would be nice to just tell the service what you want.

Soon you will be able to, thanks to the work of the Luna project, a Europe-wide effort to dramatically advance the power and intelligence of speech recognition. The team is moving the system from short utterances – like ‘yes’, ‘no’, or ‘account’ – to spontaneous speech, such as ‘I want to get the balance on my current account.’


Saynergy: Lifelike Speech Interaction is Born

In a bid to improve lifelike speech interactions, Vocalcom, Umanify and Loquendo have partnered to combine their respective technologies and create a digital assistant, Saynergy.

The new offering provides advanced natural language interaction, enabling enterprises to improve user experiences with lifelike speech interactions that go a level beyond the limited context and fixed command sets of past systems.

For contact centers, this means the ability to manage open questions from their customers and to help solve complex questions through dialogue, just as a live agent would.

Why can’t I learn a new language?

[nik's note: not about speech technology per se, but an interesting insight into human language acquisition]

People comprehend their native language with great speed and accuracy, and without visible effort. Indeed, our ability to perform linguistic computations is remarkable, especially when compared with other cognitive domains in which our computational abilities may be rather modest. My work deals with one aspect of language processing, namely, the identification of sounds, which is needed for subsequent word recognition. Sound recognition is a complex task, because the same sounds may be spoken differently depending on the speaker’s sex, age, pitch of the voice or mood. In addition, people may whisper or shout, be in a quiet room or a noisy street. All of these, and many other factors, lead to huge variation in individual acoustic instances of the same sound. It is precisely this acoustic variation that for decades has caused problems for computational linguists and speech engineers building automatic speech recognition systems. Humans, however, even five-year-olds, can successfully recognise sounds and words and understand what other people say almost instantly.

So what allows humans to be so efficient at sound recognition, and how does that affect our ability to learn a new language?

Why We Won't Have Fully Conversational Robots

[nik's note: do you agree with the hypothesis in this article? Can we never replicate human capability at speech?]

John Seabrook wrote a recent feature in The New Yorker about interactive-voice-response (I.V.R.) systems, commonly used on customer service and tech support telephone hotlines. Seabrook spent time at B.B.N. Technologies watching these systems transcribe callers' words and analyze their tone of voice for signs of emotion. While breaking down the history of automated telephone services and voice recognition innovations, he attempts to tackle the larger question of whether or not we can create a fully conversational, quasi-conscious robot, akin to 2001: A Space Odyssey's HAL 9000. Judging from the number of experts interviewed for the piece, the answer is a resounding no.

Nationwide open-question call steering: some suggestions

This week sees Nationwide Building Society switch over to speech-based call steering, using open-question style prompting - "how may I help you?" (0800 30 20 10).

The opening prompt - somewhat long-winded - is as follows:

Welcome to Nationwide. Calls may be recorded to help us improve our service to members. Briefly tell me in your own words what it is you are calling about and I'll direct your call to the right member of our team. You're free to interrupt at any time. For a list of available options, say "what are my choices"; so how can I help you?

I'm not sure I would have designed such a laborious prompt, even though there is clearly a lot of information to get across. Here's my suggestion (here comes the free consultancy).

Welcome to Nationwide. Calls may be recorded to help improve our service. At any time, just tell me in your own words what you are calling about; so, how can I help?

[then if there is silence]
For a list of available options, say "what are my choices?"; now, how can I help?

The original prompt is overly wordy and overly formal; for example, why specifically mention "members"? What's more, breaking the prompt in two like this obeys the principle of giving information just in time. In the original prompt, it is assumed the caller needs to know what choices are available, even though they've already been told they can use their own words. This clutters the prompt and increases cognitive load. In my version, the caller isn't told about this option until it appears they need it (by staying silent).
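For anyone who wants to see the just-in-time idea as dialog logic, here is a minimal Python sketch of the two-stage prompt suggested above. It is not tied to any real IVR platform: `listen` and `play` are hypothetical callables standing in for the recognizer and the audio player, `max_attempts` is an assumed retry limit, and barge-in (the caller interrupting mid-prompt) is not modelled.

```python
MAIN_PROMPT = (
    "Welcome to Nationwide. Calls may be recorded to help improve our service. "
    "At any time, just tell me in your own words what you are calling about; "
    "so, how can I help?"
)
HELP_PROMPT = (
    'For a list of available options, say "what are my choices?"; '
    "now, how can I help?"
)


def run_opening(listen, play, max_attempts=2):
    """Play the main prompt; mention the options only after silence.

    `listen` (hypothetical) returns the caller's words, or None on silence.
    `play` (hypothetical) speaks a prompt to the caller.
    Returns what the caller said, or None if they stayed silent throughout.
    """
    prompt = MAIN_PROMPT
    for _ in range(max_attempts):
        play(prompt)
        heard = listen()
        if heard:
            return heard
        # Silence: only now does the caller hear about "what are my choices?".
        prompt = HELP_PROMPT
    return None
```

A caller who answers straight away never hears the help prompt at all; only a silent caller gets the extra guidance.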