Boring conversation? Let your computer listen for you

MOST of us talk to our computers, if only to curse them when a glitch destroys hours of work. Sadly the computer doesn't usually listen, but new kinds of software are being developed that make conversing with a computer rather more productive.

The longest established of these is automatic speech recognition (ASR), the technology that converts the spoken word to text. More recently it has been joined by subtler techniques that go beyond what you say, and analyse how you say it. Between them they could help us communicate more effectively in situations where face-to-face conversation is not possible.

ASR has come a long way since 1964, when visitors to the World's Fair in New York were wowed by a device called the IBM Shoebox, which performed simple arithmetic calculations in response to voice commands. Yet people's perceptions of the usefulness of ASR have, if anything, diminished.

"State-of-the-art ASR has an error rate of 30 to 35 per cent," says Simon Tucker at the University of Sheffield, UK, "and that's just very annoying." Its shortcomings are highlighted by the plethora of web pages poking fun at some of the mistakes made by Google Voice, which turns voicemail messages into text.

What's more, even when ASR gets it right the results can be unsatisfactory, as simply transcribing what someone says often makes for awkward reading. People's speech can be peppered with repetition, or sentences that just tail off.

"Even if you had perfect transcription of the words, it's often the case that you still couldn't tell what was going on," says Alex Pentland, who directs the Human Dynamics Lab at the Massachusetts Institute of Technology. "People's language use is very indirect and idiomatic," he points out.

Despite these limitations, ASR has its uses, says Tucker. With colleagues at Sheffield and Steve Whittaker at IBM Research in Almaden, California, he has developed a system called Catchup, designed to summarise in almost real time what has been said at a business meeting so the latecomers can... well, catch up with what they missed. Catchup is able to identify the important words and phrases in an ASR transcript and edit out the unimportant ones. It does so by using the frequency with which a word appears as an indicator of its importance, having first ruled out a "stop list" of very common words. It leaves the text surrounding the important words in place to put them in context, and removes the rest.
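[nik's note: Catchup's code isn't public, but the recipe as described (rank words by frequency after removing stop words, keep a window of context around the important ones, drop the rest) is easy to sketch. The stop list, parameter values, and function name below are my own illustrative choices, not Catchup's:]

```python
import re
from collections import Counter

# A toy stop list; a real system would use a much longer one.
STOP_LIST = {"the", "a", "an", "and", "or", "to", "of", "in", "it",
             "is", "that", "was", "we", "i", "you", "so", "on", "for"}

def catchup_style_summary(transcript: str, keep_fraction: float = 0.25,
                          context: int = 3) -> str:
    """Keep high-frequency words plus `context` words either side; drop the rest."""
    words = transcript.split()
    norm = [re.sub(r"[^a-z']", "", w.lower()) for w in words]
    freq = Counter(w for w in norm if w and w not in STOP_LIST)
    # Treat the most frequent non-stop words as the "important" ones.
    n_important = max(1, int(len(freq) * keep_fraction))
    important = {w for w, _ in freq.most_common(n_important)}
    keep = [False] * len(words)
    for i, w in enumerate(norm):
        if w in important:
            # Retain the surrounding words too, so the summary stays readable.
            for j in range(max(0, i - context), min(len(words), i + context + 1)):
                keep[j] = True
    return " ".join(w for w, kept in zip(words, keep) if kept)
```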

A key feature of Catchup is that it then presents the result in audio form, so the latecomer hears a spoken summary rather than having to plough through a transcript. "It provides a much better user experience," says Tucker.
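[nik's note: the article doesn't say whether Catchup replays the original audio or synthesises the summary; if you wanted to prototype the synthesis route, an off-the-shelf TTS library such as pyttsx3 would do. The library choice and speaking rate here are my assumptions:]

```python
import pyttsx3  # third-party offline TTS: pip install pyttsx3

def speak(text: str) -> None:
    engine = pyttsx3.init()
    engine.setProperty("rate", 180)  # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()

# e.g. feed it the output of catchup_style_summary() from the sketch above
speak("Here is a summary of the meeting so far.")
```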

In tests of Catchup, its developers reported that around 80 per cent of subjects were able to understand the summary, even when it was less than half the length of the original conversation. A similar proportion said that it gave them a better idea of what they had missed than they could glean by trying to infer it from the portion of the meeting they could attend.

One advantage of the audio summary, rather than a written one, is that it preserves some of the social signals embedded in speech. A written transcript might show that one person spoke for several minutes, but it won't reveal the confidence or hesitancy in their voice. These signals "can be more important than what's actually said", says Steve Renals, a speech technologist at the University of Edinburgh, UK, who was one of the developers of the ASR technology used by Catchup.

{follow the source link for more}

Nuance and IBM to Offer Speech Technologies for Ten Industries

 

Nuance & IBM have announced an agreement they say will help further accelerate innovation in speech recognition solutions for enterprises, consumers and partners worldwide.

As part of this agreement, Nuance has been chosen as IBM's Preferred Business Partner for speech technologies and related professional services, and its offerings will complement IBM's Industry Solutions portfolio.

The two companies will focus on deploying speech technologies for ten industries: automotive, banking, electronics, energy and utilities, healthcare and life sciences, insurance, media and entertainment, retail, telecommunications, and travel and transportation.

 

Update: Nuance acquires parts of IBM's speech technology

Nuance recently announced that it will be purchasing some of IBM's speech technology. While the addition of IBM's source code will enable Nuance to improve its embedded and network-based speech recognition technology, the acquisition and the ensuing relationship have prompted questions about Nuance's technology and IBM's motives.

The deal covers source code from IBM's research and development team, which will enhance Nuance's speech capabilities in the areas of network-based and embedded text-to-speech (TTS) and advanced speech recognition (ASR). Nuance intends to combine this code with its own over the next two years to improve the performance of its speech recognition engine.

Despite initial speculation that IBM would no longer compete in this market, the company will continue to develop its speech capabilities independently in these areas. It has sold Nuance a past release of the code behind its embedded ViaVoice software and its WebSphere Voice Server middleware. IBM's key motive in making this transaction is to gain some return on its investment in speech recognition technology, which is not unusual: it regularly sells patent licenses to other vendors.

The purchase of IBM's technology reinforces Nuance's ambition to develop leading speech technology. However, it has also led to speculation that IBM's technology was in fact superior to Nuance's; if so, the decision to acquire it was a prudent one.

 


IBM's next five big things

[nik's note: this is the original author's opinion, not mine (which may or may not agree)]

This press release from IBM claims that speech recognition for the Web will be a hot technology within five years. As I mentioned previously, IBM has been dedicating some effort to speech recognition. It's a nice idea and I'd like to see it happen, but it's usually wise to be skeptical of claims made for any speech recognition application. [click heading for more]

Tired of typing? Make yourself heard



Voice-recognition software is evolving into a $1 billion market. It has long been used for automated customer service attendants, simple data entry, and dictation.
Now, language response systems are embedded in cellphones and global positioning systems, as well as toys (a diary that opens when you say the password). And voice-controlled lamps, clocks, and remote controls have been introduced in the home electronics arena.


With only a few brands on the market (IBM, Microsoft, and the leader, Nuance-Dragon), voice-recognition software has made huge leaps, with better algorithms, advances in processing power, and improved microphones. [more...]

IBM strives for super-human speech recognition

We recently caught up with Dr. David Nahamoo, IBM's speech technology guru, to hear about what he calls "super-human speech recognition." No, he's not talking about Spidey or Superman, but rather a project meant to substantially improve the quality, dialog management, and usability of speech technology by the end of the decade, for dictation, call centers, cars, and a broad set of other applications with embedded computing power. One of his goals is to surpass a human at real-time dictation of a lecture, phone conversation, or broadcast, and he would like to do that for 50 languages with the same computer.

Before fully speech-enabled applications become ubiquitous, Nahamoo says that the technology must cross a simplicity threshold that would open it up to more developers. The speech recognition community converged around Voice XML about 5 years ago, thereby abandoning proprietary interfaces. Nahamoo feels that the next step is for providers like IBM need to encapsulate design principles and behaviors in templates. Sound familiar? Like client/server and web development tools (think Visual Basic and Dreamweaver, respectively), we’d say that speech needs its own GUI-based development environment.