Rest those tired hands. Voice-recognition software finally works.

For decades, computer scientists have dreamed of computers that respond to human voice. But until recently speech-recognition systems could be a nightmare. New users had to recite long scripts to train the software to the peculiarities of their voices, and the software's translations could still be as mistake-prone as a first-year foreign-language student. But lately the technology has improved dramatically. Last summer Nuance Corp., the industry's big player, released a new version of Dragon that's winning raves. This year Microsoft included a voice-recognition feature in its new Vista operating system and dropped a reported $800 million to acquire a speech-software start-up called Tellme. Nuance and other companies—including Google—are working on systems that allow voice to replace the frenzied pecking on BlackBerrys and other mobile devices. "The technology has kind of snuck up on everyone," says Bill Meisel, publisher of Speech Strategy News.
PC-based voice recognition is different from the "call center" systems you encounter when calling banks or airlines. Telephone systems recognize only simple vocabularies and are designed to work with any voice. In contrast, PC-based systems adapt to a single user's speech, gaining accuracy over time. Nuance cites several reasons the software has improved lately. As more Dragon users began to have broadband connections, the company started remotely collecting data on the particular words and phrases that Dragon screwed up, allowing researchers to tweak their black-box algorithms to better target trouble spots. [click heading for more]

OnMobile acquires Telisma

OnMobile, India's Telecom Value Added Services (VAS) provider, today announced that it has acquired 100% of telisma, the leading European speech recognition company.
The addition of telisma’s standards-compliant speech recognition products and expertise will enable OnMobile to accelerate its penetration into fast-growing emerging markets by developing new speech recognition language models. This technology enables quick and easy access to mobile applications and content and also strengthens OnMobile’s mobile applications product suite. [click heading for more]

Healthcare driving speech recognition technology growth

The automation of healthcare processes is the main driving force behind the growth of speech recognition technology, according to a report released by Datamonitor this week.
Healthcare currently represents 85% of the market for PC- and server-based speech recognition technologies.
“Patient information is gradually becoming digitised in order to address issues with delivering records and test results faster,” said Aphrodite Brinsmead, analyst at Datamonitor and author of the report. Speech recognition is also being used for medical transcription, easing pressure on transcriptionists and allowing healthcare providers to save on staffing costs. Medical transcription is estimated to be a multi-billion dollar market and speech recognition vendors are taking advantage of this. [click heading for more]

RadiSys Announces New Speech Capabilities for Convedia Media Server Family

RadiSys® Corporation (NASDAQ: RSYS) today announced that its market-leading Convedia® media server family now supports automatic speech recognition (ASR), which converts human speech to computer data, and text-to-speech (TTS) capabilities in multiple languages for IP contact center application developers and service providers. The company’s integration of the standards-based Media Resource Control Protocol (MRCP) with leading speech servers results in better resource utilization and economics, less equipment to procure and manage in large deployments, and improved scalability. [click heading for more]

Speech-Recognition Company "Travelling Wave" Goes After Small, Crowded Niche

Every once in a while you find a company that has the odds stacked against it. TravellingWave, which occupies a three-room office just outside of downtown Seattle, has just five employees, including the founders, and is in the highly competitive space of trying to figure out how to use voice recognition, which has historically been plagued by inaccuracies, as a way to input information into small devices. To boot, the company's list of competitors includes giants such as Microsoft (NSDQ: MSFT) and IBM, as well as public companies like Nuance Communications. Founder and CEO Ashwin Rao said it perfectly: "We are in a small niche that's a crowded market. It's dumb unless we have a solid differentiator."
The company has one official year under its belt and a couple of unofficial ones, and Rao gave me a sneak peek of an announcement it plans to release today that may be just the thing to put the company on the map. To date, its big differentiator has been combining voice recognition with some texting. For instance, when sending an SMS, a user would first speak the word "hello" and then hit the "h" key, which would bump up the software's accuracy. If a person came across a word that proved more difficult, they would keep typing letters of the word until it was recognized. The company claims that accuracy rises to 90 percent with one letter, 95 percent with two, and 99 percent with three. It also says people input text three times faster with four times fewer key presses. [click heading for more]
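The idea of narrowing a recognizer's guesses with typed letters can be sketched in a few lines of Python. This is only an illustration of the general technique, not TravellingWave's actual method: the function name `pick_word`, the candidate list, and the scores are all hypothetical, and a real recognizer would supply its own ranked hypotheses.

```python
def pick_word(hypotheses, typed_prefix=""):
    """Return the highest-scoring hypothesis that starts with typed_prefix.

    hypotheses: list of (word, score) pairs, as a speech recognizer might
    produce for one spoken word (hypothetical data, for illustration only).
    Returns None if no hypothesis matches the typed letters.
    """
    prefix = typed_prefix.lower()
    matching = [
        (word, score)
        for word, score in hypotheses
        if word.lower().startswith(prefix)
    ]
    if not matching:
        return None
    # Among the candidates consistent with the typed letters, keep the best.
    return max(matching, key=lambda pair: pair[1])[0]


# Hypothetical ranked guesses for a user who said "hello".
hypotheses = [("yellow", 0.41), ("hello", 0.38), ("help", 0.12), ("fellow", 0.09)]

print(pick_word(hypotheses))         # no letters typed: the top guess is wrong
print(pick_word(hypotheses, "h"))    # one letter rules out "yellow" and "fellow"
print(pick_word(hypotheses, "hel"))  # more letters narrow the field further
```

Each typed letter discards every candidate that does not start with it, which is one plausible reason accuracy would climb as more letters are entered.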

BroadSoft Unveils RESTful APIs for Carrier-Grade Voice Application Mashups

[nik's note: is this the beginning of the end for voicexml ALREADY?? :-) ]

BroadSoft, Inc. today announced the availability of its Xtended Services Interface (Xsi), new RESTful application programming interfaces (APIs) that will allow Web developers to integrate BroadSoft's carrier-grade voice applications with unified communications solutions and Web-based business and consumer applications, such as Salesforce.com and Facebook.
The Xsi is the latest component to be announced as part of the BroadSoft® Xtended Program, BroadSoft's initiative for the creation of mashups that integrate BroadSoft's BroadWorks® VoIP platform with other applications that are already being used by millions. The REST-based Xsi allows subscriber and call resources to be accessed and used via HTTP and simple XML. This approach requires less client-side software to be written than other approaches and is becoming the overwhelming choice for developers creating Web applications. [click heading for more]

Why We Won't Have Fully Conversational Robots

[nik's note: do you agree with the hypothesis in this article? Can we never replicate human capability at speech? ]

John Seabrook recently wrote a feature in The New Yorker about interactive-voice-response (I.V.R.) systems, commonly used with customer service and tech support telephone hotlines. Seabrook spent time at B.B.N. Technologies watching these systems transcribe callers' words and analyze the tone of voice for emotions present. While breaking down the history of automated telephone services and voice recognition innovations, he attempts to tackle the larger question of whether or not we can create a fully conversational, quasi-conscious robot, akin to 2001: A Space Odyssey's HAL 9000. Judging from the number of experts interviewed for the piece, the answer is a resounding no. [click heading for more]

Vlingo's Speech Recognition Features Come to BlackBerry Devices

One of the world’s most popular mobile devices is getting an upgrade today as it integrates a broad suite of speech recognition functions developed by a Cambridge, Massachusetts-based company.

Starting today, Research In Motion’s BlackBerry devices are integrating with a voice-powered interface from vlingo, a technology that company officials say unlocks access to mobile phone wireless data services. [click heading for more]

Translation systems speak up

Wars often boost technological development. In Iraq the armed forces have faced a shortage of translators, both from within their own ranks and from bilingual locals whose lives can be put in peril if they are found to be working for the foreigners. This has created a demand for machines that can translate between Arabic and English. Although some experimental devices have proved unreliable, they are now improving.
A number of two-way translating devices have been under development as part of the Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) programme run by the Defense Advanced Research Projects Agency, known as DARPA. There are three main participants: IBM, BBN Technologies and SRI International.
SRI said recently that it had sold 150 machines to the American government for use in Iraq. IBM has provided troops with 1,000 of its devices which run MASTOR, its multilingual automatic speech translator. Both systems can translate tens of thousands of words between Iraqi Arabic and American English, even when people are speaking outside the laboratory. [click heading for more]