Google finding its voice
Friday, September 3, 2010 at 8:38AM

The speech technology blog: news, views and reviews of the speech recognition market, speech technology industry, voiceXML landscape and world of IVR and voice self-service; with a smattering of interaction, gadgetry and social media.
Let's Talk Speech: I am now at Verizon Business, consulting in my specialist areas of First-Contact Call Resolution solutions, Speech Self-Service and Customer Service interaction and experience. Feel free to contact me about industry news, swap opinions or discuss consultancy services and customer service strategy.
I'm an 'imagineer' with almost two decades behind me. Much of my career was spent in BT's Speech Technology and CRM divisions, designing voice solutions, inspiring customers, and creating new products and propositions, and generally evangelising about the role of voice self-service and improved customer service interaction. I now work in a consultancy role specialising in Communications Strategy, Contact Centres, Customer Service, Social Media and practically anything to do with improved customer service.
Monday, August 2, 2010 at 9:49AM Nuance has released Dragon NaturallySpeaking 11, a new version of its speech recognition software. The product has been around for 13 years and has been redesigned to let people spend more of their energy working and creating, rather than clicking and typing. Dragon 11, says Nuance, “gives people a voice to perform almost any task on the computer to create documents, send e-mails, surf the Web, search Facebook and Twitter and interact with their favorite applications – at speeds up to three times faster than typing.”
(for more see source reference)
Thursday, March 4, 2010 at 6:35PM Mike Cohen, part of Google’s Speech Technology team (as a note, he is also deaf), spoke via sign language to talk about his team’s work on video. This press conference is about YouTube and accessibility to the disabled, specifically the deaf. It’s also about YouTube’s new auto-captioning technology, which is rolling out to everybody today.
Thursday, February 25, 2010 at 3:32PM Error recovery strategies and the verbiage around them have always been a hot topic of debate. We’ve all heard the classic “I’m sorry I didn’t hear you.” and “I’m sorry I didn’t understand you.” messages that are normally implemented as global prefixes to further attempts to help users get back on track. Some designers prefer to drop this generic approach and opt instead for a more context-sensitive alternative: depending on the likely cause of the error, you could eliminate the apology prefixes completely and simply reprompt the user in a more natural way, perhaps with a slight change in intonation to subtly convey “Hello, are you listening to me?”
As for the content of the error messages themselves, we’ve all heard that they should not simply repeat what the user has already heard. Instead they should vary with the context and the likely cause of the problem, so as to help the user recover. Is it due to a noisy environment? Is the user providing more information than I’m requesting? Are they struggling to find it? Do they need more time? Are they getting confused by what I’m asking?
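To make the context-sensitive idea concrete, here is a minimal sketch of how a dialogue might pick a reprompt from the suspected cause of the error and the number of failed attempts so far. Everything in it is an assumption for illustration: the cause categories, the prompt wording and the next_prompt helper are hypothetical and do not come from any particular IVR platform.

```python
from enum import Enum


class ErrorCause(Enum):
    """Hypothetical error categories a dialogue manager might distinguish."""
    NO_INPUT = "no_input"        # caller said nothing before the timeout
    NOISE = "noise"              # audio was detected but was mostly background noise
    TOO_MUCH_INFO = "too_much"   # caller said more than the question asked for
    NO_MATCH = "no_match"        # speech was heard but not understood


# Illustrative prompts only. Each cause gets escalating variants so the
# caller never hears exactly the same wording twice in a row.
REPROMPTS = {
    ErrorCause.NO_INPUT: [
        "Which city are you flying from?",
        "Take your time. When you're ready, tell me the city you're flying from.",
    ],
    ErrorCause.NOISE: [
        "It's a bit noisy there. Which city are you flying from?",
        "I'm still having trouble hearing you. You can also key in the airport code.",
    ],
    ErrorCause.TOO_MUCH_INFO: [
        "Let's take it one step at a time. First, which city are you flying from?",
        "Just the departure city for now, please.",
    ],
    ErrorCause.NO_MATCH: [
        "Which city was that?",
        "Please say just the name of the city, for example, 'Boston'.",
    ],
}


def next_prompt(cause: ErrorCause, attempt: int) -> str:
    """Return the reprompt for a cause, given a zero-based count of failures.

    Once the variants run out, the last (most explicit) one is reused.
    """
    variants = REPROMPTS[cause]
    return variants[min(attempt, len(variants) - 1)]
```

A first failure in a noisy environment would use next_prompt(ErrorCause.NOISE, 0), and a second would move on to the more explicit variant rather than repeating the first.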
Of course, errors are nothing new and are particularly prevalent in the software and web world, where the value of the message and its ability to help users recover is very often dubious (or flat out ridiculous), resulting in bad user experiences. Some examples:
“Unknown Error -1”
“Keyboard error (press F1 to resume)”
“Wrong parameter”
“An unexpected error occurred, because an error of type – 110 occurred.”
“It is not necessary to dial 0 after the country code for this country.” (If they know that, why not simply recognize it, remove/ignore the 0 and move on?)
With that in mind, I have to say I found it very refreshing when my Firefox browser recently crashed and I was presented with the following message:
Friday, February 12, 2010 at 8:52PM MOST of us talk to our computers, if only to curse them when a glitch destroys hours of work. Sadly the computer doesn't usually listen, but new kinds of software are being developed that make conversing with a computer rather more productive.
The longest established of these is automatic speech recognition (ASR), the technology that converts the spoken word to text. More recently it has been joined by subtler techniques that go beyond what you say, and analyse how you say it. Between them they could help us communicate more effectively in situations where face-to-face conversation is not possible.
ASR has come a long way since 1964, when visitors to the World's Fair in New York were wowed by a device called the IBM Shoebox, which performed simple arithmetic calculations in response to voice commands. Yet people's perceptions of the usefulness of ASR have, if anything, diminished.
"State-of-the-art ASR has an error rate of 30 to 35 per cent," says Simon Tucker at the University of Sheffield, UK, "and that's just very annoying." Its shortcomings are highlighted by the plethora of web pages poking fun at some of the mistakes made by Google Voice, which turns voicemail messages into text.
What's more, even when ASR gets it right the results can be unsatisfactory, as simply transcribing what someone says often makes for awkward reading. People's speech can be peppered with repetition, or sentences that just tail off.
"Even if you had perfect transcription of the words, it's often the case that you still couldn't tell what was going on," says Alex Pentland, who directs the Human Dynamics Lab at the Massachusetts Institute of Technology. "People's language use is very indirect and idiomatic," he points out.
Despite these limitations, ASR has its uses, says Tucker. With colleagues at Sheffield and Steve Whittaker at IBM Research in Almaden, California, he has developed a system called Catchup, designed to summarise in almost real time what has been said at a business meeting so the latecomers can... well, catch up with what they missed. Catchup is able to identify the important words and phrases in an ASR transcript and edit out the unimportant ones. It does so by using the frequency with which a word appears as an indicator of its importance, having first ruled out a "stop list" of very common words. It leaves the text surrounding the important words in place to put them in context, and removes the rest.
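The approach described above can be sketched in a few lines. This is not the Catchup implementation, only a minimal illustration under stated assumptions: the stop list is a tiny hypothetical one, "importance" is raw word frequency, and the amount of surrounding context kept (the context parameter) is an invented knob.

```python
import re
from collections import Counter

# Hypothetical stop list; a real system would use a far larger one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "to", "of", "in", "on", "by",
    "is", "it", "we", "i", "you", "that", "this", "so", "um", "uh",
    "well", "for", "be", "are", "was", "with", "at",
}


def tokenize(text):
    """Lower-case the text and return its words without punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def important_words(transcript, top_n=3):
    """Return the top_n most frequent words that are not on the stop list."""
    counts = Counter(w for w in tokenize(transcript) if w not in STOP_WORDS)
    return {word for word, _ in counts.most_common(top_n)}


def summarise(transcript, top_n=3, context=2):
    """Keep each important word plus `context` words on either side of it."""
    words = transcript.split()
    keywords = important_words(transcript, top_n)
    keep = set()
    for i, word in enumerate(words):
        tokens = tokenize(word)
        if tokens and tokens[0] in keywords:
            start = max(0, i - context)
            end = min(len(words), i + context + 1)
            keep.update(range(start, end))
    # Emit the surviving words in their original order.
    return " ".join(words[i] for i in sorted(keep))
```

Calling summarise(transcript, top_n=2, context=1) on a short meeting transcript keeps the passages around the two most frequent non-stop words and drops the rest. The real system then renders its result as audio, which this sketch does not attempt.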
A key feature of Catchup is that it then presents the result in audio form, so the latecomer hears a spoken summary rather than having to plough through a transcript. "It provides a much better user experience," says Tucker.
In tests of Catchup, its developers reported that around 80 per cent of subjects were able to understand the summary, even when it was less than half the length of the original conversation. A similar proportion said that it gave them a better idea of what they had missed than they could glean by trying to infer it from the portion of the meeting they could attend.
One advantage of the audio summary, rather than a written one, is that it preserves some of the social signals embedded in speech. A written transcript might show that one person spoke for several minutes, but it won't reveal the confidence or hesitancy in their voice. These signals "can be more important than what's actually said", says Steve Renals, a speech technologist at the University of Edinburgh, UK, who was one of the developers of the ASR technology used by Catchup.
{follow the source link for more}
Friday, February 12, 2010 at 8:47PM Dragon's Den TV star Julie Meyer described SpinVox as "the first major technology success story out of Europe", but the company's final accounts show a business running at a huge loss, spending heavily, and with interest payments alone exceeding income.
The accounts also show that CEO Christina Domecq repaid the company a six-figure sum.
Speech giant Nuance acquired the controversial British company - which dominated the business pages last summer - shortly before Christmas in a stock deal.
Although its executives bravely talked of an IPO, SpinVox's liabilities far exceeded its assets. The company listed current liabilities of £124m, including trade and other payables of £59.6m and borrowings of £64.3m.
Yet SpinVox booked just £7.8m in revenue for the nine months year ending 30 September 2009, reporting a staggering loss of £56.49m. In the nine months ending 30 September 2008, accounts reveal, the company posted a £45.25m loss on income of just £2.97m.
The cost of doing business was high, with SpinVox buying customers. In June, the company announced a deal with Telefonica to provide voicemail-to-text in 13 Latin American countries.
The accounts refer to an "intangible asset of £22.2m, in respect of the right to provide its service to a customer". This was to be amortized over the term of the deal. But the accounts added that "since the contract is at an early stage of deployment, management consider it reasonably possible that the net revenue under the contract may be zero".
{follow source reference below for more}
Friday, January 15, 2010 at 10:10AM Nuance Communications has announced the findings of a commissioned study conducted by Forrester Consulting on behalf of Nuance titled, “Driving Consumer Engagement with Automated Telephone Customer Service.”
It found that consumers rate automated telephone customer service higher than live agents for certain straightforward interactions. 'In five out of ten posed scenarios, consumers preferred automated telephone customer service systems over live agent interactions for tasks like prescription refills, checking the status of a flight from a cell phone, checking account balances, store information requests and tracking shipments.'
Consumers’ satisfaction with customer service leaves a lot of room for improvement, too, the study found: 'Only 49 percent of U.S. online adults report being satisfied, very satisfied or extremely satisfied with companies’ customer service in general.'
And we're just used to it by now: Automated telephone systems are 'an expected and accepted customer service channel,' the survey found, with 82 percent of US online adults having used an automated Touchtone or speech recognition system to contact customer service in the past 12 months.
Wednesday, December 30, 2009 at 3:21PM UK firm Spinvox, which converts voicemails into texts, has been bought by speech recognition company Nuance for $102.5m (£64m).
The deal is worth $66m in cash and $36.5m in stock, about a third of the previously rumoured $146m price tag.
Spinvox investor Invesco Perpetual had confirmed in September that Spinvox was up for sale.
In recent months doubts had been cast on how effective Spinvox's speech-to-text software actually was.
The company claims to use advanced voice recognition software for its service, but the BBC found that human operators were also involved in transcribing many messages.
Tuesday, December 15, 2009 at 8:28AM The troubled voice-to-text technology firm Spinvox has been given more time to pay back a loan which threatened its survival.
Spinvox is in negotiations with the American speech recognition firm Nuance, which may lead to a sale.
The £30m loan, which was due to be repaid this week, was granted earlier this year by an investor.
Now the company has been told that it has until the end of January to find the money.
A source close to the company said Spinvox had been told: "We're not going to put the company into administration over the £30 million."
[source: http://news.bbc.co.uk/1/hi/technology/8411961.stm ]
Tuesday, December 8, 2009 at 8:14AM [source http://www.tuaw.com/2009/12/08/dragon-dictation-comes-to-the-iphone-wow/ ]
Put this into the 'I didn't think they could ever get this to work on an iPhone' category.
I'm talking about Dragon Dictation from Nuance, the developers of the very popular Dragon NaturallySpeaking for the PC. Nuance also provides the speech recognition engine for MacSpeech Dictate on the Mac platform.
To dictate on the iPhone you just launch the app, press the record button, and start talking. Your dictation can be a brief sentence, or a much longer treatise. Once the text has been created from your speech, it's possible to email it, send it as a text message, or put the result in your clipboard. After recording your message, you can edit the resulting text before you send it off for others to read.
It's pretty slick! When you record your message, it is quickly transmitted to Nuance servers where a speech recognition algorithm is run against your data. The resulting text is returned to your iPhone very quickly; my informal benchmarks showed that it took about a second for text to be processed on a Wi-Fi network, and less than 5 seconds over 3G. You'll need a data connection for the app to work, but having this speech-to-text capability is going to be very important to a lot of people, who will find all sorts of uses for it.