YouTube Launches Auto-Captioning for Videos

Mike Cohen, part of Google’s Speech Technology team (and himself deaf), presented in sign language to discuss his team’s work on video. The press conference covered YouTube accessibility for people with disabilities, specifically the deaf, and YouTube’s new auto-captioning technology, which is rolling out to everybody today.

The Art (and Humour) of Error Messages

Error recovery strategies, and the verbiage around them, have always been a hot topic of debate. We’ve all heard the classic “I’m sorry, I didn’t hear you.” and “I’m sorry, I didn’t understand you.” messages, normally implemented as global prefixes to further attempts to help users get back on track. Other designers prefer to drop this generic approach in favour of a context-sensitive alternative: depending on the likely cause of the error, you might eliminate the apologies completely and simply reprompt the user in a more natural way, perhaps with a slight change in intonation that subtly conveys “Hello, are you listening to me?”

As for the content of the error messages themselves, we’ve all heard that they should not simply repeat what the user has already heard, but rather offer slightly different variations based on the context and the likely cause of the problem, so as to help the user recover. Is it a noisy environment? Is the user providing more information than I’m requesting? Are they struggling to find it? Do they need more time? Are they getting confused by what I’m asking?
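To make that concrete, here is a minimal sketch of what cause-aware reprompting might look like. Everything in it (the cause labels, the prompt wording, the escalation rule) is invented for illustration; real dialog platforms expose this differently:

```python
# Hypothetical sketch: pick a reprompt based on the likely cause of the
# recognition error, instead of prepending a generic "I'm sorry" every time.

NOISY = "noisy"            # low signal-to-noise ratio on the audio
TOO_MUCH_INFO = "verbose"  # user said more than the grammar expected
TIMEOUT = "timeout"        # user said nothing before the timer expired
LOW_CONFIDENCE = "unsure"  # recognizer returned a low-confidence result

REPROMPTS = {
    NOISY: "It's a bit hard to hear you. Could you repeat that?",
    TOO_MUCH_INFO: "Let's take it one step at a time. Which city are you flying from?",
    TIMEOUT: "Are you still there? Just say the departure city when you're ready.",
    LOW_CONFIDENCE: "Was that Boston? Please say yes or no.",
}

def pick_reprompt(error_cause: str, attempt: int) -> str:
    """Return a reprompt tailored to the suspected cause, escalating to
    explicit instructions only after repeated failures."""
    if attempt >= 3:
        # After several failures, fall back to the most explicit prompt.
        return "Please say only the name of the city you're flying from."
    return REPROMPTS.get(error_cause, "Which city are you flying from?")
```

The point is not the specific wording but that each failure mode gets its own recovery path.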

Of course, errors are nothing new and are particularly prevalent in the software and web world, where the value of the message and its ability to help users recover is very often dubious (or flat-out ridiculous), resulting in bad user experiences. Some examples:

“Unknown Error -1”

“Keyboard error (press F1 to resume)”

“Wrong parameter”

“An unexpected error occurred, because an error of type -110 occurred.”

“It is not necessary to dial 0 after the country code for this country.” (If they know that, why not simply recognize it, remove or ignore the 0, and move on? A sketch of that fix follows below.)

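That last one deserves a sketch: silently normalizing the input is trivially easy compared with lecturing the user. A toy example (the country codes are illustrative only, and real numbering plans have more edge cases):

```python
import re

# Country codes whose national numbering plan uses a leading trunk "0"
# that must be dropped when dialling internationally (examples only).
TRUNK_ZERO_CODES = {"44", "33", "49"}  # UK, France, Germany

def normalize(dialed: str) -> str:
    """Silently drop a redundant trunk '0' after the country code
    instead of rejecting the call with an error message."""
    digits = re.sub(r"\D", "", dialed)  # keep digits only
    for cc in TRUNK_ZERO_CODES:
        if digits.startswith(cc + "0"):
            return cc + digits[len(cc) + 1:]  # remove the extra 0
    return digits

print(normalize("+44 0 20 7946 0018"))  # -> "442079460018"
```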

With that in mind, I have to say I found it very refreshing when my Firefox browser recently crashed and I was presented with the following message:

[for the full article]

Boring conversation? Let your computer listen for you

Most of us talk to our computers, if only to curse them when a glitch destroys hours of work. Sadly the computer doesn't usually listen, but new kinds of software are being developed that make conversing with a computer rather more productive.

The longest established of these is automatic speech recognition (ASR), the technology that converts the spoken word to text. More recently it has been joined by subtler techniques that go beyond what you say, and analyse how you say it. Between them they could help us communicate more effectively in situations where face-to-face conversation is not possible.

ASR has come a long way since 1964, when visitors to the World's Fair in New York were wowed by a device called the IBM Shoebox, which performed simple arithmetic calculations in response to voice commands. Yet people's perceptions of the usefulness of ASR have, if anything, diminished.

"State-of-the-art ASR has an error rate of 30 to 35 per cent," says Simon Tucker at the University of Sheffield, UK, "and that's just very annoying." Its shortcomings are highlighted by the plethora of web pages poking fun at some of the mistakes made by Google Voice, which turns voicemail messages into text.

What's more, even when ASR gets it right the results can be unsatisfactory, as simply transcribing what someone says often makes for awkward reading. People's speech can be peppered with repetition, or sentences that just tail off.

"Even if you had perfect transcription of the words, it's often the case that you still couldn't tell what was going on," says Alex Pentland, who directs the Human Dynamics Lab at the Massachusetts Institute of Technology. "People's language use is very indirect and idiomatic," he points out.

Despite these limitations, ASR has its uses, says Tucker. With colleagues at Sheffield and Steve Whittaker at IBM Research in Almaden, California, he has developed a system called Catchup, designed to summarise in almost real time what has been said at a business meeting so the latecomers can... well, catch up with what they missed. Catchup is able to identify the important words and phrases in an ASR transcript and edit out the unimportant ones. It does so by using the frequency with which a word appears as an indicator of its importance, having first ruled out a "stop list" of very common words. It leaves the text surrounding the important words in place to put them in context, and removes the rest.
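To illustrate the approach described above (this is my own toy sketch, not Catchup's actual code), a frequency-based summariser with a stop list and a context window might look something like this:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
              "it", "that", "we", "i", "you", "so", "on", "for"}

def summarize(transcript_words, keep_ratio=0.4, context=2):
    """Toy frequency-based summariser in the spirit of Catchup:
    score words by how often they occur (ignoring a stop list),
    then keep the top-scoring words plus a little surrounding
    context, dropping everything else."""
    freqs = Counter(w.lower() for w in transcript_words
                    if w.lower() not in STOP_WORDS)
    # Treat the most frequent content words as "important".
    n_keep = max(1, int(len(freqs) * keep_ratio))
    important = {w for w, _ in freqs.most_common(n_keep)}
    keep = set()
    for i, w in enumerate(transcript_words):
        if w.lower() in important:
            # Keep the word and a window of context around it.
            keep.update(range(max(0, i - context),
                              min(len(transcript_words), i + context + 1)))
    return " ".join(w for i, w in enumerate(transcript_words) if i in keep)
```

The real system, of course, works on noisy ASR output rather than clean text, and then renders the result back as audio, as described next.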

A key feature of Catchup is that it then presents the result in audio form, so the latecomer hears a spoken summary rather than having to plough through a transcript. "It provides a much better user experience," says Tucker.

In tests of Catchup, its developers reported that around 80 per cent of subjects were able to understand the summary, even when it was less than half the length of the original conversation. A similar proportion said that it gave them a better idea of what they had missed than they could glean by trying to infer it from the portion of the meeting they could attend.

One advantage of the audio summary, rather than a written one, is that it preserves some of the social signals embedded in speech. A written transcript might show that one person spoke for several minutes, but it won't reveal the confidence or hesitancy in their voice. These signals "can be more important than what's actually said", says Steve Renals, a speech technologist at the University of Edinburgh, UK, who was one of the developers of the ASR technology used by Catchup.

{follow the source link for more}

SpinVox carcass laid bare in final accounts

Dragons' Den TV star Julie Meyer described SpinVox as "the first major technology success story out of Europe", but the company's final accounts show a business running at a huge loss, spending heavily, and with interest payments alone exceeding income.

The accounts also show that CEO Christina Domecq repaid the company a six-figure sum.

Speech giant Nuance acquired the controversial British company - which dominated the business pages last summer - shortly before Christmas in a stock deal.

Although its executives bravely talked of an IPO, SpinVox's liabilities far exceeded its assets. The company listed current liabilities of £124m, including trade and other payables of £59.6m and borrowings of £64.3m.

Yet SpinVox booked just £7.8m in revenue for the nine months ending 30 September 2009, reporting a staggering loss of £56.49m. In the nine months ending 30 September 2008, the accounts reveal, the company posted a £45.25m loss on income of just £2.97m.

The cost of doing business was high, with SpinVox effectively buying customers. In June, the company announced a deal with Telefonica to provide voicemail-to-text service in 13 Latin American countries.

The accounts refer to an "intangible asset of £22.2m, in respect of the right to provide its service to a customer". This was to be amortized over the term of the deal. But the accounts added that "since the contract is at an early stage of deployment, management consider it reasonably possible that the net revenue under the contract may be zero".

{follow source reference below for more}

Nuance Study Finds Automated, Live Agent Preferences

Nuance Communications has announced the findings of a commissioned study conducted by Forrester Consulting on behalf of Nuance titled, “Driving Consumer Engagement with Automated Telephone Customer Service.”  

It found that consumers rate automated telephone customer service higher than live agents for certain straightforward interactions: 'In five out of ten posed scenarios, consumers preferred automated telephone customer service systems over live agent interactions for tasks like prescription refills, checking the status of a flight from a cell phone, checking account balances, store information requests and tracking shipments.'

Consumers’ satisfaction with customer service leaves a lot of room for improvement, too, the study found: 'Only 49 percent of U.S. online adults report being satisfied, very satisfied or extremely satisfied with companies’ customer service in general.' 

And we're just used to it by now: Automated telephone systems are 'an expected and accepted customer service channel,' the survey found, with 82 percent of US online adults having used an automated Touchtone or speech recognition system to contact customer service in the past 12 months.  

Spinvox bought by Nuance for £64m

UK firm Spinvox, which converts voicemails into texts, has been bought by speech recognition company Nuance for $102.5m (£64m).

The deal is worth $66m in cash and $36.5m in stock, nearly a third less than the previously rumoured $146m price tag.

Spinvox investor Invesco Perpetual had confirmed in September that Spinvox was up for sale.

In recent months doubts had been cast on how effective Spinvox's speech-to-text software actually was.

The company claims to use advanced voice recognition software for its service, but the BBC found that human operators were also involved in transcribing many messages.

Voice-to-text firm Spinvox given time to repay loan

The troubled voice-to-text technology firm Spinvox has been given more time to pay back a loan which threatened its survival.

Spinvox is in negotiations with the American speech recognition firm Nuance, which may lead to a sale.

The £30m loan, which was due to be repaid this week, was granted earlier this year by an investor.

Now the company has been told that it has until the end of January to find the money.

A source close to the company said Spinvox had been told: "We're not going to put the company into administration over the £30 million."

[source: http://news.bbc.co.uk/1/hi/technology/8411961.stm ]

Dragon Dictation comes to the iPhone

[source http://www.tuaw.com/2009/12/08/dragon-dictation-comes-to-the-iphone-wow/ ]

Put this into the 'I didn't think they could ever get this to work on an iPhone' category.

I'm talking about Dragon Dictation [iTunes link] from Nuance, the developers of the very popular Dragon NaturallySpeaking for the PC. Nuance also provides the speech recognition engine for MacSpeech Dictate on the Mac platform.

To dictate on the iPhone you just launch the app, press the record button, and start talking. Your dictation can be a brief sentence or a much longer treatise. Once the text has been created from your speech, you can email it, send it as a text message, or copy it to your clipboard, and you can edit the resulting text before you send it off for others to read.

It's pretty slick! When you record your message, it is quickly transmitted to Nuance servers where a speech recognition algorithm is run against your data. The resulting text is returned to your iPhone very quickly; my informal benchmarks showed that it took about a second for text to be processed on a Wi-Fi network, and less than 5 seconds over 3G. You'll need a data connection for the app to work, but having this speech-to-text capability is going to be very important to a lot of people, who will find all sorts of uses for it.
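Under the hood this is the now-standard client-server ASR pattern: capture audio locally, ship it to a recognition server, get text back. A generic sketch of that round trip (the endpoint URL and response format below are hypothetical; Nuance's actual protocol is not public):

```python
import requests  # third-party HTTP library

# Hypothetical recognition endpoint; the real service's API differs.
ASR_URL = "https://asr.example.com/v1/recognize"

def transcribe(wav_path: str) -> str:
    """Send recorded audio to a cloud ASR service and return the text."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    resp = requests.post(
        ASR_URL,
        data=audio,
        headers={"Content-Type": "audio/wav"},
        timeout=10,  # dictation apps need a snappy turnaround
    )
    resp.raise_for_status()
    return resp.json()["transcript"]  # hypothetical response field

# print(transcribe("memo.wav"))
```

Doing the heavy lifting server-side is what makes this feasible on 2009-era phone hardware, at the cost of requiring a data connection.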

Humans 'hear' through their skin

Sensations on the skin play a part in how people hear speech, say Canadian researchers.

A study found that inaudible puffs of air delivered alongside certain sounds influenced what participants thought they were listening to.

Writing in the journal Nature, the team said the findings showed that audio and visual clues were not the only important factors in how people hear.

The findings may lead to better aids for the hard of hearing, experts said.

It is already well known that visual cues from a speaker's face can enhance or interfere with how a person hears what is being said.

In the latest study, researchers at the University of British Columbia in Vancouver wanted to look at whether tactile sensations also affected how sounds are heard.

They compared sounds which, when spoken, are accompanied by a small inaudible breath of air, such as "pa" and "ta", with sounds which are not, such as "ba" and "da".

At the same time, participants were given - or not - a small puff of air to the back of the hand or the neck.

They found that "ba" and "da", known as unaspirated sounds, were heard as the aspirated equivalents, "pa" and "ta", when presented alongside the puff of air.

[source: BBC - see references]

Tweetrad.io: Listen to Twitter Search Results

Here’s something fun and amusing. We just happened upon Tweetrad.io, a site that treats tweets from your Twitter searches like tunes on an old-school radio station, so tweets are read aloud by an automated Twitter DJ as they roll in.

The site is self-explanatory: you can search for tweets, or select from the pre-programmed channels or trending topics, and listen to the tweets instead of having to waste the effort of actually reading them. It may sound a little dry in theory, but in practice, it’s absolutely hilarious.
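If you wanted to roll your own, the recipe is simple: poll a search feed and push each new result through a text-to-speech engine. A rough sketch using the pyttsx3 offline TTS library (the fetch_tweets function is a placeholder, since the Twitter search API of that era is long gone):

```python
import time
import pyttsx3  # offline text-to-speech engine

def fetch_tweets(query):
    """Placeholder: return new tweet texts matching the query.
    The original Twitter search API this site used no longer exists."""
    return []

engine = pyttsx3.init()

def radio(query, poll_seconds=15):
    """Read search results aloud as they arrive, DJ style."""
    seen = set()
    while True:
        for text in fetch_tweets(query):
            if text not in seen:
                seen.add(text)
                engine.say(text)
                engine.runAndWait()  # block until speech finishes
        time.sleep(poll_seconds)
```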