50th Anniversary of STD (Subscriber Trunk Dialling)

[nik's note: source BBC]

It is 50 years since the first long-distance telephone call was made in the UK without the help of an operator.
The 1958 call was made by the Queen to the Lord Provost of Edinburgh from the central telephone exchange in Bristol. She started her call by saying: "This is the Queen speaking from Bristol. Good afternoon, my Lord Provost."
[click heading for video]

Listen to the sound of your typing

[nik's note: This is a fascinating piece of research work - not directly related to speech applications, but using speech recognition technology (amongst other things) to listen to typing and figure out what's being typed. Definitely worthy of inclusion in a Hollywood plot somewhere along the line... ]

We examine the problem of keyboard acoustic emanations. We present a novel attack that takes as input a 10-minute sound recording of a user typing English text on a keyboard and recovers up to 96% of the typed characters. There is no need for a labeled training recording. Moreover, the recognizer bootstrapped this way can even recognize random text such as passwords: in our experiments, 90% of 5-character random passwords using only letters can be generated in fewer than 20 attempts by an adversary; 80% of 10-character passwords can be generated in fewer than 75 attempts. Our attack uses the statistical constraints of the underlying content, the English language, to reconstruct text from sound recordings without any labeled training data. The attack uses a combination of standard machine learning and speech recognition techniques, including cepstrum features, Hidden Markov Models, linear classification, and feedback-based incremental learning. [click heading for more]
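[nik's note: For anyone curious what the front end of such an attack might look like, here's a minimal, hypothetical sketch of my own (not the authors' code): segment keystroke events out of a recording, describe each with cepstral features, and cluster them into candidate key classes without any labels. The file name, frame sizes and cluster count are illustrative assumptions; the paper's full pipeline then decodes the cluster sequence into letters with a Hidden Markov Model constrained by English-language statistics and refines the result with feedback-based incremental learning.]

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

# Hypothetical recording of a user typing; the file name is an assumption.
audio, sr = librosa.load("typing.wav", sr=None, mono=True)

# 1. Find keystroke onsets with a crude short-time energy threshold.
frame = int(0.01 * sr)  # 10 ms analysis frames
energy = np.array([np.sum(audio[i:i + frame] ** 2)
                   for i in range(0, len(audio) - frame, frame)])
threshold = energy.mean() + 2 * energy.std()
onsets = [i * frame for i in range(1, len(energy))
          if energy[i] > threshold and energy[i - 1] <= threshold]

# 2. Describe each keystroke with averaged cepstral (MFCC) features.
features = []
for start in onsets:
    segment = audio[start:start + int(0.1 * sr)]  # ~100 ms per keystroke
    if len(segment) < frame:
        continue
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=16)
    features.append(mfcc.mean(axis=1))
features = np.array(features)

# 3. Cluster keystrokes into candidate key classes -- entirely unsupervised,
#    mirroring the "no labeled training recording" property of the attack.
#    The full pipeline would then decode the cluster sequence into letters
#    with an HMM constrained by English-language statistics.
n_classes = min(30, len(features))  # roughly one class per common key
labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(features)
print(f"{len(features)} keystrokes grouped into {n_classes} candidate key classes")
```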

A computer can pick out speech even amid cacophony


Using a recent development in speech recognition, it is possible to search through television news programmes, provided the recognition system has been trained beforehand. PhD candidate Marijn Huijbregts from the University of Twente (Netherlands) has, however, taken things even further: he has developed Spoken Document Retrieval for audio and video files that the speech recognition system has not yet been trained to deal with. [click heading for more]
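[nik's note: To make the idea of Spoken Document Retrieval concrete, here's a toy sketch of my own (nothing to do with Huijbregts' actual system): once a recogniser has produced time-stamped transcripts, finding a spoken phrase in hours of television becomes ordinary text search over those transcripts. The transcript snippets below are invented for the example.]

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    programme: str
    start_seconds: float
    text: str  # what the speech recogniser heard in this segment

def search(segments, query):
    """Return segments whose recognised text mentions the query phrase."""
    q = query.lower()
    return [s for s in segments if q in s.text.lower()]

# Invented, time-stamped recogniser output for two news programmes.
segments = [
    TranscriptSegment("evening_news", 312.0, "the prime minister announced new measures today"),
    TranscriptSegment("evening_news", 645.5, "floods have hit the east of the country"),
    TranscriptSegment("sports_roundup", 88.2, "the minister of sport opened the new stadium"),
]

for hit in search(segments, "minister"):
    print(f"{hit.programme} @ {hit.start_seconds:.0f}s: {hit.text}")
```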

Speech Recognition Remote Control


Ever feel like manually pressing the buttons on that remote control is too much work?
Well, fear not, Gentle Reader: Things are about to get a whole lot easier.
Oki Japan and a team from Waseda University are working on a speech-recognition-powered remote control that pulls a single target voice from background noise and the voices of other speakers.
[click heading for more]

It's cloudy; and so the future's bright

[nik's note: Alongside speech technology I have a strong interest in cloud computing - particularly its disruptive force in the industry and how it is changing the landscape. I've been a cloud user since we began to think of the concept, having used Amazon's infrastructure to build my own webstores and the like. It's something we must now take very seriously in the speech industry: it lowers barriers to entry and, as several players are showing, networked speech engines allow modest devices to perform like high-end desktops. Here's a little something I picked up from the powered-by-cloud blog.]
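By way of illustration, the thin-client pattern boils down to something like this entirely hypothetical sketch: the handset just records audio and hands recognition off to a hosted speech engine. The endpoint URL, payload and response shape below are assumptions, not any particular vendor's API.

```python
import requests

# Hypothetical hosted speech engine; URL, payload and response shape are
# assumptions for illustration, not any particular vendor's API.
SPEECH_SERVICE = "https://speech.example.com/recognise"

def recognise(audio_path: str) -> str:
    """Send raw audio to the networked engine and return its transcript."""
    with open(audio_path, "rb") as f:
        response = requests.post(SPEECH_SERVICE,
                                 data=f.read(),
                                 headers={"Content-Type": "audio/wav"})
    response.raise_for_status()
    return response.json()["transcript"]  # assumed response field

if __name__ == "__main__":
    print(recognise("utterance.wav"))  # illustrative file name
```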

Five reasons the cloud is for real.

The technology behind cloud computing is not brand new. So why all the hype?

Look at the five reasons Alistair Croll of Gigaom cites:

Power and cooling are expensive. It costs far more to run computers than it does to buy them. To save on power, we’re building data centers near dams; for cooling, we’re considering using decommissioned ships. This is about economics and engineering.
Demand is global. Storage itself may be cheap, but data processing at scale is hard to do. With millions of consumers using a service, putting data next to computing is the only way to satisfy them.
Computing is ubiquitous. Keeping applications and content on a desktop isn’t just old-fashioned — it’s inconvenient.
Applications are built from massive, smart parts. Clouds give developers building blocks they couldn’t build themselves, from storage to authentication to friend feeds to CRM interfaces, letting coders stand on the shoulders of giants.
Clouds let us experiment. By removing the cost of staging an environment, a cloud lets a company try new things faster. Billed on demand, the cloud means anyone can experiment.

[click heading for more]

Lost in translation: The iPhone's accents problem

[nik's note: this is funny]

SEASONAL scene somewhere in Scottish theatre land:
"Whit did ye get her fir Christmas?"

"Ah firgoat."

"Ye firgoat? Aw, did she gie ye hell?"

"Eh?"

"Well ye said ye firgoat"

"Naw, ah fir goat."

And so the pantomime joke continues, for as long as the colourfully clad dames can draw it out. Eventually the smaller, fatter ugly sister will understand that her taller, scrawnier stage sibling has given a present of a fur coat. But the confusion inevitably won't end there. "Whit fir?" The answer: "Fir tae keep her warm." Obviously. 

The Google application for iPhone, which was developed in the US, is supposed to allow users to search for information by recognising the words they say. Unfortunately, there have been some serious transatlantic translation glitches.
[click heading for more]

Speech recognition: Eckoh plc

Early last year Nik Philpott and Adam Maloney found themselves at the centre of a media storm over potentially damaging claims that their premium-rate telephony services had short-changed callers to TV shows including Channel 4’s ‘Richard and Judy’.


Two years on and the chief executive and finance director of AIM-listed Eckoh plc have breathed new life into a business that endured a PR disaster which at its height saw it pilloried in the press virtually every day.

Indeed, for a time it wasn’t just Eckoh’s reputation on the line but the entire market for call-ins and competitions that had hooked media owners scrambling to find ways of propping up flagging ad revenues.
[click heading for more]

M*Modal Announces Real-Time Speech Understanding Application for Healthcare Professionals Available on Apple App Store

While clinical dictation was previously possible via telephone or PDA devices, the resulting report was not available for hours. AnyModal CDS Mobile delivers documents in real time to the physician's iPhone, making it the first device to capture, understand and transcribe dictation in real time. Physicians can now immediately review and sign off on a clinical document.

The company's core product, AnyModal(TM) CDS (Conversational Documentation Services), based on a unique combination of proprietary speech recognition and natural language understanding technologies, turns clinical dictation directly into structured and encoded clinical documents.
The highly configurable, service-oriented architecture (SOA), paired with the platform-independent set of thin-client user interface components, enables M*Modal partners to incorporate AnyModal CDS into their workflow solutions.

[click heading for more]

Google Voice Search Turns 'Fish' To 'Sex'

Users of the latest release of Google Mobile App for the iPhone have complained that the voice-recognition program doesn't understand some British accents.


Users posting on the Google Mobile Blog have complained that Google's app doesn't understand some British accents, a claim that isn't entirely surprising given the use of English subtitles on certain television shows imported from the United Kingdom to the United States.

"Awesome job, Google," wrote someone posting under the name Kevin. "Only problem is every time I say the word 'fish' it registers as 'sex.' " 
[click heading for more]