Voice enabling XML, Part 1: Develop a voice-enabled RSS reader

RSS is a hot topic these days, as it provides an easy way to stream data online. This article, the first of a four-part series on developing VoiceXML applications, shows you how to develop a voice-enabled RSS reader. The input to the application is RSS data, and the output is VoiceXML that can be read and spoken by your favorite compatible voice application.
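
As a rough illustration of the RSS-in, VoiceXML-out idea described above, here is a minimal Python sketch. It is not the code from the article series: the function name `rss_to_vxml`, the `max_items` limit, and the choice to read only item titles are all assumptions made for this example. It assumes an RSS 2.0 feed and emits a bare-bones VoiceXML 2.0 document with one prompt per headline.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def rss_to_vxml(rss_text: str, max_items: int = 5) -> str:
    """Turn an RSS 2.0 feed into a VoiceXML document that reads item titles aloud.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")

    # One form with a single block; each headline becomes its own prompt.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "headlines"})
    block = ET.SubElement(form, "block")

    feed_title = channel.findtext("title", default="Untitled feed")
    ET.SubElement(block, "prompt").text = f"Headlines from {feed_title}."

    for item in channel.findall("item")[:max_items]:
        title = item.findtext("title", default="").strip()
        if title:
            prompt = ET.SubElement(block, "prompt")
            prompt.text = title
            # Short pause after each headline so they do not run together.
            ET.SubElement(prompt, "break", {"time": "500ms"})

    return ET.tostring(vxml, encoding="unicode")
```

Calling `rss_to_vxml(feed_text)` on a downloaded feed returns a string that could be served to a VoiceXML browser. A real reader would likely also speak item descriptions and let the caller skip or repeat items.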

Speech recognition on a handheld

Windows on the PC can already comprehend 8 languages, and now Windows on the mobile phone can too, or at least one of them anyway.

Recently the Live Search for Mobile team introduced voice input in their application for Windows Mobile. And it works remarkably well, according to some early responses. [Check out the video of the demo by clicking on this post title.]

Nuance Joins Open Handset Alliance


In its continuing quest to be the leader in mobile speech applications, Nuance today announced that it has joined the Open Handset Alliance, a new wireless industry group with a unified goal of giving developers all over the world a chance to deliver customized mobile applications that will revolutionize the mobile experience via a single, open platform.

Convergys' Work with USPS Pays Off

Convergys Corporation, a global provider of customer care, human resources, and billing services, has announced it has received the Market Leader Award from Speech Technology Magazine, recognizing Convergys as a leader in the consulting services category. The company’s work with the United States Postal Service was of particular interest.

USPS has received several important call automation enhancements as a result of Convergys implementations. These include speech recognition and tuning to increase the number of calls resolved through automated applications. With these changes, USPS has experienced a 30 percent increase in call containment rates in a system that handles more than 60 million calls a year. One specific internal speech application created more than $10 million in annual savings and cut misdirected calls from 21,000 to four or fewer per month.

Google's call [Forbes]



The long-awaited Googlephone is here--sort of. An actual hold-in-your-hand phone is six to 12 months away. Instead, all the hoopla and announcements today are about unveiling tools for building future phones.


The list of companies that have signed up for the Open Handset Alliance says a lot about the Google-led phone effort: There are few market leaders but a lot of companies hoping to challenge the top dogs. Many of the software companies are little known outside their industry niche. They clearly hope that the Google-led effort will blow away the barriers to entering the mobile-phone business and give them a chance to build prosperous businesses in the emerging market for Internet-friendly phones.

Tired of typing? Make yourself heard



Voice-recognition software is evolving into a $1 billion market. It has long been used for automated customer service attendants, simple data entry, and dictation.
Now, language response systems are embedded in cellphones and global positioning systems, as well as toys (such as a diary that opens when you say the password). Voice-controlled lamps, clocks, and remote controls have also been introduced in the home electronics arena.


With only a few brands on the market (IBM, Microsoft, and the leader, Nuance-Dragon), voice-recognition software has made huge leaps, with better algorithms, advances in processing power, and improved microphones.

mr.nik's opinion: The world-wide widget: Voice Web 2.0

The first time I set up a page on 'Myspace' I was sorely unmoved. Frankly, it appeared not much different from something we web 1.0 oldies remember from a dozen years ago: Geocities. A place where you could create your free online presence; and millions did. The difference then was that you had total flexibility - a blank page with a remit to create your own garish HTML. Many tried, few succeeded very well.

Myspace is, of course, different. You are actually far more restricted in what you can do, but at the same time you're provided with a set of functions that let you connect your page with those of anyone else you choose. The other difference is, we now have a generation of teenagers growing up who have embraced web 2.0 and have never heard of or seen web 1.0. So, while they continue to express themselves and stamp out their identity as strongly as their counterparts did a dozen years ago, there is no knowledge of what is under the bonnet; indeed there is no need to know; just configure the building blocks. This process is trivial and consequently this 'social' web is richer and more dynamic than its forefather, even if it is just as aesthetically ugly to look at.

Of course, Facebook has come along to make amends - slick, tidy and neat, it appeals to those of us (perhaps a little older) with less of an urge to look quite so flashingly-neon and radically different from the crowd. But the key points remain: the open interface and the ease with which any of thousands of pre-built applications can be deployed and customised breathe incredible life and variety into everyone's own personal corner of the web.

This is the incredible power of open standards (such as XML) finally coming home to roost: the ability to incorporate gadgets, widgets and chicklets into any online home, to create an effervescent, engaging and almost living web experience, no matter who you are. This is the web creating the web; the web's users rather than its architects shaping the experience they want themselves and others to enjoy. Building with blocks rather than creating them.

So, will this ever happen with Voice? Yes, I'm pretty sure it will - or at least I'm pretty sure it ought to.

On the whole we are still at the stage of dabbling - halfway between Geocities and Facebook - with organisations starting to deploy and offer 'packaged applications' and speech vendors starting to ship 'components' that can be used as a leg-up in complex applications. But it still feels very much like cutting and pasting the code of web 1.0 rather than organising the widgets of web 2.0. Still making the bricks rather than building with them.

Part of the reason is that voice tools and interfaces and platforms are not yet sufficiently mature to fully embrace this way of thinking - although leading service creation environments, such as Vicorp's xMP, are successfully pushing the boundaries. But the other reason is that Voice Applications (unlike Myspace pages) are not in the hands of teenagers, but in the hands of their parents and grandparents in the form of IT Managers and Customer-Service directors, undergoing a rather conservative osmosis by evolution, not revolution.

But I do dream of the day when the 'voice web' (as Nuance once called it) will come of age. I will create my voice-enabled application 'mashup' as easily as I log onto Blogger and create a new blog post, or add a new gadget to iGoogle. I will click save to deploy it instantly on somebody's hosting platform, or perhaps even better, somewhere in my telco's enormous network. To be honest, I don't care which, and I don't need to - this is the power of open standards. If I need third party content, I'll drop in a widget or two that provides it. Most of what I'll actually do is provide content and engagement and not worry about how to make the whole thing work.

Perhaps this sounds like the sort of thing that doesn't really need to happen in the business environment (where voice applications currently live and breathe and are far too mission-critical). Perhaps "voice for the masses" is an imaginary product that doesn't have an audience. I'm not so sure. Our "connected world" is becoming engrained in our cultural DNA. From teenagers who want to create their own voicemail service, to Customer-Satisfaction directors who want to outshine the competition, I think we're destined to hear a lot more about Voice.

IBM strives for super-human speech recognition

We recently caught up with Dr. David Nahamoo, IBM’s speech technology guru, to hear about what he calls “super-human speech recognition.” No, he’s not talking about Spidey or Superman, but rather a project meant to substantially improve the quality, dialog management, and usability of speech technology by the end of the decade, for dictation, call centers, cars, and a broad set of other applications with embedded computing power. One of his goals is to surpass a human at real-time dictation of speech such as a lecture, phone conversation, or broadcast, and he would like to do that for 50 languages with the same computer.

Before fully speech-enabled applications become ubiquitous, Nahamoo says that the technology must cross a simplicity threshold that would open it up to more developers. The speech recognition community converged around VoiceXML about 5 years ago, thereby abandoning proprietary interfaces. Nahamoo feels that the next step is for providers like IBM to encapsulate design principles and behaviors in templates. Sound familiar? Like client/server and web development tools (think Visual Basic and Dreamweaver, respectively), we’d say that speech needs its own GUI-based development environment.

Look, your Honour! - no hands



As technology revolutionizes the sleepy world of transcription, the traditional clicking of court reporters on their stenographic machines may disappear...


The ease of stenomask reporting, or “voice writing,” as practitioners call it, is leading to an upheaval in the world of transcription. Would-be transcriptionists are shunning traditional machine shorthand (using those narrow keyboards that court reporters peck away on) and are turning instead to easier-to-learn voice writing. The number of transcriptionists enrolled in traditional court-stenography courses has dropped by half since 1992, and training schools across the U.S. are closing as voice writing takes over.

No need to Text, just Jott


If you're over the age of 30 or just have big fingers, you may have decided that "texting" isn't for you, but Jott just might be. Its premise is simple: if you have a cell phone, you can dial a toll-free number, dictate a brief e-mail and have it delivered as text to a single recipient or a group. You can leave a message up to 30 seconds long. Hang up when you're through, and a few minutes later the message shows up in your recipient's e-mail box and cell phone message list. Jott relies on voice-recognition software, falling back to human transcribers when the software determines that a successful transcription is dicey.
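
The software-plus-human fallback described above can be pictured as a simple confidence check. The sketch below is only a guess at the general shape of such routing, not Jott's actual method: the `Transcript` type, the `route` function, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    confidence: float  # 0.0 (no idea) to 1.0 (certain), as reported by a recognizer


def route(transcript: Transcript, threshold: float = 0.8) -> str:
    """Return "auto" to deliver the machine transcript, or "human" to queue it for review.

    Hypothetical routing rule; the threshold value is an assumption.
    """
    if not 0.0 <= transcript.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "auto" if transcript.confidence >= threshold else "human"
```

With these assumptions, a clearly spoken message such as `Transcript("call mom", 0.95)` would be delivered automatically, while a mumbled one scoring 0.4 would be handed to a person.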