We recently caught up with Dr. David Nahamoo, IBM's speech technology guru, to hear about what he calls "super-human speech recognition." No, he's not talking about Spidey or Superman, but rather about a project meant to substantially improve the quality, dialog management, and usability of speech technology by the end of the decade: for dictation, call centers, cars, and a broad set of other applications with embedded computing power. One of his goals is to surpass human accuracy in real-time transcription of lectures, phone conversations, and broadcasts, and he would like to do that for 50 languages on the same computer.
Before fully speech-enabled applications become ubiquitous, Nahamoo says, the technology must cross a simplicity threshold that would open it up to more developers. The speech recognition community converged on VoiceXML about five years ago, abandoning proprietary interfaces. Nahamoo feels the next step is for providers like IBM to encapsulate design principles and behaviors in templates. Sound familiar? As with client/server and web development tools (think Visual Basic and Dreamweaver, respectively), we'd say that speech needs its own GUI-based development environment.
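For readers who haven't seen the standard the community converged on, here's a minimal sketch of a VoiceXML dialog. The form name, prompt wording, and the `menu.grxml` and `route.jsp` resources are illustrative placeholders, not from any IBM product:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <!-- One form: prompt the caller, recognize a choice, hand it off -->
  <form id="main_menu">
    <field name="choice">
      <prompt>Say billing, support, or sales.</prompt>
      <!-- Grammar of allowed utterances (hypothetical external file) -->
      <grammar type="application/srgs+xml" src="menu.grxml"/>
      <filled>
        <!-- Send the recognized value to server-side routing logic -->
        <submit next="route.jsp" namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>
```

The point of a standard like this: any compliant voice browser can run the same markup, which is exactly the kind of portability that the earlier proprietary interfaces lacked. Templates, as Nahamoo suggests, would package common dialog patterns like this menu so developers don't write them from scratch.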