T3i Group Predicts Healthy Growth In IVR Market Driven By the Synergy of New Applications and Technology

According to T3i Group's latest research, the global interactive voice response (IVR) market, which includes speech recognition, will grow to $514 million by 2013, up from an estimated $431 million this year, due in part to growth in VoiceXML (VXML) technology.
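As a quick sanity check, the forecast implies a modest compound annual growth rate (CAGR). This sketch assumes that "this year" means 2008, so there are five years of growth to 2013; the report excerpt does not state the compounding period.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the T3i Group forecast, in millions of US dollars.
# Assumption: "this year" is 2008, giving five years of growth to 2013.
rate = cagr(431.0, 514.0, 2013 - 2008)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption, the market grows by roughly 3.6% a year.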

The new "InfoTrack for Converged Applications 2008 IVR Market Report" found that global IVR shipments from the top 11 vendors exceeded 625,000 ports in 2008. The top three vendors by ports shipped were Nortel, Genesys and Convergys, while the revenue leaders were Avaya, Nortel and Genesys. T3i Group said North America led all regions, though with considerably less than 50% of the market, followed by Europe, the Middle East and Africa (EMEA) and then Asia Pacific (APAC).

T3i Group segmented the analysis in this report by technology, applications and vertical industry.

Among the key findings:

  • 95% of IVR ports shipped in 2013 will support VXML, compared with less than 75% today. VXML enables the same applications that Web sites offer as text, such as order entry, to be delivered by voice with speech recognition.
  • The top three IVR applications are incoming call handling for contact centers; inbound self-service transactions; and outbound calling, such as appointment confirmations, collections reminders and repair notifications.
  • As vendors and enterprises integrate IVR into more comprehensive customer-care solutions, IVR ports shipped specifically for inbound calls to contact centers will decline by nearly 10% each year through 2013.
  • By contrast, outbound applications will drive IVR port growth, at a rate of almost 12% annually through 2013.
  • DTMF (touch-tone) port shipments are declining, while speech ports, which recognize speech or convert text to speech, will outnumber DTMF ports by almost 2:1 by 2013.
  • IP/SIP port shipments are growing strongly year over year; by 2013, only 10% of all IVR ports shipped will be TDM, compared with 42% today.
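To see what those annual rates mean over the forecast period, the sketch below compounds them for five years. The starting figure of 100 ports per category is invented purely for illustration; the report excerpt gives only the rates, and the five-year span assumes a 2008 baseline.

```python
def project(base: float, annual_rate: float, years: int) -> float:
    """Compound `base` by `annual_rate` (e.g. -0.10 for a 10% decline) each year."""
    return base * (1.0 + annual_rate) ** years


YEARS = 5  # assumption: 2008 to 2013
inbound = project(100.0, -0.10, YEARS)  # inbound contact-center ports, ~10% decline a year
outbound = project(100.0, 0.12, YEARS)  # outbound ports, ~12% growth a year
print(f"inbound: {inbound:.0f}, outbound: {outbound:.0f}")
```

On these assumptions, inbound contact-center ports fall to about 59% of their starting level, while outbound ports grow about 1.76 times.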

 

Service Assembly vs. Service Creation

I've been meaning to write this article for several weeks now and was finally prompted to do so by the IBM WebSphere user group session I attended yesterday. I decided to follow the Web 2.0 and "Mashup" track, because I feel this area is becoming increasingly relevant in telecoms.

It’s interesting to see how technologies such as IBM Mashup Centre, sMash and iWidgets are making it possible to build short-lived (“situational”) applications very quickly, aggregating, consuming and repurposing content through visual drag-and-drop paradigms.

I recently proved this to myself, just for fun and in a very simple way, by assembling what you might call an "application", frygle.com, prompted by the Twitterings of Stephen Fry. (For those of you who haven't heard of twitter.com, that ISN'T an insult.)

The "application" (for want of a better word) is part blog, part feed aggregator and part search engine. The search engine, which is based on Google, not only searches the web but also gives additional weighting to the blog content and twittering of Mr. Fry himself.
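The weighting idea can be sketched in a few lines. This is not how frygle.com works internally (it was assembled from Google search and Blogger). The sketch simply assumes search results arrive as (URL, relevance score) pairs, and it multiplies the score of any result from a preferred site by a boost factor. The preferred hosts and the boost value are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical: hosts whose content should be weighted more heavily.
PREFERRED_HOSTS = {"twitter.com", "blog.example.com"}
BOOST = 2.0  # hypothetical multiplier for results from preferred hosts


def rerank(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return results re-sorted after boosting scores from preferred hosts."""
    boosted = []
    for url, score in results:
        host = urlparse(url).hostname or ""
        if host in PREFERRED_HOSTS:
            score *= BOOST
        boosted.append((url, score))
    # Highest adjusted score first.
    return sorted(boosted, key=lambda item: item[1], reverse=True)


results = [
    ("https://example.org/article", 0.9),
    ("https://twitter.com/example/status/1", 0.6),
]
for url, score in rerank(results):
    print(f"{score:.2f}  {url}")
```

Here the tweet's raw score of 0.6 becomes 1.2 after boosting, so it moves above the general web result.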

The whole thing took only an hour to put together, including buying the domain name and going live on the Blogger platform. So the question is: how does any of this apply to the Telco world?

Well, I do think there are currently some issues around the widget concept for the voice world.

Firstly, the concept of multiple widgets works well in the visual paradigm but does not translate to the synchronous world of audio streams, where a caller can listen to only one thing at a time. More fundamental than that, however, is the fact that we are seeing some of the processing work move into the browser itself. In the next-generation Telco network, the media server takes the place of the user's browser. On the web, this increased separation along model-view-controller (MVC) lines makes good sense, even though in the early days client-side scripting was considered bad practice because it hampered accessibility and SEO and was subject to the quirks of individual browsers.

However, in the voice world, where delivering content in real time is absolutely essential to the integrity of the voice user interface, additional processing load in the media server is still something most architects will shy away from. Furthermore, whilst we still carry the legacy of traditional telephony, we remain constrained to dimensioning systems in terms of "ports". We therefore have to guarantee performance and footprint, which means the behaviour of media servers must be utterly predictable.
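Dimensioning in ports is classically done with the Erlang B formula, which gives the probability that a call is blocked when a given traffic load is offered to a fixed number of ports. The article does not name a method, so treat this as an illustrative sketch. It uses the standard recursive form of Erlang B to find the smallest port count that meets a target blocking probability. The traffic figures and the 1% target are hypothetical.

```python
def erlang_b(traffic_erlangs: float, ports: int) -> float:
    """Blocking probability when `traffic_erlangs` of load is offered to `ports` ports.

    Uses the numerically stable recursion B(0) = 1,
    B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, ports + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


def ports_required(traffic_erlangs: float, target_blocking: float) -> int:
    """Smallest number of ports whose blocking probability meets the target."""
    if target_blocking <= 0:
        raise ValueError("target_blocking must be positive")
    ports = 0
    while erlang_b(traffic_erlangs, ports) > target_blocking:
        ports += 1
    return ports


# Hypothetical busy hour: 1,200 calls averaging 3 minutes = 60 erlangs.
load = 1200 * 3 / 60
print(ports_required(load, 0.01))  # ports needed for at most 1% blocking
```

Because every port must be provisioned for the busy hour, any unpredictable extra processing per call directly undermines this kind of calculation.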

However, I do think the concept of widgets in a slightly more abstract sense still applies. And this is where we come to the discussion around "service creation" versus "service assembly". What the Mashup paradigm shows us is that it is possible to assemble services from higher order building blocks, rather than writing individual lines of code. And this is exactly where we need to be in the Telco world.

Current voice service creation toolkits (and the Vicorp toolkit is undoubtedly the best example) go some way towards achieving this by allowing the creation of reusable components. These can be built into increasingly higher-order applets of functionality, so that services can be built from rich building blocks without the need to work at the detailed UI level. This is definitely a step in the right direction, even though there are some ideological challenges to overcome.
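The idea of assembling services from progressively higher-order blocks can be illustrated with plain function composition. This is not the Vicorp toolkit's model, which the article does not describe. Every block name here is hypothetical, and each block is assumed to be a function that takes and returns a call context.

```python
from typing import Any, Callable, Dict

Context = Dict[str, Any]
Block = Callable[[Context], Context]


def assemble(*blocks: Block) -> Block:
    """Compose blocks into a single higher-order block, run left to right."""
    def service(context: Context) -> Context:
        for block in blocks:
            context = block(context)
        return context
    return service


# Hypothetical low-level blocks.
def greet(ctx: Context) -> Context:
    return {**ctx, "prompts": [*ctx.get("prompts", []), "Welcome."]}


def collect_account(ctx: Context) -> Context:
    # A real block would run a recognition dialog; here caller input is simulated.
    return {**ctx, "account": ctx.get("simulated_input", "unknown")}


def read_balance(ctx: Context) -> Context:
    balances = {"1234": "42 pounds"}  # hypothetical back-end lookup
    balance = balances.get(str(ctx["account"]), "unavailable")
    return {**ctx, "prompts": [*ctx.get("prompts", []), f"Your balance is {balance}."]}


# A higher-order applet built from lower-level blocks...
identify_caller = assemble(greet, collect_account)
# ...which is itself reused as a building block in a complete service.
balance_service = assemble(identify_caller, read_balance)

result = balance_service({"simulated_input": "1234"})
print(result["prompts"])
```

The point of the design is that `identify_caller` can be published and reused by other services without anyone touching the prompts or recognition logic inside it.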

For example, my vision would be to see an explosion of these building blocks, in the same way that we see an explosion of widgets, so that service assembly can be achieved using readily available published components. However, this is a pill that Telcos are probably going to find hard to swallow, not least because the average Telco has a completely different mindset to the average Web architect. As I like to point out in my presentations, this mindset is one that says stability is achieved by not changing anything; i.e. if it ain't broke don't fix it.

This is completely at odds with the web and SOA world, where applications are born, live, breathe and die on an almost daily basis. It's going to be quite a challenge for anyone involved in telephony to embrace this concept. (Which opens up a whole new discussion about just how embedded in the network applications should be.) However, I have no doubt that we are going to have to, particularly as we see the move to SIP-based services under the direct control of the application server rather than the media server.

But this is not the only challenge. The current generation of service creation toolkits does not go far enough in providing point-and-click Mashup capabilities: these toolkits do not let you consume and repurpose content without resorting to writing code by hand.

What we have are two separate worlds, web and voice, with the voice world borrowing concepts from the web world. Although we have long talked about "voice as an application" as a means of bringing these two worlds into the same domain, we are still at a very early stage of achieving it. So I am watching initiatives such as Ribbit, recently bought by BT, with great interest, to see how it develops and whether it really will bring voice service assembly to the masses.

When I originally started at BT, working on their Network-Embedded voicemail service (“CallMinder”), it took three years to bring the project to fruition, including developing both the application and the high-availability platform to run it on. (Bear in mind that these were the days when a 1GB disk in your server was bleeding edge.)

The dream now is to do it in three weeks. Could we?

Do VoiceXML and VoicePHP actually compete?

This is my personal opinion, posted as a comment on the following blog posting:

With VoicePHP, you can write applications for your business or your mobile phone. Knowledge of PHP is sufficient since it’s the same old PHP which now enables you to create voice applications. There was the earlier XML version called VoiceXML, but due to limitations in XML mainly in designing selective and iterative programming structures, it has not been successful. 
My Response:
VoiceXML, like HTML, isn’t and never was designed as a programming language. It is a presentation language. VoiceXML is highly successful and massively deployed; however, the business logic that generates VoiceXML dialogs is written in the programmer’s favourite application environment (Enterprise Java is very popular, for example).

So I think you are missing the point if you feel that VoiceXML somehow needs “beefing up”. Separation of concerns, keeping presentation apart from business logic, has been one of the major forces behind the incredible success and adoption of the web, alongside the open standards that have allowed a vibrant marketplace of technology, infrastructure and tools vendors to compete, and thus allowed organisations and enterprises to benefit. This is where the real value is. VoiceXML emulates this paradigm, and its growing adoption is testament to that.
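To make that separation concrete, here is a minimal sketch. Ordinary application code (standing in for the business logic, which in practice might well be Enterprise Java) decides what to say, and the output is a plain VoiceXML 2.1 document for the voice browser to render. The balance lookup is a hypothetical stand-in; the elements used (`vxml`, `form`, `block`, `prompt`) are standard VoiceXML.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML_NS)  # serialise VoiceXML as the default namespace


def lookup_balance(account_id: str) -> str:
    """Business logic (hypothetical); a real system would query a back end."""
    return {"1234": "42 pounds"}.get(account_id, "not available")


def balance_dialog(account_id: str) -> str:
    """Presentation: render the business result as a VoiceXML 2.1 document."""
    root = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.1"})
    form = ET.SubElement(root, f"{{{VXML_NS}}}form")
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = f"Your balance is {lookup_balance(account_id)}."
    return ET.tostring(root, encoding="unicode")


print(balance_dialog("1234"))
```

Neither function knows anything about the other's concerns: the lookup could move to a different back end, or the dialog could gain richer prompts, without changing the other side.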

VoicePHP probably has its place, and personally I will enjoy tinkering with it to see what it offers me, but you have to keep it in context: I don’t think it even plays in the same space as VoiceXML.

Why VoiceXML 3 is not just VoiceXML 2.2

Why VoiceXML 3?


As with many programming languages, future versions are expected simultaneously to provide new features and to be simpler to use.

VoiceXML 3
- is more precisely specified
- is more extensible
- contains new features

VoiceXML 3 began with the functionality of VoiceXML 2, which was split into logical modules of related capabilities. Each module is now being defined in detail in two parts: syntax and semantics. The syntax of each module is similar to the syntax of the corresponding capabilities in VoiceXML 2, while the semantics part defines the functionality and event behavior behind that syntax.
These modular pieces are collected into profiles that essentially are complete languages.

So VoiceXML 3 now consists of:
- a framework for developing profiles from modules
- an XML-based eventing system
- an eventing system for the semantic descriptions associated with the syntax of each module
- several modules, including new audio control capabilities
- two profile definitions, one emulating VoiceXML 2.1 and one combining the full range of functionality available in VoiceXML 3.0
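The module/profile relationship can be pictured as simple set composition: each module contributes a group of related elements, and a profile is the union of the modules it includes. This is only a mental model, not the W3C's formal definition. The module and profile names, and the elements assigned to them, are hypothetical simplifications.

```python
# Hypothetical, simplified modules: each maps to the elements it defines.
MODULES = {
    "prompt": {"prompt", "audio"},
    "grammar": {"grammar"},
    "field": {"field", "filled"},
    "form": {"form", "block"},
    "media_control": {"media", "seek"},  # hypothetical new audio-control module
}

# Profiles collect modules into what is effectively a complete language.
PROFILES = {
    "legacy": ["prompt", "grammar", "field", "form"],
    "extended": ["prompt", "grammar", "field", "form", "media_control"],
}


def profile_elements(profile: str) -> set[str]:
    """Union of all elements contributed by the profile's modules."""
    elements: set[str] = set()
    for module in PROFILES[profile]:
        elements |= MODULES[module]
    return elements


def unsupported(profile: str, used: set[str]) -> set[str]:
    """Elements a document uses that the given profile does not provide."""
    return used - profile_elements(profile)


document_elements = {"form", "field", "prompt", "media"}
print(unsupported("legacy", document_elements))    # {'media'}
print(unsupported("extended", document_elements))  # set()
```

Seen this way, adding a capability means adding a module and choosing which profiles include it, rather than revising one monolithic language.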

TringMe Creates Flash VoiceXML Platform


TringMe has announced a Flash-enabled VoiceXML platform. TringMe said, "A lot of infrastructure is required to build voice application, because of the complexity involved in building interactive voice applications and the need for optimum performance and carrier grade reliability. Even with the innovation on both voice and RIA (Rich Internet Applications) fronts, something is required to bridge the gap and make voice accessible from RIA in a simple manner effectively." 

They added, "With TringMe, we have tried to bridge this gap. With extension of our platform, TringMe opens up VoiceXML accessibility to millions of flash and web developers to easily, yet, tightly integrate voice and telephony without having to know the intricate details of call-signalling, routing, billing etc."

Developers need only Flash and Web technologies to create rich voice and telephony applications. Applications developed on TringMe's Flash VoiceXML platform can use speech recognition, DTMF or text-to-speech.

W3C examines the next generation of speech technology


The W3C on Tuesday said the next generation of VoiceXML will include specifications for speaker verification.

"Speaker verification and identification is not only the best biometric for securing telephone transactions and communications, it can work seamlessly with speech recognition and speech synthesis in VoiceXML deployments," Ken Rehor, newly elected chairman of the VoiceXML Forum, said in a statement.
The W3C has now completed its desired requirements for VoiceXML 3.0 and expects to have a working draft of the specifications by the end of the first quarter, said James Larson, co-chair of the W3C Voice Browser Working Group.
In addition to the speaker identification requirements for VoiceXML 3.0, the W3C addressed the issue of extending its Speech Synthesis Markup Language (SSML) functionality to certain languages including Mandarin, Japanese and Korean.


Resolvity Receives 2008 Speech Technology Excellence Award

Resolvity, Inc. announced today that its Speech Application Platform received the Speech Technology Excellence Award for 2008 presented by TMC's Customer Interaction Solutions magazine.

The Platform consists of a sophisticated Artificial Intelligence runtime, a state-of-the-art Dialog Server, in-depth management reporting, and a framework for seamless integration with call center systems. The Platform is fully standards compliant and is certified to interoperate with most popular VoiceXML platforms and speech recognition engines. It has been designed for developing solutions that are complex from a customer-interaction standpoint and that require frequent "near real-time" modifications to respond to constantly changing business requirements.

VoiceXML Browser Brings Speech Recognition to Millions of Asterisk Users

LumenVox Speech Recognition Software Integrated in I6NET's VXI* 3.1

I6NET announced today the release of VXI* 3.1. VXI* enables thousands of existing VXML applications to run on the Asterisk PBX platform, and new speech recognition solutions to be built affordably.
"The release of VXI*, powered by the LumenVox Speech Engine, allows the large pool of VXML developers to run existing speech applications on the Asterisk platform or to develop new ones that are standards-based," commented Bill Meisel, president of TMA Associates and Publisher & Editor of Speech Strategy News. "The widespread use of the Asterisk platform presents an opportunity for VoiceXML developers to increase adoption of speech solutions in cost-sensitive markets. And who isn't sensitive to cost?"

Asterisk: VXI* 3.1 final released!

The final release of the VXI* VoiceXML browser 3.1 “late summer” (ref. 2008-09-01, 32-bit) is now available! The same package for 64-bit will follow in a few days. This release is suitable for production platforms running any of the latest Asterisk 1.4 kernels.

New features added:
add: additional properties for the TextToVideo
add: Specific video URL in the accounts
add: Video detection
add: Counters (PEAK, DENIED, SPEECHS)
add: Set VXML_ERROR if the session cannot be opened (contains the cause)
add: End date to the session dump
add: Use Number (calledif) to identify the account
add: Option mute to openvxi
add: CLI admin commands
add: alias mimetype video/3gp
add: VXML() asterisk function to get/set parameters
add: Porting for Asterisk 1.2
add: Priority configuration
add: Sessions dump
add: CDRupdate parameter
add: Asterisk vxml application dates
add: Add the dial: transfer prefix
add: CDR updates at the end of the VoiceXML session
add: .alaw and .ulaw formats for the TTS
add: ASR automatic allocation
add: speech configuration for the accounts
add: Object property to get internal properties values
add: Configuration of DTMF controls
Modifications:
mod: Disable SIGPIPE generation
mod: Open sessions locks
mod: support Jsession (java sessions)
mod: Start/stop script (without safe_openvxi)
mod: Disable log Stdout by default
mod: Remove direct chan access
mod: vxml show application
mod: Correction in the offset object
mod: Small correction for CLI commands
mod: Correction to use the MP3Player application
mod: Correction to support exec: in the transfer

Verizon Business Launches Open Hosted Speech Services

Verizon has launched its new Open Hosted Speech Services, which allow customers to create and host their own speech applications while continuing to rely on Verizon Business' speech platform, which includes interactive voice recognition services.
According to the company, Open Hosted Speech Services (OHSS) gives users the flexibility and oversight typically associated with solutions based on customer premises equipment, without the large capital costs or the burden of platform management.