Why focus groups suck

I've just taken part in a focus group and been rewarded fairly handsomely for my trouble. It was fun, I got heard, and I know exactly how the report is going to turn out. 

It was about the local railway station: our propensity to use it, and what needs to be done to improve it and generate more usage.

Well, you see, therein lies the first problem: a hypothesis from the researchers (or, more likely, the clients) that colours the entire line of questioning and expectation, even subconsciously. I can think of a dozen ways to improve the station, but not a single one of them will make me use it more. I use it based on need and suitability for my trip; and frankly, the presence of a self-service under-arm-sweetening pie machine won't change my need.

I also had to remind the researchers that the intent of their question mattered. For example, their original framework was what would stimulate us, the users, to use the station more. But towards the end of the session this had drifted into "what improvements should be made and how urgently". Well, for whose benefit? I had to challenge those who said that improved tropical spa waiting facilities would be wonderful for people having to change trains here on their journey through. Sure, but will it change what you do?

And then there was the final round of conclusion making - that wonderful concept, the consensus. Yep that mythical creature that finds its way into so many wayward decisions and ill-informed conclusions.

There is no such thing as a consensus. Repeat after me. Altogether now (geddit?)

What there is, is a group of individual needs and opinions, half of which conflict. Trying to find a consensus is like trying to shove 18 bowls of fruit into a rucksack: it will still come out rucksack-shaped, if a little sticky and damp.

Unsurprisingly, most recommendations came out A1: high priority, now! (Can you have A3? High priority, any time next decade?) Apart from being fed "medium" as the starting point on most issues, in order to achieve consensus the baseline was taken from the first person who spoke, until basically everyone nodded. Because, of course, if you don't all nod, you don't have a consensus.

Seriously, this will tell the client nothing useful about true need, nor about the response they will generate by 'responding' to it. They will get a list where everything from a new information poster to a £30 million refurbishment is urgently needed to turn the place round.

Go on, stick a semi-automatic hair-cutting shoe-polishing machine on the platform if you like, but I'll still be working in the Atlantis end of outer Timbuktu, so I'll still be driving there, nowhere near your spangly half-empty railway station. 

(The cash in the envelope was all right though.) 



Service Assembly vs. Service Creation

I've been meaning to write this article for a number of weeks now and have finally been inspired by attending the IBM WebSphere user group session yesterday. I decided to follow the Web 2.0 and "Mashup" track, because I feel this is becoming increasingly relevant in telecoms.

It’s interesting to see how technologies such as IBM Mashup Centre, sMash and iWidgets are making it possible to build short-lived (“situational”) applications very quickly: aggregating, consuming and repurposing content using visual drag-and-drop paradigms.

I recently proved this to myself, just for fun, in a very, very simple way, by assembling what you might call an "application" - frygle.com - prompted by the Twitterings of Stephen Fry. (For those of you who haven't heard of twitter.com, that ISN'T an insult.)

The "application" (for want of a better word) is part blog, part feed aggregator and part search engine; specifically though, the search engine, based on Google - not only searches the web but gives additional weighting to the blog content and twittering of Mr. Fry himself.

This whole concept only took an hour to put together, including buying the domain name and going live on the Blogger platform. So the question is, how does some of this apply to the Telco world?

Well, I do think there are currently some issues around the widget concept for the voice world.

Firstly, the concept of multiple widgets works well in the visual paradigm, but it does not translate to the synchronous world of the audio stream. More fundamental than that, though, is the fact that we are seeing some of the processing work moved to the browser itself. In the next-generation Telco network, the media server takes the place of the user's browser. This increased separation into the model-view-controller way of doing things makes good sense, even though in the early days client-side scripting was considered bad practice because it hampered accessibility and SEO, and was subject to the vagaries of individual browser quirks.

However, in the voice world, where the delivery of content in real time is absolutely essential to the integrity of the voice user interface, additional processing load in the media server is still something most architects are going to shy away from. Furthermore, whilst we still have the legacy of traditional telephony, we are constrained to dimensioning systems with concepts such as "ports". We have to guarantee performance and footprint, and therefore need the behaviour of media servers to be utterly predictable.

However, I do think the concept of widgets in a slightly more abstract sense still applies. And this is where we come to the discussion around "service creation" versus "service assembly". What the Mashup paradigm shows us is that it is possible to assemble services from higher order building blocks, rather than writing individual lines of code. And this is exactly where we need to be in the Telco world.

Current voice service creation toolkits (and Vicorp's toolkit is undoubtedly the best example) go some way to achieving this by allowing the creation of reusable components. These can be built into increasingly higher-order applets of functionality, such that services can be built from rich building blocks, without the need to work at the detailed UI level. This is definitely a step in the right direction, even though there are some ideological challenges to overcome.
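As a toy illustration of the assembly idea - the component and service names are invented, and this has nothing to do with any particular vendor's toolkit - higher-order building blocks might compose something like this:

```python
# Toy sketch of service assembly from reusable dialogue components.
# Component and service names are invented for illustration.

def component(name, steps):
    """A building block is just a named sequence of lower-level steps."""
    return {"name": name, "steps": steps}

# Low-level components, built once by specialists...
collect_account = component("collect_account", ["prompt_digits", "confirm"])
collect_amount = component("collect_amount", ["prompt_amount", "confirm"])

# ...composed into a higher-order applet without touching the UI detail.
payment = component("payment", [collect_account, collect_amount, "authorise"])

def flatten(block):
    """Expand a composed service back into its primitive steps."""
    out = []
    for step in block["steps"]:
        out.extend(flatten(step) if isinstance(step, dict) else [step])
    return out

print(flatten(payment))
# ['prompt_digits', 'confirm', 'prompt_amount', 'confirm', 'authorise']
```

The point is that the person assembling "payment" never wrote a prompt; they only wired together published blocks.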

For example, my vision would be to see an explosion of these building blocks, in the same way that we see an explosion of widgets, so that service assembly can be achieved using readily available published components. However, this is a pill that Telcos are probably going to find hard to swallow, not least because the average Telco has a completely different mindset to the average Web architect. As I like to point out in my presentations, this mindset is one that says stability is achieved by not changing anything; i.e. if it ain't broke don't fix it.

This is completely at odds with the web and SOA world, where applications are born, live, breathe and die on an almost daily basis. It's going to be quite a challenge for anyone involved in telephony to start to embrace this concept. (Which introduces a whole new discussion around just how embedded-in-the-network applications should be.) However, I am in no doubt that we are going to have to, particularly as we see the move to SIP-based services under the direct control of the application server rather than the media server.

But this is not the only challenge. The current generation of service creation toolkits do not go far enough in providing point-and-click Mashup capabilities: i.e. they do not provide the capability to consume and re-purpose content without resorting to writing the code by hand.

What we have are two separate worlds of the web and voice, with the voice world borrowing concepts from the web world. Although we have long talked about "voice as an application" as a means to bring these two worlds into the same domain, we are still very much at an early stage of achieving it. So it is with much interest that I am watching initiatives such as Ribbit, recently bought by BT, to see how it plays out downstream and whether it really will bring voice service assembly to the masses.

When I originally started at BT, working on their network-embedded voicemail service (“CallMinder”), it took three years to bring the project to fruition – which included developing the application and the high-availability platform to run it on. (Bear in mind, these were the days when a 1 GB disk in your server was bleeding edge.)

The dream now is to do it in three weeks. Could we?

Speech and text recognition programs ready for the office

Technology that once seemed best suited for unintentional comedy is now ready for practical application. Software for text and speech recognition is now sufficiently mature to be considered for general office use, say the editors of the Hanover-based iX magazine. Text recognition, or optical character recognition (OCR), requires just a standard multi-functional scanner with 300 dpi performance or better to produce decent results. None of the programs tested by the magazine showed any glaring weaknesses.

Speech recognition is a far more complex process, though, and requires more from the user. An extensive training text typically must first be read into the computer to achieve decent accuracy results. And in some cases specialised dictionaries must be purchased as well to get the software up to speed.
Yet once speech recognition software has been trained, there's no longer any need for exaggerated pronunciation and slow speech. The user can simply speak normally. Only minor differences were found between the results of the different speech and text recognition programs.

Do VoiceXML and VoicePHP actually compete?

This is my personal opinion - a comment on the following blog posting: 

With VoicePHP, you can write applications for your business or your mobile phone. Knowledge of PHP is sufficient since it’s the same old PHP which now enables you to create voice applications. There was the earlier XML version called VoiceXML, but due to limitations in XML mainly in designing selective and iterative programming structures, it has not been successful. 
My Response:
VoiceXML, like HTML, isn’t and wasn’t ever designed as a programming language. It is a presentation language. VoiceXML is highly successful and massively deployed; however, the business logic generating VoiceXML dialogs is done in the programmer’s favourite application environment (Enterprise Java is very popular, for example).

So I think you are missing the point in feeling that VoiceXML needs “beefing up” somehow. Separation of concerns - presentation and business logic - has been one of the major forces driving the incredible success and adoption of the web, alongside the open standards that have allowed a vibrant marketplace of technology, infrastructure and tools vendors to compete - and thus for organisations and enterprises to benefit. This is where the real value is, and VoiceXML emulates this paradigm; its growth in adoption is testament to this.
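To make the separation of concerns concrete, here's a minimal sketch of application-side business logic emitting VoiceXML as presentation, just as a web application emits HTML. The function name and prompt wording are invented for illustration; the VoiceXML elements (form, block, prompt) are standard:

```python
# Sketch: business logic in the application tier renders VoiceXML as
# presentation, exactly as a web app renders HTML. Names are illustrative.
from xml.sax.saxutils import escape

def render_balance_dialog(name, balance):
    """The business logic chooses what to say; VoiceXML merely presents it."""
    message = f"Hello {name}. Your balance is {balance} pounds."
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.1">\n'
        "  <form>\n"
        f"    <block><prompt>{escape(message)}</prompt></block>\n"
        "  </form>\n"
        "</vxml>"
    )

print(render_balance_dialog("Alice", 42))
```

The document handed to the platform contains no logic at all; swap the application tier for PHP, Java or anything else and the VoiceXML stays the same.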

VoicePHP probably has its place - and personally I will enjoy tinkering with it to see what it offers me - but you have to get it in context - I don’t think it even plays in the same space as VoiceXML.

It's cloudy; and so the future's bright

[nik's note: Alongside speech technology I have a strong interest in cloud computing - particularly its disruptive force in the industry and how it is changing the landscape. I've been a cloud user since we began to think of the concept, having used Amazon's infrastructure to build my own webstores and the like. It's something we now must take very seriously in the speech industry, as it lowers barriers to entry and, as several players are showing, networked speech engines allow modest devices to perform like high-end desktops. Here's a little something I picked up from the powered-by-cloud blog]

Five reasons the cloud is for real.

The technology behind cloud computing is not brand new. If that is so, why the hype about cloud computing?

Look at the five reasons Alistair Croll of Gigaom cites.

Power and cooling are expensive. It costs far more to run computers than it does to buy them. To save on power, we’re building data centers near dams; for cooling, we’re considering using decommissioned ships. This is about economics and engineering.
Demand is global. Storage itself may be cheap, but data processing at scale is hard to do. With millions of consumers using a service, putting data next to computing is the only way to satisfy them.
Computing is ubiquitous. Keeping applications and content on a desktop isn’t just old-fashioned — it’s inconvenient.
Applications are built from massive, smart parts. Clouds give developers building blocks they couldn’t build themselves, from storage to authentication to friend feeds to CRM interfaces, letting coders stand on the shoulders of giants.
Clouds let us experiment. By removing the cost of staging an environment, a cloud lets a company try new things faster. Billing on demand means anyone can experiment.


Time to reappraise speech recognition systems?

One April, 11 and a half years ago, Hollywood actor Richard Dreyfuss presented a new type of software that was going to 'revolutionise business'. He had been paid to host the launch of Dragon's NaturallySpeaking application, which could faultlessly translate spoken words into text. If this worked, we could chuck away our keyboards. Productivity would multiply. Dragon would become the new Microsoft and a new era of IT would dawn.

And work it did too -- in the demonstration. But not everything about the event was quite so well stage-managed. New York was suffering its worst ever blizzard and few made it through the snow. One year later, founders Janet and Jim Baker hadn't found the mass market they may have anticipated. That year, a Belgian firm called Lernout & Hauspie introduced Voice-Express, another desktop speech software product that could potentially free us all from the tyranny of crouching over a keyboard, ruining our posture and giving ourselves RSI. In a demo, it even outperformed the world's fastest typist.

So why aren't we using this software on every computer in the land? Why aren't we talking to computers, telling them what we want to do? How come Windows and Mac OS remained the user interfaces of choice, when voice commands would be so much more efficient and user friendly? Especially as speech dictation has become part of so many phone calls to buy tickets, report meter readings and query bills? 

We can give you the wrong answer much faster

PROBABLY the CTO of every large technology company has to be a futurist. Yet it's a rare CTO who speaks at the Singularity Summit to consider the prospects for an artificial general intelligence surpassing humans. But Intel's CTO, Justin Rattner, laid out the future of Moore's Law to a packed auditorium for whom computational speed is a near-religious experience.

Yet Rattner says afterwards that raw speed won't be enough. "I once asked our speech recognition team if there was any direct relationship between machine computing speed and recognition accuracy and after a long pause, they said – because they knew I was not going to be happy with the answer – no." He asked why: "Our recognition performance is limited by our algorithmic understanding, not by our instruction speed. We can give you the wrong answer much faster, but we can't give you the right answer much faster."
Speech recognition is, of course, just one of many tasks even a very young human can do routinely and simultaneously.
"It's clearly a case where, until we have the right algorithms, no amount of performance improvement is going to give us the recognition performance a young child can deliver. That's why I try to separate out these notions a little. I have little doubt that when we figure it out we'll require lots of computing power, so there's no sense in abandoning it." 