Robin Kermode looks at whether voice recognition will change the way we communicate.
So, you’re standing at the supermarket self-checkout and the machine starts talking to you. “Place the item in the bagging area.” Tersely you reply, “It’s in the bagging area!” But the machine fails to pick up either the irony or the irritation in your tone. This continues to happen with every item you place in the bagging area. The most irritating thing about self-checkouts is that you almost always have to wait for a person to come over and sort out why the scales can’t weigh a potato or prove that you’re over 18 to buy alcohol despite being grey-haired and wearing sensible shoes. It would have been better to wait for the human checkout till in the first place.
Amazon’s Alexa, Apple’s Siri and Sat Navs are all vying to interact with us in a friendly way. Giving an inanimate object a human name – usually a soft one – doesn’t make me feel any more loved. It’s about as sensible as talking about emotions to a replicant in Blade Runner.
And surely it’s actually quicker to turn the light switch on with your finger anyway than to have to say, “Hi Jennie, please turn the low light circuit on. No, the LOW LIGHT CIRCUIT. Oh, for goodness’ sake, I’ll do it myself!” I almost expect the voice to say, “You sound irritated today, Robin. Might I suggest a local yoga class only two minutes away? I can even arrange a discount.” “No, Alexa, you can’t!”
Voice recognition has come a long way, of course. I remember only a few years ago trying to order cinema tickets from the Odeon ticket line. The pre-recorded announcement said, “Please say the name of the cinema you want to book tickets for…” After the tone, I said quite clearly, “Kensington.” After a couple of seconds, I heard, “Thank you for calling the Odeon Streatham.” Agghhhh! I was an actor doing voice-overs for a living and it still couldn’t get it right.
What fascinates me is how infuriating it is speaking to a machine that fails to recognise what you’re saying. It is like dealing with a government department that sends you a wrongly issued parking ticket. You are expected to respond immediately or face a fine. The parking department or tax office can seemingly take years to respond – if at all – so it just feels like the boot is on the other foot. That’s what speaking to a machine can be like. The machine always thinks it’s right. Just like speaking to a person who always thinks they’re right, it is really irritating.
There have been some things that have made life easier. Emojis allow us to send ironic text messages that previously could be interpreted as simply rude. The emoji winking face explains the intended tone of the delivery. But machines still don’t seem able to pick up our tone of delivery – especially irony. In the binary world of computers things are either meant or not meant. Funny or not funny.
Perhaps one day, Alexa will deliberately turn on the wrong appliance, like a playful child, and say, “Only joking, Robin! Having processed your vocal tone, speed of delivery and choice of words and compared them against our latest algorithms, I realised from your tone this morning that you needed cheering up! So here’s my latest joke: Why do machines put on weight? Because they always take a ‘megabite’! Ha ha!”
Hands-free voice recognition is useful, of course, and plays an ever more active part in our lives. But some things just don’t need voice. The TV remote has worked well for years. It stops me having to get up out of my chair to switch channels. But would I love it more if it spoke to me and I gave it a name?
We love the human voice. We love stories. We love hearing the shared experiences of other people being told to us by a fellow human. The early bedtime stories as children made us feel safe. Would a computer-generated voice from a virtual babysitter read them as well? I’m sure it would be possible to have a machine impersonating the tone of Richard Burton or Judi Dench but without the essence and nuance of humanity, would it have the same effect? Would it be the same as listening to Burton reading the original Under Milk Wood?
Would having the real (albeit pre-recorded) Dame Judi on your Sat Nav make your journey less stressful? Possibly. But what if that voice was a synthesised ‘generated’ Judi? In another ten years will we be able to tell the difference?
Of course we will. Will I learn to love Alexa and Siri? Possibly – but only if they stop telling me I’m wrong all the time and show that they see my point of view at least 50% of the time.