If you’ve been following the latest AI news, then you’ll know that chatbots you can talk to using your voice are here. OpenAI was one of the first to demo the technology with its ChatGPT Advanced Voice mode (currently only free for 10 minutes a month), but Google got to market first with Gemini Live (now free to all Android users), and Microsoft recently joined in by revamping its Copilot website and app (which are free to everyone) to include voice conversations.
The ability to talk to AI using our voice, and have it talk back like a human, has been a sci-fi dream ever since Captain James T. Kirk addressed the ship’s computer in Star Trek. But it was later sci-fi creations that seemed almost indistinguishable from human beings, like HAL 9000 and the Blade Runner replicants, that ignited our imaginations about the possibilities of an AI that could interact like a human.
Now we appear to be living in the future, because you can, right now, have a conversation with AI using the smartphone or computer you’re reading this on. But while we’ve made huge progress towards a human-like companion, there’s still a long way to go, as I discovered recently by putting the latest voice-controlled AIs – ChatGPT Advanced Voice mode, Gemini Live, and Copilot – through their paces for a couple of weeks. Here are my top three takeaways:
1. Interruptions are a great idea, but don’t work properly
The biggest problem I find with talking AIs is interrupting them successfully, or stopping them from interrupting you when you don’t want them to. It’s great that ChatGPT, Gemini Live, and Copilot all let you interrupt, mainly because they tend to give long and ponderous answers to everything you ask, and without that ability you wouldn’t bother using them. The process, however, is often flawed: either they miss your interruption, or they respond to it with more talking. Usually it’s some version of “OK, what would you like to know about instead?”, when all you want is for them to stop talking so you can start. The result is usually a messy series of fits and starts that kills the natural flow of the conversation and stops it from feeling human.
Quite often this week I found myself yelling “Just stop talking!” at my phone just so I could get a word in, which isn’t a good look, especially since I spend most of the day in an office surrounded by people.
Another problem I frequently encountered with all of the chatbots was that they thought I had finished talking when I was actually just pausing to gather my thoughts halfway through a sentence. The whole AI experience needs to be as smooth as butter for you to have confidence in it, or the spell breaks.
2. There’s not enough local information
Ask any of the current crop of chatbots where the best place to get a pizza locally is and, apart from Gemini Live, you’ll be told that they can’t search the web. Gemini Live is massively ahead here: it will recommend somewhere good to get pizza. The recommendations aren’t bad, and although it can’t make a reservation for you, it will get you the restaurant’s phone number.
Voice-activated chatbots obviously need to be able to browse the web, just like text-based chatbots currently can, but right now ChatGPT Advanced Voice mode and Copilot can’t, and that’s a huge drawback when it comes to delivering relevant information.
3. They’re not personal enough
For voice AI to be useful, it needs to know a lot about you. It also needs to be able to access your important apps, like your inbox and your calendar. At the moment it can’t. If you ask it, “Hey, am I free at 4 pm this Friday?” or “When is the next family birthday coming up?”, you’re told that it can’t do that right now, and without that kind of ability, the usefulness of voice AI just falls off a cliff.
So, what is a talking AI good for?
Right now the best use of voice AI is for asking questions, getting some motivation to do something, or coming up with ideas you wouldn’t think of on your own. Pick a subject and get AI to engage with you in a conversation, and you’ll find that it knows a surprising amount about a lot of things. It’s fascinating! For example, one of the things I actually know a lot about is Brazilian Jiu-Jitsu, and I found I could engage each of the chatbots in a pretty good conversation about it, even down to a surprising level of detail about techniques and positions. Based on my experience, I’d say that Copilot gave me the best answers, while Gemini seemed more likely to hallucinate things that weren’t true.
In terms of the interface, I think ChatGPT is leading the way. I really like the way its swirling orb pulses in time with whatever you say, which gives you confidence that it’s actually listening. Gemini Live, by contrast, has a mostly dark screen with a glowing area at the bottom, which doesn’t give you a focal point to look at and makes for a slightly more soulless experience.
The AI you can talk to right now is great for delving into research topics, but it also feels a bit half-finished, and it’s going to need a lot more integration with our smartphones before it can perform at the level we’d naturally like it to. Of course, it will get better over time. The elephant in the room is Apple Intelligence and its associated Siri, which are both late to the party. We’re still waiting for an Apple Intelligence release date, and even then we won’t get the full all-singing, all-dancing Siri until next year.
Right now the promise of an AI we can talk to just like a friend or a real virtual assistant seems tantalizingly close, but also still a long way off.