Google’s AI podcast tool transforms your text into stunningly lifelike audio – for free
- by Anoop Singh
I am not at all religious, but when I discovered this tool, I wanted to scream, “This is the devil’s work!”
When I played the audio included below for my editor, she Slacked back, “WHAT KIND OF SORCERY IS THIS?” I’ve worked with her for 10 years, during which time we have Slacked back and forth just about every day, and that’s the first all-caps message I’ve ever seen from her.
Later, she shared with me, “This is 100% the most terrifying thing I’ve seen so far in the generative AI race.”
If you are at all interested in artificial intelligence, what I’ve found could shake you up as much as it did us. We may be at a watershed moment.
In this article, I’ll demonstrate a service offered by Google. Please take a few minutes to listen to at least a bit of the two audio clips I’m going to share. I’ll show you how they were created and how to make your own. Then we’ll dive into the earthquake-level implications.
Finally, please join me in the comments below to talk about this. I think we’ll all need to do some processing about what this means.
The demonstration
What you’re about to hear is a podcast discussion about one of my recent articles.
All I did was paste the text of my article about the too-real VR conversion of 2D images to 3D into Google’s NotebookLM service and click Generate.
Let me be perfectly clear: the “people” in the broadcast are not real. The audio is entirely AI-generated.
To fully appreciate the implications of this technology, it’s worth spending a few minutes reading my original article and then listening to at least one minute of the six-minute audio track.
Go ahead, I’ll wait.
Here are a few things to notice:
- The quality of the two people speaking in terms of both their voice fidelity and naturalness
- The use of appropriate colloquialisms like “water works” for describing tears and crying
- The completely organic nature of their banter and the fact that there even was banter
- How well the “human” speakers get the concepts in the article, including the emotional aspects of reliving old memories
- Overall, how real this sounds: from intro to body to outro, it’s indistinguishable from a real broadcast
Next, let’s take a moment to look at how this was generated.
What is NotebookLM?
NotebookLM is kind of a cross between Google Keep and the AI in Notion.
The main data structure in NotebookLM is the notebook, which contains all your “notes” about a given project. Notes, called “sources” in NotebookLM, can be text you type into NotebookLM, similar to Keep. But they can also be PDFs, Google Docs or Slides, pasted text, audio files, YouTube links and web URLs.
NotebookLM seems somewhat fussy about the format of the sources, because when I pasted the URL of my article, it couldn’t read it. I had to copy the text and paste it in. I also found a PDF it couldn’t read even though the PDF didn’t appear locked or restricted.
Once you have all your sources in a notebook, you can ask NotebookLM’s AI to do AI things with the data. You can get a summary. You can ask it to extract main points. You can ask it for an outline, and so on. The AI actions use just the source data provided in a given notebook, similar to how Notion’s AI works only on the data uploaded into your own Notion account.
The big surprise feature, the one I’m agog about here in this article, is the Generate button, which generates the realistic banter between the two podcast hosts you heard in the demo.
Right now, NotebookLM is in beta and free.
Creating your own audio (and a second demo)
Let’s create another astonishing podcast discussion. This time, we’ll use Jason Perlow’s fascinating article on the fall of Intel as our source.
First, point your browser to NotebookLM. You’ll need to be logged into your Google account. Once you’re logged in, you’ll see a list of notebooks. This screenshot shows just my first test, the demo I showed above, plus some sample notebooks Google provides.
Clicking on New Notebook takes us to the Add Sources screen.
Because I previously found it didn’t process links to ZDNET articles properly, I just went down to the lower right corner and clicked on Paste Text. Then, having already copied the text from Jason’s article, I pasted it into the data entry field.
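If you’d rather not select and copy by hand, the paste-ready text can be pulled out of a saved copy of the page. What follows is only a rough sketch using Python’s standard library, not anything NotebookLM requires or provides. It assumes the article body lives in ordinary <p> tags, and the names ParagraphText and article_text are made up for this illustration.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False       # True while inside a <p>
        self._skip = False       # True while inside <script> or <style>
        self._parts = []         # text fragments of the current paragraph
        self.paragraphs = []     # finished paragraphs, in page order

    def flush(self) -> None:
        # Join the fragments, collapse runs of whitespace, keep non-empty text.
        text = " ".join("".join(self._parts).split())
        if text:
            self.paragraphs.append(text)
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
        elif tag == "p":
            self.flush()         # copes with pages that never close a <p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
        elif tag == "p":
            self.flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._parts.append(data)


def article_text(html: str) -> str:
    """Return the page's paragraphs as plain text, separated by blank lines."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    parser.flush()               # pick up a trailing unclosed paragraph
    return "\n\n".join(parser.paragraphs)
```

Calling article_text() on the contents of a saved HTML file returns text you can drop into the Paste Text field. Menus, captions, and anything else outside <p> tags are ignored, so check the result before pasting.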
After a few seconds, NotebookLM opens what it calls the Notebook Guide, a summary of sources and suggestions.
On the right is the Audio Overview section. Just click Generate. This takes a few minutes to generate a new podcast. Here’s what we got back this time.
If you want to export the file, you can click the three-dot menu and select Download. The site downloads a WAV file, although you’ll need to add the .WAV extension yourself. And that’s it.
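Renaming the download by hand works fine, but if you generate a lot of these, the step is easy to script. Here is a minimal sketch in Python, assuming the downloaded file really is standard WAV audio. The function names add_wav_extension and duration_seconds are invented for this example.

```python
from pathlib import Path
import wave


def add_wav_extension(path: Path) -> Path:
    """Rename an extensionless download to *.wav if it looks like WAV audio."""
    if path.suffix.lower() == ".wav":
        return path              # nothing to do
    with path.open("rb") as f:
        header = f.read(12)
    # A WAV file starts with b"RIFF", four size bytes, then b"WAVE".
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError(f"{path} does not look like a WAV file")
    target = path.with_name(path.name + ".wav")
    path.rename(target)
    return target


def duration_seconds(path: Path) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()
```

The header check is there so that a file that isn’t actually WAV audio doesn’t get a misleading extension. The duration helper is a quick way to confirm the rename produced something a player can open.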
One quick note: about four minutes in, there’s one small error. The male voice repeats a sentence. I’ve made the same error in webcasts and broadcasts myself, but still.
The staggering implications
First, let’s take a moment to appreciate just how incredible the results are. These two recordings demonstrate a depth of understanding, the ability to write a chatty dialog that’s relevant, and the ability to add new information that’s culturally relevant and even sensitive. And that’s all before we get to the quality of the voices and even the vocal tones.
Personally, I first felt this as a gut punch. As a book author, the ability to “give good radio” is essential when doing book promotions and book tours. I’ve been honing my skills for more than 15 years, sweating it out with each appearance, and I’m still not as good as these two fake broadcasters.
Yes, they were using my article (and later, Jason’s) as fodder for their discussion. But output of this quality is enough to make creators and content producers like me feel the heat. NotebookLM offered no options other than speeding up the speech. Now imagine if you could choose the speakers, the styles, and maybe edit a little of the AI-generated script.
Then, there’s the whole question of what is real. Last week, I showed you how the Vision Pro made a 20-year-old snapshot of my long-gone kitty appear real right in front of my eyes. Now, I’m showing you how a tiny little feature in the corner of a Google notebook experiment can make up two entirely fabricated speakers that are indistinguishable from human.
For years, we’ve had the ability to distort reality in Photoshop and other editing tools. Movie makers have used special effects to create fake reality in storytelling. Even the very act of taking a picture on film alters reality a bit.
That picture of my cat was a 1/250th-of-a-second snapshot of her reality. You could see only what the camera saw, filtered through how the film’s emulsion (this was still film) reacted to the light and to the developing process.
So it’s not that we’re suddenly able to fake real. It’s that we’re able to extend the fake further into reality. A snapshot of a cat is different from seeing her, as if she were real, right in front of you. A computer-generated script is far different from hearing two broadcast professionals having a dynamic discussion about a topic of interest.
There’s also the question of cost and speed. To be clear, it cost Google billions of dollars to build the AI that turned my article into a podcast. But it cost me nothing. It also took moments. That’s a huge reduction in the barrier to entry for content production.
It’s also worrying that some companies are choosing to use AI-generated content rather than hiring professionals like me and Jason to do it. I’ve been working on this article for two days, because I’ve been trying to find just the right way to tell this story.
But when I fed the prompt “write an article about the astonishing ability of Google’s NotebookLM to create an audio podcast and the implications thereof” into ChatGPT, I got a fairly well-considered article back in less than a minute.
My article is clearly deeper and more complete, drawing off the nuances of my personal style, as well as my experiences and choices. But the ChatGPT-generated version isn’t bad. It wrote detailed thoughts on these five themes:
- Democratization of content creation
- Transformation of education and knowledge sharing
- Impact on the creative industry
- New ethical questions
- Changing the economics of podcasting
That’s impressive for a minute’s work.
Google’s NotebookLM got me thinking about the kinds of services this might foreshadow. I do a lot of YouTube videos, and, to be honest, I’m running behind. Could I someday have something like this Generate feature create the talking head section of a YouTube video, making it seem as if I’m giving the performance?
On one hand, that might save me a ton of time and give me a chance to catch up on my backlog. But on the other hand, holy scary, Batman! Do I want a simulacrum of me running around, saying gosh knows what, espousing beliefs I might disagree with or even find abhorrent? Or what if the AI itself hallucinates, or ignores or misinterprets its guardrails, and spews something deeply inappropriate? It’s not like it’s never happened before.
How many friends, constituents, and clients might see such a thing and not be able to tell it was a deepfake? How much of a mess would that be to clean up? Would it cost me a gig or a friendship, or hurt the feelings of someone I care for?
I have always loved new technology. I have been fascinated by AI since I wrote one of the very earliest academic papers on the societal implications of AI, back in the days of wooden ships and iron programmers.
But I’m starting to have a better understanding of how the Luddites, those 19th-century textile workers who opposed the use of automation machinery, must have felt.
As impressed as I am by generative AI, and as beneficial as I personally have found it, capabilities this advanced, which are merely harbingers of a vastly more advanced near future, well, they terrify me.
Of course, there’s the spam side of the equation. More and more, the algorithm is presenting me with narrowly focused YouTube videos on topics that interest me, only for me to find out after watching them that they’re clearly AI-generated. Not only does the flood of these videos create unfair competition for real human creators, but it wastes viewers’ time. Worse, these videos are pushing out the real experts who might otherwise produce videos on those topics.
The power of the human BS detector
But here’s the thing. When those AI-generated videos first came out, it could sometimes be unclear whether they were real or not. But after a year or so, it’s now instantly obvious what’s AI garbage and what’s lovingly crafted by a human.
You can even tell by listening to the two sample podcasts I’ve provided. The first one rocked me to the core. And the second one is very, very good. But listen to one after the other and it’s abundantly clear there’s a pattern. We humans who have lived most or all of our lives in an intense media environment have finely tuned BS detectors. Give us a few years of this stuff, and we’ll be able to see through even the best of generative AI.
The big question is whether the folks who pay creators will care. I think they will. There’s no question that Jason Perlow, for example, writes technology articles with his own deep perspective. Much of what he writes about falls in fields we both know a lot about.
But I make sure to read his stuff, because I always learn from his unique perspective. I don’t think that can be cloned by an AI, and that’s why he has such a strong following of real people who value his unique voice and look forward to each new piece he produces.
So, while some publishers and media aggregators will always go for the cheap solutions, their output will start to blend together, especially as AI algorithms begin to entrain based on a common, if enormous, block of training data. But ZDNET, with uniquely experienced writers like Jason and me, and our fearless editors, will always value the uniqueness, the human-ness, and the depth of perspective that only we bring. That, by extension, gives ZDNET its own unique identity among other top tech sites.
That’s not something AI can do, and probably never will be able to.
What do you think? Are you as concerned as I am? Did you find these demos impressive? Have you tried out NotebookLM yourself? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.