Project Gutenberg Produced 5,000 AI Audiobooks, but How Do They Sound?

Arvyn Cerézo

Senior Contributor

Arvyn Cerézo is an arts and culture writer/reporter with bylines in Book Riot, Publishers Weekly, South China Morning Post, PhilSTAR Life, the Asian Review of Books, and other publications. You can find them on arvyncerezo.com and @ArvynCerezo on Twitter.

If you’re worried about AI and how quickly it’s being integrated into the publishing industry, this news is not going to make things any better.

AI is already used across the industry, from marketing to business development, publicity, and even writing, as evidenced by Publishers Weekly’s AI webinar last September. And now, AI is being used in audiobook production as well.

Project Gutenberg, the nonprofit organization responsible for digitizing public domain ebooks and making them free and accessible, collaborated with Microsoft and MIT in September to publish 5,000 AI-produced audiobooks. They were able to do this by using AI-powered neural text-to-speech technology, and the production was heavily automated.

The typical process for producing an audiobook is laborious. As the producer, one must pick the right narrator, have them read the book and conduct research, and have them practice, record, and do retakes. After that, editors will proofread and edit the recordings. Then, sound engineers will mix them to sound good on speakers and to listeners’ ears. This is a lengthy process that takes weeks of work for just one audiobook. Imagine working on 5,000.

To produce these AI audiobooks, the team used previously created ebooks as a starting point. To automate the work, they developed HTML-based processes that parse the text and let the AI voice record and compile the audiobooks into neat packages. They also chose the appropriate voices for each audiobook, depending on genre.
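For readers curious about what an HTML-based parsing step might look like, here is a minimal sketch in Python. It is not the project's actual code: the class name ChapterExtractor, the function extract_chapters, and the assumption that chapters are marked with h2 headings and p paragraphs are all hypothetical, and the text-to-speech step is left out entirely.

```python
from html.parser import HTMLParser


class ChapterExtractor(HTMLParser):
    """Collect chapter titles and paragraphs from a simple HTML ebook.

    Assumption (hypothetical markup): each chapter starts with an <h2>
    heading and its text sits in <p> elements that follow it.
    """

    def __init__(self):
        super().__init__()
        self.chapters = []   # list of {"title": str, "paragraphs": [str]}
        self._tag = None     # tag currently being captured: "h2" or "p"
        self._buffer = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing an inline tag such as <i>; keep capturing
        # Collapse line breaks and repeated spaces into single spaces.
        text = " ".join("".join(self._buffer).split())
        self._tag = None
        if not text:
            return
        if tag == "h2":
            self.chapters.append({"title": text, "paragraphs": []})
        elif self.chapters:
            # Paragraphs that appear before the first heading are skipped.
            self.chapters[-1]["paragraphs"].append(text)


def extract_chapters(html):
    """Return the chapters found in an HTML string, in reading order."""
    parser = ChapterExtractor()
    parser.feed(html)
    parser.close()
    return parser.chapters


# Invented sample markup, used only to exercise the sketch.
sample = """
<html><body>
<h2>Chapter I</h2>
<p>First paragraph of the
first chapter.</p>
<p>Second paragraph, with <i>emphasis</i>.</p>
<h2>Chapter II</h2>
<p>Only paragraph of the second chapter.</p>
</body></html>
"""

chapters = extract_chapters(sample)
for chapter in chapters:
    print(chapter["title"], len(chapter["paragraphs"]))
```

Each chapter's paragraphs could then be handed, one chapter at a time, to whatever speech engine produces the audio, which is roughly the kind of automation the paragraph above describes.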

The AI voice was cloned, or recreated, from sample recordings in order to narrate the ebooks. Using advanced AI technology, the team was able to add emotion to the words spoken by the AI. “Our system uses new advances in neural text-to-speech, emotion recognition, custom voice cloning, and distributed computing to create engaging and lifelike audiobooks,” they wrote in a paper about the steps they took. This process is roughly similar to the case of actor Edward Herrmann, whose voice was recently cloned to create an audiobook.

The number of AI audiobooks produced by Project Gutenberg et al. is huge when you consider that Penguin Random House Audio, one of the largest audiobook production houses in the entire publishing industry, produces only about 2,400 audiobooks per year.

So how do these AI-produced audiobooks compare to human-narrated ones?

How Do Project Gutenberg’s AI-Produced Audiobooks Sound?

I listened to some of the 5,000 audiobooks, which span nonfiction, fiction, and poetry. They included The Black Tulip by Alexandre Dumas, The Philippine Islands by Ramon Reyes Lala, Stories of King Arthur’s Knights, Told to the Children by Mary MacGregor, The Call of the Wild by Jack London, and Up From Slavery by Booker T. Washington, among others.

Although I was able to find titles by authors of color, they are far outnumbered by the audiobooks by white authors on the list. Publishing has always been white, with gatekeepers still reckoning with the past. Project Gutenberg’s list reflects this: it includes many classics by white authors that have been turned into audiobooks. Given that it only took them about 30 minutes to produce an AI audiobook, it won’t hurt for this project to include these 100 classic books by authors of color in the future. Doing so would help ensure that, as technology advances, marginalized groups aren’t left behind and feel seen in literature. And that can only happen if developers keep diversity in mind.

And while the recordings do sound human-like, the voices are flat and emotionless. There’s no variation in voices when it comes to dialogue, as there seem to be no female voices available. In addition, the stories lack the ability to truly touch the listener’s emotions. There’s no control over pacing or dramatic narration, and the same voice is used for all audiobooks, effectively erasing personalization and characterization.

Will AI Replace Human-Recorded Audiobooks?

While the voices do sound human in these AI audiobooks, the art of good narration — accent, pacing, dramatic pronunciation, characterization, and so on — is lacking. Human narrators effectively set the scene, making you fall in love and feel at ease with the story.

Listening to AI audiobooks, on the other hand, doesn’t provide stimulation. They say that a narrator can make or break an audiobook, and it’s true enough here. Although there are some titles worth checking out from the catalog, they are undermined by the monotonous narration.

In addition to style, almost all of the audiobooks have the same AI narrator. The AI voice reads everything the same way, whether it’s fiction, poetry, or nonfiction, and I frequently mistook them for the same audiobook. It’s too similar. Too flat. It will be some time before AI technology can do what human narrators do, but I believe that it’s gradually improving.

These AI audiobooks aren’t perfect, but I believe that they will benefit those who can’t afford to buy audiobooks, which are extremely expensive. They’re often more than twice the price of a paperback, so some of the titles in Project Gutenberg’s catalog may be of help. There are libraries that offer audiobooks both online and offline, and some retailers offer discounts as well, so if titles are not available there, listeners can opt for these AI audiobooks instead.

For publishers, these AI audiobooks won’t be of much help, either. Because Audible’s audiobook self-publishing platform, ACX, doesn’t accept “text-to-speech or other automated recordings,” these AI-produced audiobooks will not be available on Audible anytime soon. I’m assuming that the same requirements apply to traditional publishers as well. However, Amazon’s self-publishing platform, Kindle Direct Publishing, took a sharp turn in November when it announced that it would beta-test a feature that produces AI audiobooks from print books.

Although AI may pose a threat to the publishing industry, especially to narrators, it has proven to be beneficial to disabled people, such as Book Riot Contributing Editor Kendra Winchester, who writes about audiobooks and disability literature.

For Winchester, AI narration could prove useful in other ways. She already uses Apple’s screen reader app on her phone, and using AI narration technology to create a better screen reader could prove beneficial. Still, disabled people deserve more than flat, emotionless AI audiobooks. “For disabled people to truly have the access to books that we deserve, the audiobooks available shouldn’t be stripped of all of the humanity that narrators bring to their performances,” she wrote.

Bert Baxter, a member of the Deaf community, heavily relies on audiobooks for accessing written content. He said that the emergence of AI audiobooks has brought an exciting potential to enhance the Deaf community’s reading experience. Although he believes that AI audiobooks have the potential to greatly improve accessibility for Deaf people, he emphasizes the importance of AI audiobooks being produced with accessibility in mind, including support for different reading speeds and navigation options.

What Does This Mean for the Audiobook Industry?

These AI audiobooks appear impressive at first listen, but we’re actually still a long way from widespread adoption of AI in audiobook production.

“For now, these options are mainly being considered by self-publishing authors and academic publishers — or publishers that simply don’t have the resources to handle audiobook production,” publishing consultant Jane Friedman said when I asked her about the subject earlier this year. “While human narrators may feel threatened by this, I haven’t seen AI replacing jobs that would today be done by human narrators. It could happen in the future, especially if popular narrators license their voices for use.”

But given how quickly technology advances, how long will human narrators have before AI narrators “catch up”?

“AI narrators have already caught up to human narrators in the wild,” said Sil Hamilton, a Language Model Researcher at McGill University.

Project Gutenberg is not the only organization using AI narrators to produce audiobooks; Apple has been doing so for at least the past nine months. Its feature, called digital narration, allows publishers to produce audiobooks out of their ebooks. Apple Books competes with Amazon’s Kindle Direct Publishing, which is the most popular self-publishing platform. Hamilton told me that KDP doesn’t allow AI narrators, possibly as a way to differentiate itself, and that many audiobook narrators were shocked by what Apple did. Apple, like Project Gutenberg, may require AI narrators to bridge the gap, he said.

“However, whether their use in the wild determines whether AI narrators have ‘caught up’ to human narrators is only one heuristic,” Hamilton continued.

He explained that diffusion models, language models, and other predictive or generative deep learning algorithms all function by developing an understanding of their input data…While larger models can create more sophisticated representations of their data domain, they’re increasingly reaching computational limits. “The human voice exists in a narrow frequency range centered around 4000Hz, but as you suggest voice modifiers like intonation, implication, etc., all depend on the mind; not the voice — perhaps a great AI narrator needs to understand the human condition before they perfectly mimic us,” Hamilton clarified. “But whether that is required to automate away narrators’ jobs is unfortunately another question.”


These AI-produced audiobooks are yet another chapter in the saga of AI eroding human creativity. I hope AI narration gets regulated in the future, because producing audiobooks on such a large scale could cause the industry to crumble.

These AI voices will definitely improve over time, so there must be safeguards in place when using them.