Google DeepMind gets closer to sounding human
Researchers at DeepMind use WaveNet AI to mimic human speech


Artificial intelligence researchers at DeepMind have used neural networks to create some of the most realistic-sounding, human-like computer speech yet.
Dubbed WaveNet, the AI promises significant improvements to computer-generated speech, and could eventually be used in digital personal assistants such as Siri, Cortana and Amazon's Alexa.
The technology learns to generate voices from samples of real human speech by both English and Mandarin speakers. In tests, WaveNet-generated speech was rated more realistic than speech from other text-to-speech programs, though it still fell short of being truly convincing.
In 500 blind tests, respondents were asked to rate sample sentences on a scale of one to five (five being the most realistic). WaveNet scored 4.21 in English and 4.08 in Mandarin, while actual human speech scored 4.55 in English and 4.21 in Mandarin. Although it fell short of real speech, WaveNet still outperformed the other speech methods tested.
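The scores above are mean opinion scores: the average of listeners' one-to-five ratings. The minimal Python sketch below shows that arithmetic using the figures quoted in this article; the ratings passed to `mean_opinion_score` in the usage lines are made up for illustration and are not data from DeepMind's tests.

```python
# Scores reported in the article (scale of 1 to 5, higher is more realistic).
REPORTED = {
    "English": {"WaveNet": 4.21, "human": 4.55},
    "Mandarin": {"WaveNet": 4.08, "human": 4.21},
}


def mean_opinion_score(ratings: list[int]) -> float:
    """Average a list of 1-5 listener ratings into a single score."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 5")
    return sum(ratings) / len(ratings)


def gap_to_human(language: str) -> float:
    """How far WaveNet's reported score sits below real human speech."""
    scores = REPORTED[language]
    return round(scores["human"] - scores["WaveNet"], 2)


if __name__ == "__main__":
    print(mean_opinion_score([5, 4, 4, 3]))  # hypothetical ratings -> 4.0
    for lang in REPORTED:
        print(lang, gap_to_human(lang))      # English 0.34, Mandarin 0.13
```

On these figures WaveNet sits 0.34 points below human speech in English but only 0.13 points below in Mandarin.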
While other artificial speech generators stitch together pre-recorded fragments of speech or build it from a set of language-derived parameters, WaveNet targets the sound waves themselves, analysing raw audio waveforms and modelling speech one sample at a time. The researchers also used the same technique to produce music after training the system on piano solos from YouTube.
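Working directly on the waveform means predicting audio one sample at a time, with each new sample conditioned on those before it. The Python sketch below illustrates that loop under stated assumptions: it uses 8-bit mu-law companding (the quantisation described in DeepMind's WaveNet paper) to turn each sample into one of 256 classes, and a hypothetical `toy_predictor` stands in for the trained neural network, which it does not resemble in any meaningful way.

```python
import math
import random
from typing import Callable, List

MU = 255  # 8-bit mu-law: 256 possible values per sample


def mu_law_encode(x: float) -> int:
    """Compress a sample in [-1, 1] and quantise it to an integer class 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int((y + 1) / 2 * MU + 0.5)


def mu_law_decode(q: int) -> float:
    """Map a class 0..255 back to an approximate sample in [-1, 1]."""
    y = 2 * q / MU - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


# A predictor takes every class generated so far and returns 256 probabilities.
Predictor = Callable[[List[int]], List[float]]


def toy_predictor(history: List[int]) -> List[float]:
    """Hypothetical stand-in for the network: favours classes near the last one."""
    last = history[-1] if history else 128
    weights = [math.exp(-abs(c - last) / 4.0) for c in range(MU + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples: int, predict: Predictor, seed: int = 0) -> List[float]:
    """Autoregressive loop: one prediction per timestep, fed back in as input."""
    rng = random.Random(seed)
    classes: List[int] = []
    for _ in range(num_samples):
        probs = predict(classes)
        next_class = rng.choices(range(MU + 1), weights=probs, k=1)[0]
        classes.append(next_class)
    return [mu_law_decode(c) for c in classes]


if __name__ == "__main__":
    audio = generate(160, toy_predictor)  # 10 ms of audio at 16 kHz
    print(len(audio), min(audio), max(audio))
```

The mu-law step is why the network can treat each sample as a 256-way choice rather than a continuous value: quiet sounds get finer quantisation steps than loud ones, which suits speech.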
"WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next," DeepMind said in a blog post.
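DeepMind's surprise at 16kHz audio is clearer with a little arithmetic: every second of speech means 16,000 sequential predictions, and each prediction must take in enough preceding audio to sound coherent. The WaveNet paper tackles the second problem with stacks of dilated causal convolutions whose dilation doubles from layer to layer (1, 2, 4 and so on up to 512). The Python sketch below counts both; the kernel size of 2 and the three stacked blocks in the usage lines are illustrative assumptions, not the configuration DeepMind used in its tests.

```python
SAMPLE_RATE_HZ = 16_000  # audio rate quoted by DeepMind


def predictions_needed(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Each timestep is one network prediction, so this is the loop length."""
    return round(seconds * sample_rate)


def dilation_block(max_power: int = 9) -> list[int]:
    """Dilations 1, 2, 4, ..., 2**max_power (512 by default)."""
    return [2 ** i for i in range(max_power + 1)]


def receptive_field(dilations: list[int], kernel_size: int = 2) -> int:
    """Number of past samples a stack of dilated causal convolutions can see."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


if __name__ == "__main__":
    print(predictions_needed(1.0))                   # 16000
    field = receptive_field(dilation_block())
    print(field, 1000 * field / SAMPLE_RATE_HZ)      # 1024 64.0 (milliseconds)
    print(receptive_field(dilation_block() * 3))     # 3070 (assumed 3 blocks)
```

Doubling the dilation lets ten layers cover 1,024 samples (64 ms at 16kHz), where ten undilated layers of the same kernel size would see only 11.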
DeepMind has also published a paper that describes the technology in much more detail.
The research outfit was also behind AlphaGo, the AI system that beat a champion Go player earlier this year.
Rene Millman is a freelance writer and broadcaster who covers cybersecurity, AI, IoT, and the cloud. He also works as a contributing analyst at GigaOm and has previously worked as an analyst for Gartner covering the infrastructure market. He has made numerous television appearances to give his views and expertise on technology trends and companies that affect and shape our lives. You can follow Rene Millman on Twitter.