TechBooky

Google’s DeepMind Mimics The Human Voice And Achieves A Result Close To Our Voice In Major Breakthrough

by Paul Balo
September 11, 2016
in Artificial Intelligence

Google’s DeepMind says it has achieved a breakthrough in computer-generated speech, one that cuts the gap between existing text-to-speech technology and the human voice by more than 50 percent. The UK-based company aims to develop computers with “super” artificial intelligence (AI) capabilities.

In a post on its website, DeepMind says it has created a system that comes about as close to the human voice as anything so far. Dubbed WaveNet, the system models the raw sound waves of human speech directly. DeepMind compared its results with existing programs, including Google’s own, and says WaveNet narrows the gap to human speech by at least 50 percent, bringing us closer to a more realistic text-to-speech future.

If you’ve read this far and think this is just another advanced recorder, it isn’t. The aim is to teach machines how humans pronounce words in different languages so that they can form new utterances of their own. The closer the technology gets to perfection, the closer our interactions with machines can come to the way we interact with other people.

A large set of short recordings of human voices is fed into a computer, and systems like WaveNet learn from these recordings to produce entirely new speech. That is what this technology is about. While companies like Apple are still silent on their plans for AI, we at least know that Apple’s digital assistant Siri will now be opened to developers. Even so, this milestone no doubt puts Google a step ahead in the AI future.

 

How does this really differ from Siri, Cortana or Alexa?

The first thing to note is that these are all digital assistants built by tech companies, and they rely on artificial intelligence to help you with queries. You engage them and they reply in a human voice (we know the voice of Siri at least). That reply is produced by a process called concatenative text to speech (TTS), which DeepMind defines as a system “where a very large database of short speech fragments are recorded from a single speaker and then recombined to form complete utterances. This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.”

In other words, the current Siri and Cortana cannot express feeling through tone the way humans do unless an entirely new database is recorded, and they can only say what has been recorded for them. Concatenative TTS has been largely successful in its own right, but to give it changing tones, for example, you would need one or more people to record every possible sound in many different ways, and that is a daunting task. The other way of doing this is parametric TTS, which is generally considered too robotic.
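To make the limitation concrete, here is a minimal Python sketch of the concatenative idea. It is only an illustration: the `FRAGMENTS` database, the unit names and the `synthesize` function are hypothetical, and short lists of numbers stand in for real recorded audio.

```python
# Hypothetical fragment database: every entry stands in for audio recorded
# from ONE speaker. Short lists of numbers replace real waveform data.
FRAGMENTS = {
    "hel": [0.1, 0.3, 0.2],
    "lo": [0.4, 0.1],
    "wor": [0.2, 0.5, 0.3],
    "ld": [0.1, 0.0],
}


def synthesize(units, database):
    """Join pre-recorded fragments in order to form a complete utterance."""
    waveform = []
    for unit in units:
        if unit not in database:
            # Nothing can be said that was not recorded beforehand.
            raise KeyError(f"no recording for unit {unit!r}")
        waveform.extend(database[unit])
    return waveform


utterance = synthesize(["hel", "lo"], FRAGMENTS)
```

Because the output is only ever a rearrangement of what is in the database, a different speaker or a different emotion would mean building a whole new `FRAGMENTS` table, which is the drawback the quote describes.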

 

Parametric Text To Speech (TTS)

This is a purely computer-based model: it generates the speech itself instead of replaying human recordings, and the quality of the output depends on the signal processing method used. As DeepMind put it, the “contents and characteristics of the speech can be controlled via the inputs to the model.” This approach can be used in embedded systems with limited memory. In the chart we provide below, you’ll see that it scores lowest of the methods in English. In Mandarin Chinese it is a different story, but the result is still not that good.

 

WaveNet

This is the new method from Google, which it says comes closest to the human voice when all the methods are compared on a chart.

WaveNet works quite differently from the two methods used in current AI systems: it learns from human recordings and then independently creates its own voices and words. Like concatenative TTS it starts from recorded human speech, but instead of replaying fragments it generates new audio, which makes interaction with a machine wear a “human face”. As humans we pause and breathe when talking, and that’s something WaveNet does too. Taking this a step further, WaveNet is able to learn from sounds and produce entirely new content suited to a different context from the original, and that’s a huge step towards a whole new AI future. Call it AI on steroids if you wish and you won’t be wrong.

Here’s how DeepMind puts it: “the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances. At each step during sampling a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.”
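The sampling loop DeepMind describes can be sketched in a few lines of Python. This is not WaveNet itself: `toy_distribution` is a made-up stand-in for the trained network, and `NUM_LEVELS` is an assumed number of possible sample values chosen only for illustration. What the sketch shows is the loop structure: predict a distribution, draw a value, feed it back, repeat.

```python
import math
import random

NUM_LEVELS = 16  # assumed number of possible sample values (illustrative only)


def toy_distribution(history):
    """Stand-in for the trained network: one probability per possible value.

    This is NOT WaveNet. It simply favours values close to the previous sample.
    """
    prev = history[-1] if history else NUM_LEVELS // 2
    weights = [math.exp(-abs(level - prev)) for level in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0):
    """Build a sequence one sample at a time, feeding each draw back in."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_distribution(samples)  # distribution for the next step
        value = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]  # draw a value
        samples.append(value)  # the drawn value becomes part of the input
    return samples


audio = generate(100)
```

Each pass through the loop depends on the result of the previous one, so the steps cannot simply be run side by side, which is why the approach is described as computationally expensive.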

 

Test Results

[Chart: WaveNet test results (wavenet-deepmind)]

The scale of measurement runs from 1 to 5, with 1 being unrealistic and 5 being most realistic, and is based on more than 500 ratings from blind listening tests conducted by the team at DeepMind. Listeners rated WaveNet 4.21 in English and 4.08 in Mandarin. Human speech scored 4.55 out of a possible 5, so even humans did not get a perfect score. The results show how close WaveNet is getting to the tone of the human voice, and it greatly outperformed the concatenative and parametric TTS methods. You can listen to the audio below for yourself:
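As a quick worked example of how close that is, the Python snippet below compares the two English figures quoted above. It assumes the 4.55 human score belongs to the English test; the function names are just for illustration.

```python
WAVENET_ENGLISH = 4.21  # listener rating quoted in the article
HUMAN_SPEECH = 4.55     # human score quoted in the article (assumed English)


def gap_to_human(system_score, human_score):
    """Rating points still separating a system from human speech."""
    return round(human_score - system_score, 2)


def share_of_human(system_score, human_score):
    """System score as a percentage of the human score."""
    return round(system_score / human_score * 100, 1)


gap = gap_to_human(WAVENET_ENGLISH, HUMAN_SPEECH)
share = share_of_human(WAVENET_ENGLISH, HUMAN_SPEECH)
print(gap, share)
```

On these figures WaveNet sits 0.34 points below human speech on the 5-point scale.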

Parametric TTS

http://techbooky.com/wp-content/uploads/2016/09/parametric-DeepMind.wav

Concatenative TTS

http://techbooky.com/wp-content/uploads/2016/09/concatenative-DeepMind.wav

WaveNet

http://techbooky.com/wp-content/uploads/2016/09/wavenet-DeepMind.wav

 

Challenges

It’s computationally expensive to take WaveNet commercial at the moment. As DeepMind puts it, the audio has to be generated at a high rate of 16,000 samples per second. Each sample is predicted from the samples that came before it, so the samples must be produced one after another, and that makes generating high-quality WaveNet output cumbersome.
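A little arithmetic shows the scale of the problem. The Python sketch below uses only the 16,000 samples-per-second figure quoted above; the function name and the clip lengths are illustrative.

```python
SAMPLE_RATE = 16_000  # samples per second, the figure quoted above


def predictions_needed(seconds, sample_rate=SAMPLE_RATE):
    """Number of one-at-a-time predictions needed for a clip of this length."""
    return int(seconds * sample_rate)


for clip_length in (1, 10, 60):
    print(clip_length, "seconds ->", predictions_needed(clip_length), "predictions")
```

A ten-second reply therefore needs 160,000 predictions, each one waiting on the one before it.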

 

Future

DeepMind is responsible for AlphaGo, a program developed for the board game Go that beat Lee Sedol, one of the world’s top players, earlier this year. The big tech companies have all announced steps to make their digital assistant services more attractive, and WaveNet could eventually be the way to go. With better processing techniques, this could well become the future of AI with respect to digital assistants. About 20 percent of mobile searches on Google are now voice based, and this could eventually lead Google to increase funding for this area of research. It also took a while before tech giants started paying considerable attention to mobile.

But like the space and arms races of the 60s and 70s, we may be seeing an AI race to the top among tech companies too, and that’s a good thing.

DeepMind is a British artificial intelligence company and was acquired by Google in 2014.
