Fueled by the voice tech boom, everybody who’s anybody is building out an audio content strategy for their brand.
Voice search and voice assistants, which use synthetic, artificial intelligence and human voices, are becoming increasingly popular, and brands increasingly need a voice to represent them on these growing audio-based platforms.
The voice you select to represent your brand will affect whether, and how much, customers trust you. You’ve already spent a ton of time considering how your brand can be authentic and trustworthy, but now it has a literal voice that people will interact with on Amazon Echo, Google Home and other voice-powered technology. We’re aiming to educate and inform you about this fast-changing voice technology landscape, so you, and your brand, don’t get left in the dust.
In this article, we explain what synthetic voices, artificial intelligence (AI) voices and human voices are, and outline the pros and cons each of the three options offers your brand.
What is Synthetic Voice?
A synthetic voice is an artificially produced version of human speech.
Speech synthesis, often called text-to-speech (TTS), is simply another form of information output: a computer reads words aloud to you in a real or simulated voice, played through the device’s speaker.
How is a Synthetic Voice Produced?
Say you have a paragraph of written text that you want your computer to speak aloud. How does it turn those typed-out words into ones you hear?
A synthetic voice is created in three stages:
- Text to words: The computer pre-processes, or normalizes, the text to reduce ambiguity, for example by expanding numbers and abbreviations into full words, so it can narrow down how the text should be read.
- Words to phonemes: The speech synthesizer works out the speech sounds that make up those words. In the most straightforward explanation possible, the computer has a dictionary of words and rules for how certain groups of letters are pronounced, and uses them to convert each word into phonemes (the distinct units of sound in a language).
- Phonemes to sound: The sequence of written words has now become a sequence of sounds that need to be spoken. The computer can take a few different approaches. It can stitch together recordings of humans saying the phonemes (concatenative), generate the sounds itself from basic sound frequencies (formant) or mimic the mechanisms of the human vocal tract (articulatory).
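To make the three stages concrete, here is a toy Python sketch of the pipeline using the concatenative approach. Everything in it is a hypothetical placeholder: the abbreviation table and tiny pronunciation dictionary stand in for the large lexicons and learned letter-to-sound rules real synthesizers use, and the "recorded" phoneme units are plain sine tones rather than actual speech.

```python
import math
import re

# Hypothetical, tiny lookup tables for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON = {  # word -> phonemes (ARPAbet-style symbols)
    "doctor": ["D", "AA", "K", "T", "ER"],
    "two": ["T", "UW"],
    "hi": ["HH", "AY"],
}

def text_to_words(text):
    """Stage 1: normalize raw text into a list of plain, speakable words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdigit():
            # Simplification: read numbers digit by digit ("22" -> "two two").
            words.extend(DIGITS[int(d)] for d in token)
        else:
            words.append(re.sub(r"[^a-z']", "", token))  # strip punctuation
    return [w for w in words if w]

def words_to_phonemes(words):
    """Stage 2: look each word up; crude fallback treats each letter as a unit."""
    phonemes = []
    for word in words:
        phonemes.extend(LEXICON.get(word, [c.upper() for c in word]))
    return phonemes

def phonemes_to_sound(phonemes, sample_rate=8000, unit_ms=80):
    """Stage 3 (concatenative-style): append one stored audio unit per phoneme.

    The stored units are placeholder sine tones, not recordings of speech."""
    n = sample_rate * unit_ms // 1000  # samples per phoneme unit
    samples = []
    for p in phonemes:
        freq = 200 + 20 * (sum(map(ord, p)) % 40)  # arbitrary stand-in pitch
        samples.extend(math.sin(2 * math.pi * freq * i / sample_rate)
                       for i in range(n))
    return samples

words = text_to_words("Hi Dr. 22")
print(words)  # → ['hi', 'doctor', 'two', 'two']
audio = phonemes_to_sound(words_to_phonemes(words))
```

A formant or articulatory synthesizer would replace only the last function, computing the waveform from frequency or vocal-tract models instead of looking up stored units.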
Once the synthetic voice is produced, it can be implemented in software or hardware products like Google Home, Amazon Echo, your tablet, smartphone, GPS, ebook reader, etc.
Synthetic Voice Pros:
- Cheap. Speech synthesizers are a dime a dozen these days, so many are free. Just type ‘speech synthesizer’ into any search engine and take your pick of online text-to-speech tools.
- Fast. You can literally plop in your script or text, hit enter and the computer will read your lines aloud. Boom! There’s your robotic voice actor.
Synthetic Voice Cons:
- Unrealistic. Most of these speech synthesizers sound a lot like a robot. Sure, a few sound less robotic, but this kind of voice is typically not a good fit for most companies’ brand voice.
- Unoriginal. Chances are, thousands of other people are using one of these free or relatively inexpensive speech synthesizers. That means your audience has likely heard the same robotic voice before.
What is AI Voice?
Artificial intelligence, or AI, voice is a type of synthetic voice, but it works a little differently: AI voice uses ‘deep learning,’ a type of artificial intelligence, to turn text into audible, human-sounding speech.
While many robotic-sounding text-to-speech synthesizers use task-based algorithms, deep learning lets AI voice companies use machine learning methods that learn representations directly from data.
For example, Montreal-based tech company Lyrebird created artificial voices that imitate Barack Obama, Donald Trump and Hillary Clinton saying phrases none of the three politicians actually said. It built them from just a few minutes of audio taken from speeches with background noise and reverb.
Lyrebird also claims it can recreate your voice and turn it into your digital voiceprint using a minute of sample audio that you can upload on their website.
And they did a pretty convincing job with Ashlee Vance’s voice in this Bloomberg piece.
Lyrebird does this by analyzing a recording of your voice and breaking it into pieces based on phonemes. You then type whatever you want into the website’s text box, and the platform uses the voice model built from your recording to generate completely new words and phrases. Yes, that includes ones that weren’t in the original recording.
Companies like Voysis are also pushing the limits.
Voysis processes raw audio directly to create new voices that sound markedly more human than those of conventional text-to-speech synthesizers.
The staggering part is that Voysis built its voice on an existing method, WaveNet, which researchers at Google’s DeepMind developed in 2016.
Thankfully, a company is emerging to ensure this cutting-edge voice tech is kept in check.
Pindrop is building software to protect the digital vocal identities created by AI voice platforms.
The voice ‘fingerprinting’ company’s technology analyzes 1,400 different acoustic attributes to validate vocal identities on voice-powered tech.
What Brands Are Using Synthetic or AI Voice?
Note: If you want to build an Alexa Skill for your brand, check out this comprehensive guide we’ve put together for you.
AI Voice Pros:
- Control. A platform like Voysis or Voicery gives you full control over a completely unique voice customized for your brand. You could have complete ownership of that voice and not worry about any other company in the world having the same one.
- Cost. Voicery charges $0.001 per character on their scripts. Lyrebird is currently free for users. Voysis doesn’t publicly list their pricing.
- Instant production. As soon as you input the words, you can get an AI voice interpretation of the content at the click of a button.
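Per-character pricing means cost scales directly with script length. Here is a minimal sketch of the arithmetic, assuming the $0.001-per-character rate quoted above (which may have changed) and assuming every character is billable; whether a vendor actually bills spaces is unknown here, so the hypothetical `count_spaces` flag lets you see both ends of the range.

```python
RATE_PER_CHAR_USD = 0.001  # Voicery's quoted rate; may be out of date

def estimate_cost(script: str, rate: float = RATE_PER_CHAR_USD,
                  count_spaces: bool = True) -> float:
    """Estimate the cost (USD, rounded to cents) of voicing `script`
    under per-character billing.

    Whether whitespace is billable is an assumption; toggle `count_spaces`."""
    if count_spaces:
        chars = len(script)
    else:
        chars = sum(1 for c in script if not c.isspace())
    return round(chars * rate, 2)

script = "ab cd " * 1000          # 6,000 characters, 2,000 of them spaces
print(estimate_cost(script))                      # → 6.0
print(estimate_cost(script, count_spaces=False))  # → 4.0
```

At this rate, even a long script costs only a few dollars, which is the point the Cost bullet makes.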
AI Voice Cons:
- Ethics. There are serious ethical issues with robotic voices passing as humans when communicating with humans. Should robots be allowed to sound like humans, or should there always be a way to distinguish a robot from a human?
- Still not life-like. While the latest AI voice advancements are impressive, you can still detect a layer of robotic, non-human tones and inflection. There’s a good chance your customers will notice it too.
- Soul. Even when AI voice does catch up and can completely mimic the tone, pace, delivery, pitch and inflection of a human voice, it will still be missing the most important part of what separates us from the robots: a soul. Think about the brands customers feel a deep connection with. They all have a heartbeat behind them that people can sense and connect with. Your customers will feel that absence when you don’t use a real human voice.
- Authenticity. When a real person (be it a celebrity or well-known public figure) voices your brand, that person’s lifestyle and ethos are layered onto how people view your brand. Think Matthew McConaughey for the Lincoln Motor Company. His smooth, relaxed and sophisticated voice and perceived public lifestyle ooze into the ads he does for Lincoln. This, in turn, makes customers associate those traits with the brand. Your brand won’t be able to access that depth when you use AI voice.
What is Human Voice?
Long before synthetic and AI voices were following a three-stage sound creation process, our incredible bodies were making unique sounds, songs and voices. In terms of communication, the human voice is unmatched in its ability to convey detailed information that extends beyond the words we’re using.
When two people talk and actually understand each other, a brain-imaging study suggests that their brains synchronize.
“It is as if they are dancing in parallel, the listener’s brain activity mirroring that of the speaker with a short delay,” says Emma Seppala, science director of Stanford University’s Center for Compassion and Altruism Research and Education.
This level of natural brain synchronization will never happen between a human and a computer. It may also hold the key to how humans convey deeper emotion to each other.
A study by Michael Kraus of the Yale School of Management showed that when people only listen to a voice, rather than both watching facial expressions and listening, their ability to detect subtleties in vocal tone, specifically emotion, increases.
We can isolate the way speakers are (or aren’t) expressing themselves.
This may be why it’s so hard for algorithms to capture the unique sounds of the human voice – and why so many Synthetic or AI voices sound robotic or just flat-out false, even when we can’t put our finger on why.
There are currently around seven billion unique voices in the world, and the number is growing. Each one carries a story and experience that is distinctly its own.
Human Voice Over Pros:
- Real. A human voice will always be a human voice. No amount of programming will allow a robot to communicate the way that a completely unique person, who has a world of specific experiences and moments in their life, can. Everyone’s journey shapes their voice and how they tell your story. That takes a lifetime, not the short time span required for programming and simulation.
- Fewer legal headaches. Next to no laws currently govern how AI voice companies can let robots communicate with humans as if they were human. Yes, it’s the Wild West right now, but just like every other piece of culture-shifting tech, government policy will follow (slowly) behind. Using a human voice will allow your brand to avoid any legal headaches that may arise from adopting AI voice early.
Human Voice Over Cons:
- Limited Career and Lifespan. It may seem morbid, but if a particular human voice represents your brand, that individual has a limited lifespan and could change careers or retire at any point.
- Variable Compensation. Human voice actors expect to be paid for their work, and rates vary widely. A voice actor’s pay can depend on many factors, including their experience, level of fame and, ultimately, their skill and fit for your brand. You will also have to consider how you pay them for the end product, which has its own pay scale depending on the time it takes to produce and the duration of your license.
- Reputation Can Be Unpredictable. Many brands have been burned by their association with a celebrity whose behavior turns out to be at odds with the brand’s values. This is unpredictable and can force you to pivot quickly.
How Brand Voice Will Be Heard Now and Into the Future
Now that you have all of the options and the pros and cons of each, the next step is sorting out how you will apply voice to your brand now and into the future.
There are some great audio content creation options. If you’re looking to get started with, or build on, your audio content strategy, these are some of the biggest opportunities for modern brand marketers to extend their marketing into the audio medium:
- Branded podcasts: We’ve outlined how to start a podcast, as well as created an inspiring list of brands that have launched successful podcasts.
- Alexa Skills, Google Actions, etc.: the new ‘app’ on home voice-assistant devices. Campbell’s Soup offers an inspiring example of a brand that’s created a skill, and we’ve also outlined how to build an Alexa Flash Briefing.
- Audio content for your blog/website: enhance your posts or extend your video marketing strategy. Here’s our roundup of five audio blogs created by leading publications and influencers.
- Soundmarks: create a soundmark for your brand to use in these new audio mediums (Visa is one example). Read more about audio mascots.
How Will Your Brand Be Leveraging Audio?
Have you considered or used any of the above vocal options? What do you think is the best match for your brand and why?
Please share in the comments below – our community would love to learn from your experience!