When to Use Synthetic Voices: An AI Voice Guide for Your Brand
Fueled by the voice tech boom, everybody who’s anybody is building out an audio content strategy for their brand.
Voice search and voice assistants, which use both synthetic (AI-generated) and human voices, are becoming increasingly popular, and brands increasingly need a voice to represent them on these growing audio channels. As a result, many brands are weighing whether to use a synthetic voice in their communication strategy.
The voice you select to represent your brand will affect whether, and how much, customers trust you. You’ve already spent a ton of time considering how your brand can be authentic and trustworthy, but now it has a literal voice that people will interact with on Amazon Echo, Google Home, the latest Apple HomePod and HomePod Mini, not to mention other voice-powered technology. We aim to educate and inform you about this fast-changing voice technology landscape, so you, and your brand, don’t get left in the dust.
In this article, we introduce the Synthetic Voice Decision-Making Matrix, which helps you decide when to use a synthetic voice versus a human voice, and we weigh the pros and cons of each.
When Should a Brand Use a Synthetic Voice?
There’s been a lot of debate around the use of synthetic voices in audiobooks and other work historically done by professional voice talent. When, if ever, is it better to use a synthetic voice instead of a real one?
We’ve thought deeply about this over the years and developed a framework to communicate when and where it’s appropriate to use a synthetic or AI voice. Allow us to introduce the Synthetic Voice Matrix, a two-by-two grid that places the duration of the recording on the x-axis and the state of the content on the y-axis.
X Axis: Duration of the Recording
While most of us think of voice-overs as lasting at least 15 to 30 seconds for a commercial, audio ad, or even a short video, in some situations the voice is heard for mere seconds, such as a bus stop announcement or an elevator floor announcement.
For this matrix, I define a short voice-over as anything under 60 seconds; anything longer counts as long-form.
Y Axis: State of Content
There are two different states of content: static and dynamic. Recognizing the difference between the states of content begins to reveal when a synthetic voice is simply more practical.
Dynamic Voice Content
Dynamic content adapts to the conditions that trigger it to play. Imagine being in an airport and hearing about a gate change for your flight. These announcements are automated recordings generated by a synthetic voice, because it’s impractical to have voice talent record moment-by-moment updates. Using a live announcer would be even less practical, since gate changes may occur simultaneously in different terminals.
Another example of dynamic content is real-time information that changes constantly, such as weather alerts, stock quotes, or the latest sports scores. When content changes frequently, a synthetic voice is likely the most practical choice.
Along the x-axis, we have the time dimension. Interestingly, people are somewhat comfortable listening to a synthetic or automated voice in these dynamic-content situations as long as the prompts last less than one minute. Beyond one minute, that comfort begins to dissipate.
Whether it’s a 10-second navigation prompt, turn-by-turn directions, or a flight information change, we’re talking about dynamically changing content. We refer to this quadrant as Navigate. It is a perfectly appropriate use for a synthetic voice in place of a human voice actor: when the content keeps changing, navigating directions and getting quick answers to questions are the ideal uses for a synthetic voice, and people tolerate AI voices well for this purpose.
When dynamic content runs longer than a minute, we refer to it as Educate. This quadrant covers corporate training, educational content, and longer-form public service announcements. Even though it’s a longer listening experience, some organizations mistakenly choose a synthetic voice in an effort to save time and money.
Just because content is used for internal purposes doesn’t mean you should cut corners on storytelling and production quality. Whether you’re educating employees, students, or the general public, we recommend that a professional narrator deliver this important information.
Static Voice Content
Static content, on the other hand, does not change with context. Examples of static content include radio commercials and podcast interviews or, for longer recordings, the character voices in an animated movie.
Content that Informs
As we move up the matrix to static content, voice actors add a lot of value, even for short-form content. We refer to this quadrant as Inform. It includes commercials and other promotional content. You can’t fake authenticity: when you’re responsible for delivering a message meant to connect with your audience, it needs to sound like it’s coming from someone who understands what they are saying and the implications of their words.
One trend we’ve observed is episodic advertisements: short ads that get rolled out in a series. These ads have a single message and a short shelf life, and for good reason! If you’ve seen or heard something once in your feed, it can feel like you’ve seen it a hundred times. Case in point: social media platforms even have an option to report advertisements that are ‘too repetitive.’
In response, brand marketers are creating significantly more variation in their ads. Each ad has a different message, sometimes even read by a different voice. While each ad may clock in at less than a minute, telling a story in seconds requires skill and emotional intelligence that AI voices simply do not have.
Content that Entertains
Finally, there’s Entertain. Here the listening time is longer than one minute and can run 10 hours or more. The content may be episodic or choose-your-own-adventure. Audiobook narration is a popular form of entertainment that couldn’t possibly be done by an AI voice. An audiobook narrator is highly skilled in interpreting and communicating the author’s intent and character development, and can grasp the big picture of where the story is ultimately headed.
For story-driven and character-driven content, working with a voice actor makes the most sense: the content is static by nature and often includes a cast of characters.
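To make the four quadrants concrete, here is a minimal, hypothetical sketch of the matrix as a lookup function. It is an illustration, not a tool provided with this article: the names `matrix_quadrant`, `Recommendation`, and `is_dynamic` are our own, and treating a recording of exactly 60 seconds as long-form is an assumption based on defining short as anything under 60 seconds.

```python
from dataclasses import dataclass

# Hypothetical encoding of the Synthetic Voice Matrix; names are illustrative.
SHORT_FORM_LIMIT_SECONDS = 60  # assumption: "short" means strictly under 60 seconds


@dataclass(frozen=True)
class Recommendation:
    quadrant: str  # Navigate, Educate, Inform, or Entertain
    voice: str     # "synthetic" or "human"


def matrix_quadrant(duration_seconds: float, is_dynamic: bool) -> Recommendation:
    """Map recording duration (x-axis) and content state (y-axis) to a quadrant."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    short = duration_seconds < SHORT_FORM_LIMIT_SECONDS
    if is_dynamic and short:
        # The only quadrant where the matrix favors a synthetic voice.
        return Recommendation("Navigate", "synthetic")
    if is_dynamic:
        # Longer dynamic content: training, education, longer PSAs.
        return Recommendation("Educate", "human")
    if short:
        # Short static content: commercials and other promotional spots.
        return Recommendation("Inform", "human")
    # Long static content: audiobooks, animated character voices.
    return Recommendation("Entertain", "human")


if __name__ == "__main__":
    # A 10-second gate-change announcement is dynamic and short-form.
    print(matrix_quadrant(10, is_dynamic=True))
    # Recommendation(quadrant='Navigate', voice='synthetic')
```

Only one of the four branches returns a synthetic voice, which mirrors the matrix: dynamic content alone is not enough, it also has to be short.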
Pros and Cons of Using a Synthetic Voice:
Now that we’ve sussed out when it is appropriate to use a synthetic voice, a decision still needs to be made: just because you can use a synthetic voice for dynamic, short-duration content, should you? To help you decide, here are the pros and cons of synthetic voices, followed by the pros and cons of human voices:
Synthetic Voice Pros
- Control. Using a platform like Voysis or Voicery gives you full control over a unique voice customized for your brand. You could own that voice outright and not worry about any other company in the world having the same voice.
- Cost. At the time of writing, Voicery charges $0.001 per character of script (so a 1,000-character script costs about $1), Lyrebird is free for users, and Voysis doesn’t publicly list its pricing.
- Instant production. As soon as you input the words, you can get an AI voice interpretation of the content at the click of a button.
Synthetic Voice Cons
- Ethics. Robotic voices that pass as human when communicating with humans raise serious ethical issues. Should robots be allowed to sound like humans, or should there always be a way to distinguish a robot from a human?
- Still not lifelike. While the latest advances in AI voice are impressive, you can still detect a layer of robotic, non-human tone and inflection, and there’s a good chance your customers will notice it too.
- Soul. Even when AI voice catches up and can completely mimic the tone, pace, delivery, pitch and inflection of a human voice, it will still be missing the most important part of what separates us from the robots: a soul. Think about the brands that customers form a deep connection with. They all have a heartbeat behind them that people can sense and connect with. Your customers will feel its absence when you don’t use a real human voice.
- Unoriginal. Chances are, thousands of other people are using one of these free or relatively inexpensive speech synthesizers, which means your audience may well have heard the same robotic voice before.
- Authenticity. When a real person (be it a celebrity or a well-known public figure) voices your brand, that person’s lifestyle and ethos are layered onto how people view your brand. Think of Matthew McConaughey for the Lincoln Motor Company. His smooth, relaxed and sophisticated voice and perceived public lifestyle ooze into the ads he does for Lincoln, which in turn leads customers to associate those traits with the brand. Your brand won’t be able to access that depth when you use an AI voice.
Pros and Cons of Using a Human Voice:
Human Voice Pros:
- Real. A human voice will always be a human voice. No amount of programming will allow a robot to communicate the way that a completely unique person, who has a world of specific experiences and moments in their life, can. Everyone’s journey shapes their voice and how they tell your story. That takes a lifetime, not the short time span required for programming and simulation.
- Fewer legal headaches. Very few laws currently regulate how AI voice companies let robots communicate like humans with humans. Yes, it’s the Wild West right now, but as with every other culture-shifting technology, government policy will follow (slowly). Using a human voice lets your brand avoid any legal headaches that may arise from adopting AI voice early on.
Human Voice Cons:
- Limited Career and Lifespan. It may seem morbid, but if a particular person’s voice represents your brand, that individual has a limited lifespan and could, at any point, change careers or retire.
- Variable Compensation. Human voice actors will seek more compensation for their work. A voice actor’s pay can depend on many factors, including their experience, level of fame and, ultimately, their skill and fit for your brand. You will also need to consider how you pay them for the end product, which has its own pay scale depending on the time it takes to produce and the duration of your license.
- Reputation Can Be Unpredictable. Many brands have been burned by association with a celebrity whose behavior turns out to conflict with the brand’s values. This is unpredictable and can force you to pivot quickly.
Voice Assistants Powered by Voice Actors
When crafting the artistic direction for the sound you’re going for, consider the research conducted by the team at Voices that identified the most common vocal archetypes.
The most popular voices fall into three vocal archetypes:
Deep and Authoritative
Celebrity examples: Oprah, Morgan Freeman, Sam Elliott, James Earl Jones, Cate Blanchett
Celebrity examples: George Clooney, Meryl Streep, Matthew McConaughey, Taylor Swift
Fun, Adventurous, and Intriguing
Celebrity examples: Nicole Kidman, Trevor Noah, Keira Knightley, Hugh Jackman
Questions and Concerns in Hiring a Voice Actor for a Voice Assistant
In our research, brand marketers raised some concerns about the feasibility of hiring a professional voice actor for their synthetic voice initiative.
Among those who considered hiring a voice actor for a voice assistant, some said they had a tight deadline and the timing simply didn’t work (14.6%), while others intended to hire a voice actor but, given the changing nature of their content, were concerned about keeping the recordings up to date (12.5%).
Surprisingly, only a small fraction of survey respondents (6.3%) reported cost as a concern, likely because these projects are funded through research and development budgets. Finally, some respondents didn’t know where to start looking for professional voice talent for their voice assistant (4.2%).
Finding the Right Voice for your Voice Assistant
As you’ve likely experienced, hiring a professional voice for your project—whether it’s a voice assistant or another voice application—is both fast and easy when using Voices.
Simply post your job and qualified voice talent will reply to you with a quote for the work. Plus, you’ll be able to hear samples of your script for free, so you can compare quality, tone and performance, side-by-side. Once you’ve found the right voice, hire them through the marketplace.
How Brand Voice Will Be Heard Now and Into the Future
Now that you have all of the options and the pros and cons of each, the next step is sorting out how you will apply voice to your brand now and into the future.
If you’re looking to get started, or build upon your audio content strategy, here are some of the biggest opportunities for modern brand marketers to extend their marketing into the audio medium:
- Branded podcasts: We’ve outlined how to start a podcast, as well as created an inspiring list of brands that have launched successful podcasts.
- Alexa Skills, Google Actions, etc.: the new ‘app’ on home voice-assistant devices. The Campbell’s Soup brand offers an inspiring example of a brand that’s created a skill, and we’ve also outlined how to build an Alexa Flash Briefing.
- Audio content for your blog/website: here’s our roundup of 5 audio blogs created by leading publications and influencers.
- Creating a soundmark for your brand to be used in these new audio mediums
This article was originally published November 2020 by David Ciccarelli.