Everywhere you go, it seems like there is a voice talking to you. New technologies incorporate the human voice in many ways, and many of these applications rely on text-to-speech (TTS).
Do you think that TTS poses a threat to custom voice recordings? Why or why not?
Be sure to state your case in today’s VOX Daily!

I May Be Biased, But…

Recently I had an opportunity to defend the intrinsic value and necessity of custom voice-over recordings in a debate over whether or not text-to-speech would ever fully replace the need for actors to record custom voice-overs in studio.
As the co-founder of a company that specializes in the recording of spoken word messages, I am, it is safe to say, more than a little biased. That being said, there are many factors that explain why text-to-speech will never fully replace the human voice.

In a nutshell, these factors include:
– Sheer number of different languages, dialects, accents, vocabulary and manner of speech (linguistics)
– Complexity of cultural, historical and societal nuance/understanding of context (social)
– The need to customize or brand for corporate purposes by picking a specific voice (customization)
– Cognitive ability to know and see the ‘big picture’ when telling a story or making an argument (suspension of disbelief)
– Artistic direction that can be interpreted and internalized, making a read more believable (performance)

A Closer Look

When you consider that there are 6,800+ spoken languages in use today, the prospect of text-to-speech replacing the human voice, with its delivery, correct pronunciation, tone, nuance and so on, is difficult to comprehend, let alone achieve.

There are so many things that a computer program cannot infer or know. Information, when interpreted, could be expressed in myriad ways depending on the situation, context and audience. It is up to the individual performing the script to properly assess what it is that they are reading and to know how best to convey that information to the intended audience. This is what makes custom voice-overs so effective.

The voice artist uses discernment and all of the tools at their disposal to act like a detective, as it were: to become educated on the subject, develop a character and determine how best to present the message to those meant to hear it. There are unspoken sentiments that can be expressed through the human voice in a performance that would not come across as effectively, artistically or technically, if TTS were the go-to solution.

Something else to consider is intent. An educated voice actor makes choices whereas an untrained actor makes guesses. The actor uses their own experiences (method acting) and combines those with the information in front of them to craft a read that is both accurate and persuasive (emotion). The computer program could be considered untrained in the sense that the selections it makes are based upon formulas and not upon heart knowledge. Head knowledge is important but heart knowledge is critical to comprehension and communicating effectively.

What Do You Think?

Will text-to-speech ever be on par with custom voice-over recording?
Looking forward to hearing from you!
Best wishes,

Stephanie Ciccarelli is the Co-Founder and Chief Brand Officer of Classically trained in voice, piano, violin and musical theatre, as well as a respected mentor and industry speaker, Stephanie graduated with a Bachelor of Musical Arts from the Don Wright Faculty of Music at the University of Western Ontario. Possessing a great love for imparting knowledge and empowering others, her podcast Sound Stories serves an audience that wants to achieve excellence in storytelling. Stephanie is found on the PROFIT Magazine W100 list three times (2013, 2015 and 2016), a ranking of Canada's top female entrepreneurs, and is the author of Voice Acting for Dummies®.


  1. Couldn’t agree more, Stephanie! For a lot of the “press 1” stuff, TTS may be satisfactory. However, not only do you make great arguments from the voice actor’s side (I’ve done it for years!) but I’ve learned that for the listener, inflection and understanding are key to a) keeping them engaged and b) getting the proper response. And isn’t that the point?

  2. Any musical instrument under the sun has been sampled, and entire symphony orchestras can come out of a can. Yet, people are still buying real Steinways, and there are plenty of musicians who make a very decent living.
    Do I think that we’ll ever see the time when Stravinsky’s “Rite of Spring,” as performed on virtual instruments, will win a Grammy? Will a laboratory ever be able to produce a recording of Bach’s cello solo sonatas that rivals the depth of Yo-Yo Ma’s interpretation?
    No way!
    There’s still hope for the most subtle, most flexible, most surprising and unique of all instruments: the human voice.
    Here’s the rub: robots have a hard time emoting. They can patiently and dispassionately guide you to the next exit, but they have a hard time expressing even the most basic of feelings such as fear, anger, hurt, guilt and… love.
    The inimitable subtleties of the human voice can leave us… speechless.

  3. I don’t think automated speech synthesis will ever replace a human voice completely.
    I can however foresee a future where the speech synthesis gets so good that it *can* convey emotion, and imitate accent well enough that for mass media production it will be a cheaper, easily customisable alternative to human voice-overs.
    It is not too much of a stretch to see the same split between synthesized and live-acted voice as we have today between synthesized and live music. The live version will be considered more artistic – better quality for those who can appreciate the difference – and the mass media will churn out synthesized but popular garbage.
    That said – I very much doubt that will happen in *any* of our lifetimes. I’m estimating at least 80-100 years out for this scenario.

  4. TTS won’t completely replace the human voice, IMO. That being said, I have actually seen a recording company called Learning Ally, which recorded materials for the blind and deaf, shut its doors earlier this year because of products containing TTS such as Dragon becoming mainstream.
    However, the voice over industry comprises many different avenues, such as animation, books on record, and documentaries, and areas such as these need clear, concise emotion to be effective. Therefore, if you replace a human voice with TTS in these areas, all you would get is a monotonous drone that largely lacks the emotion you would receive in a human voice.
    As a trained actor and voice actor, you make a good point about how we make choices in portraying emotion, Stephanie. Computers, even those with AI, don’t have that ability. This alone is enough for me to remain confident that our voices will continue to be standard for decades to come.

  5. I don’t think I’ll be able to trust a robotic voice coming out of TTS technology. I would still prefer the warmth and real emotion in a real person’s voice.

