What Is Text to Speech? Part 1 in our TTS Series

Keaton Robbins | May 20, 2022

Text to Speech (TTS) tools may feel like a new technology, but chances are you have already seen, or more accurately heard, TTS in action. Many companies have developed their own read-aloud tools for mobile devices, computers, tablets, cars and more.

Maybe you are curious about text to speech technology and where it comes from. Or perhaps you have already begun to hear a lot about how modern businesses are using text to speech applications across many different industries, and you want to know how you can utilize them too.

In this article

  1. What Is Text to Speech?
  2. The History of Text to Speech

In either case, this first article in our four-part TTS series will introduce you to what TTS is and where it came from.

Let’s get started by learning about what TTS is:

What Is Text to Speech?

So, what is TTS? At its most basic, Text to Speech (TTS) technology converts digital text into spoken audio. Think of it like an audiobook, except that instead of playing back a static, pre-recorded piece of media, a TTS program assembles speech on the fly from stored phonemes, so it can quickly turn any text into sound.

Of course, even the phonemes, or sounds, that TTS software uses must first be recorded by someone. Often, this job is given to a trained voice over artist who can provide engaging, easy-to-understand audio while still sounding natural.

Sometimes, a company will ask the voice over artist to record individual phonemes, which the software can then combine into a wider range of output. In other cases, such as when there is a specific intended use for the recordings, a voice actor may record entire words or even sentences. This can produce clearer, more natural-sounding audio.
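To make the idea of stitching stored sounds together more concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not how any particular TTS product works: the two-word lexicon, the phoneme labels, the 16 kHz sample rate and every function name are hypothetical, and short sine tones stand in for what would really be a voice over artist's recordings.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # samples per second (an assumed value)

# Hypothetical pronunciation lexicon: word -> phoneme labels.
LEXICON = {
    "hi": ["HH", "AY"],
    "you": ["Y", "UW"],
}


def placeholder_unit(freq_hz, duration_s=0.1):
    """Stand-in for one recorded phoneme: a short sine tone as 16-bit samples."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        int(12000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


# Hypothetical unit inventory: phoneme label -> stored audio samples.
UNITS = {
    "HH": placeholder_unit(220),
    "AY": placeholder_unit(330),
    "Y": placeholder_unit(440),
    "UW": placeholder_unit(550),
}

# Short pause inserted after every word.
SILENCE = [0] * int(SAMPLE_RATE * 0.05)


def text_to_phonemes(text):
    """Look up each word and return one list of phoneme labels per word."""
    result = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if word not in LEXICON:
            raise KeyError(f"no pronunciation stored for {word!r}")
        result.append(LEXICON[word])
    return result


def synthesize(text):
    """Concatenate the stored unit for each phoneme, with a pause after each word."""
    samples = []
    for word_phonemes in text_to_phonemes(text):
        for phoneme in word_phonemes:
            samples.extend(UNITS[phoneme])
        samples.extend(SILENCE)
    return samples


def write_wav(path, samples):
    """Save the samples as a mono 16-bit WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Calling write_wav("hi_you.wav", synthesize("Hi you!")) would save about half a second of placeholder tones. Swapping the tones for recorded phonemes, or storing whole words as single units, mirrors the trade-off described above: smaller units cover more text, while larger units tend to sound more natural.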

The History of Text to Speech

The history of humankind trying to build machines that could replicate human speech can be traced all the way back to the legends of the ancient Greeks. 

In the late 1700s, scientists worked to construct machines that could produce mechanical speech when powered by bellows. They even implemented models of the tongue and lips so the device could make both consonants and vowels.

Jump forward to the 1930s, when Bell Labs developed the vocoder, a speech analyzer and synthesizer, and from it the Voder, a keyboard-operated speech synthesizer.

Around 1950, Haskins Laboratories completed the Pattern Playback, a machine that could read an image of acoustic speech patterns (a spectrogram) and convert it back into audio.

Scientists also developed the first computer-based speech synthesis programs in the late 1950s, and the first full text to speech system was completed in 1968.

Our understanding of computer science and audio technology has come a long way since then. With the development of digital technology, as well as advances in machine learning and recording, it is now easier than ever for companies and individuals to use TTS.
