Wondering how that speech you’re writing sounds? Using text-to-speech (TTS) software, you can have your computer read it to you. TTS converts words from any electronic document into audible speech. Several vendors make TTS applications that let you listen to documents, e-mail, and Web pages from the comfort of your PC.
TTS offers a variety of benefits for different people. It’s a good way to judge the effectiveness of your writing—listening to your own words spoken aloud can be very helpful. TTS is a handy method for proofreading a document. You’ll hear typos and spelling errors that you might miss through a spell check or during manual proofreading. Some TTS programs can read your words aloud as you type them, acting as a virtual proofreading partner.
TTS software can read any type of text file, including e-books, which are available from many Web sites. Sites such as Project Gutenberg (www.gutenberg.net) provide thousands of free public-domain stories and novels that you can download and listen to through your TTS application.
Individuals with disabilities such as a visual impairment can benefit from TTS technology. Furthermore, TTS can be a powerful teaching tool for people who have reading difficulties or are learning English as a second language.
How TTS Works
A TTS system starts by breaking down written words into their phonemes, the smallest units of sound that distinguish one word from another. The letter a in the word “car” and in the word “ate,” the letters oo in the word “book,” and the letters ng in the word “sing” are all examples of distinct phonemes. A word like “caught” is composed of three phonemes: a “k” sound for the first letter, “au” for the next two letters, and “t” for the remaining letters of the word. Once the TTS software has separated each word into its phonemes, it adds specific speech attributes, such as volume, pitch, and tone, which help the speech sound more natural and fluid. The phonemes and speech attributes are then assembled and converted into audio to form the spoken output.
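The pipeline just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny LEXICON, the informal phoneme labels, and the SpeechUnit, to_phonemes, and add_attributes names are all made up for this example, and a real engine would use a far larger dictionary and produce audio at the end.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicon with informal phoneme labels (not a standard
# phonetic alphabet). A real engine ships a much larger dictionary.
LEXICON = {
    "caught": ["k", "au", "t"],
    "book": ["b", "oo", "k"],
    "sing": ["s", "i", "ng"],
}


@dataclass
class SpeechUnit:
    """One phoneme plus the speech attributes attached to it."""
    phoneme: str
    volume: float = 1.0  # relative loudness
    pitch: float = 1.0   # relative pitch


def to_phonemes(text):
    """Split text into words and look each word up in the lexicon."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?\"'")
        # Assumption: unknown words fall back to letter-by-letter units.
        phonemes.extend(LEXICON.get(word, list(word)))
    return phonemes


def add_attributes(phonemes, question=False):
    """Attach default attributes; raise the final pitch for a question."""
    units = [SpeechUnit(p) for p in phonemes]
    if question and units:
        units[-1].pitch = 1.2
    return units


units = add_attributes(to_phonemes("Caught book."))
print([u.phoneme for u in units])
```

The final step, turning these units into sound, is left out here because it depends entirely on the speech engine.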
The TTS application faces certain stumbling blocks as it tries to read text. A major challenge is presented by words that are always spelled the same but can sound different depending on their context in a sentence. Words such as “close,” “lead,” and “perfect” fall into this category. A TTS application analyzes the entire sentence to determine how to pronounce these words. One example is the sentence “I want to record a record.” By dissecting the sentence, the application knows to pronounce the word “record” differently in each of its instances. No software, including TTS, is 100 percent accurate, so even dissecting the sentence doesn’t ensure correct pronunciation every time.
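To give a feel for this kind of context analysis, here is a deliberately naive Python sketch. It looks only at the word just before “record” to guess whether it is a verb or a noun. The cue lists, the informal respellings, and the pronounce_heteronyms name are assumptions made for this example; real TTS products use much richer sentence analysis.

```python
# Hypothetical pronunciations keyed by (word, part of speech), written as
# informal respellings rather than true phoneme symbols.
PRONUNCIATIONS = {
    ("record", "verb"): "ri-KORD",
    ("record", "noun"): "REK-erd",
}

# Assumed context cues: words that often precede a verb or a noun.
VERB_CUES = {"to", "will", "can", "must"}
NOUN_CUES = {"a", "an", "the", "this", "that", "my"}


def pronounce_heteronyms(sentence):
    """Replace known heteronyms with a pronunciation chosen from context."""
    words = [w.strip(".,!?\"").lower() for w in sentence.split()]
    result = []
    for i, word in enumerate(words):
        if (word, "verb") not in PRONUNCIATIONS:
            result.append(word)
            continue
        previous = words[i - 1] if i > 0 else ""
        if previous in VERB_CUES:
            part_of_speech = "verb"
        elif previous in NOUN_CUES:
            part_of_speech = "noun"
        else:
            part_of_speech = "noun"  # default guess when no cue matches
        result.append(PRONUNCIATIONS[(word, part_of_speech)])
    return result


print(pronounce_heteronyms("I want to record a record."))
```

A one-word lookbehind like this fails easily (for example, on “Record it now”), which is one reason pronunciation is never guaranteed.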
TTS software can mispronounce both common and uncommon words. Names of people and places, acronyms, abbreviations, and complex words can all cause trouble. To solve this problem, most TTS applications have built-in pronunciation dictionaries to which you can add mispronounced or misunderstood words. You teach the software how to pronounce these words by building them using their correct phonemes.
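A user-editable pronunciation dictionary can be pictured as a small lookup layer that sits on top of the built-in one. The sketch below is an assumption about how such a feature might be structured, not a description of any product reviewed here; the class name, method names, and phoneme labels are invented for illustration.

```python
class PronunciationDictionary:
    """Built-in pronunciations plus user corrections that take priority."""

    def __init__(self, builtin):
        self.builtin = dict(builtin)
        self.user = {}

    def add_word(self, word, phonemes):
        """Teach the dictionary a word by listing its phonemes in order."""
        self.user[word.lower()] = list(phonemes)

    def lookup(self, word):
        """Return the user's entry if present, else the built-in entry.

        Returns None when the word is unknown to both dictionaries.
        """
        key = word.lower()
        if key in self.user:
            return self.user[key]
        return self.builtin.get(key)


# Informal phoneme labels, chosen only for this example.
dictionary = PronunciationDictionary({"sing": ["s", "i", "ng"]})
dictionary.add_word("SAPI", ["s", "a", "p", "ee"])

print(dictionary.lookup("sapi"))
print(dictionary.lookup("sing"))
```

Keeping user entries separate from the built-in ones means a correction can be removed later without touching the original data.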
TTS has always suffered from one key drawback: the way it sounds. Since the speech is synthesized, it often has a cold, mechanical quality. Most TTS speech sounds as if it were coming from a robot in a grade-B science fiction movie. You may have heard the synthesized voice from the speech system used by noted physicist Stephen Hawking. TTS speech lacks most of the nuances that distinguish human speech, such as real inflection or cadence, and gives little variation or emphasis to words in a sentence. You’d certainly never confuse a TTS voice with that of a human being.
The Engine that Drives TTS
The first software component required for TTS to work is a TTS engine. Microsoft’s standard speech protocol, Speech API (SAPI), lets TTS applications use any compatible speech engine. The speech engine provides the basic programming interface. A TTS application processes the text and then passes it over to the TTS engine, which produces the sounds. The quality of the speech you hear is mostly dependent on the engine rather than on the TTS software.
A TTS engine usually includes one or more voice files. The voice files determine how the speech will sound. Each voice can mimic and produce a certain gender or speech pattern—male or female, young or old, high or low. You can sometimes choose from a variety of voices within a single speech engine. You can alter most voices by modifying key attributes, such as the pitch, tone, and volume.
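The division of labor between application, engine, and voice can be sketched as follows. This is not the SAPI interface; FakeEngine, Voice, and read_aloud are hypothetical names, the voice names are placeholders, and the engine here only records what it was asked to say instead of producing sound.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Voice:
    """A named voice with the adjustable attributes mentioned above."""
    name: str
    pitch: float = 1.0
    rate: float = 1.0
    volume: float = 1.0


class FakeEngine:
    """Stand-in for a speech engine; logs requests instead of making audio."""

    def __init__(self, voices):
        self.voices = {voice.name: voice for voice in voices}
        self.spoken = []

    def speak(self, text, voice_name):
        self.spoken.append((self.voices[voice_name], text))


def read_aloud(engine, raw_text, voice_name):
    """Application side: tidy up the text, then hand it to the engine."""
    cleaned = " ".join(raw_text.split())  # collapse line breaks and spaces
    engine.speak(cleaned, voice_name)


engine = FakeEngine([Voice("Mary"), Voice("Sam")])
# Lower the pitch of one voice without changing the others.
engine.voices["Sam"] = replace(engine.voices["Sam"], pitch=0.8)
read_aloud(engine, "Hello,\n   world.", "Sam")
print(engine.spoken)
```

Because the application talks to the engine only through speak, swapping in a different engine would not require changes to read_aloud.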
A few select companies make TTS engines, including IBM, Lernout & Hauspie (L&H), and Microsoft. The two most popular ones are the Microsoft and L&H engines. Most TTS applications use one or the other, or both, though some use their own proprietary engines and voices. The Microsoft TTS engine includes three voices, called Mary, Mike, and Sam. Both the Microsoft and L&H engines offer voices in American English, while L&H also makes voices for British English, Spanish, French, and Italian, among others. The engine is typically installed automatically when you install the TTS application. You can download both engines for free from the Web sites of many TTS software vendors.
For years the Holy Grail of TTS technology has been to reproduce the sound of human speech. The quality of the speech depends largely on the type of TTS synthesis used. TTS technology uses two methods to synthesize speech: formant and concatenative. Formant synthesis uses algorithms to reproduce or mimic human speech, which leads to the robotic voice you hear from most TTS applications. Concatenative synthesis uses prerecorded bits of actual human speech to form the words. This results in a more natural, human-sounding voice.
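The contrast between the two methods can be shown with a toy Python example. It is a heavy simplification: the formant function merely sums sine waves at a few frequencies, whereas real formant synthesizers shape a source signal with filters, and the "recordings" in the concatenative half are short made-up number lists standing in for audio clips. All names and values are assumptions for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second, chosen arbitrarily for the sketch


def formant_synthesize(formants_hz, duration_s=0.01):
    """Formant-style: compute every sample from a formula.

    Here the formula is just an average of sine waves, one per frequency.
    """
    count = round(SAMPLE_RATE * duration_s)
    return [
        sum(math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f in formants_hz)
        / len(formants_hz)
        for n in range(count)
    ]


# Concatenative-style: stand-ins for snippets of recorded human speech.
RECORDED_UNITS = {
    "k": [0.1, 0.2],
    "au": [0.3, 0.4, 0.5],
    "t": [0.6],
}


def concatenative_synthesize(phonemes):
    """Concatenative-style: join prerecorded snippets end to end."""
    samples = []
    for phoneme in phonemes:
        samples.extend(RECORDED_UNITS[phoneme])
    return samples


computed = formant_synthesize([700, 1200])  # illustrative frequencies
joined = concatenative_synthesize(["k", "au", "t"])
print(len(computed), joined)
```

The first approach needs almost no storage but invents every sample; the second reuses real recordings, which is why it tends to sound more human and take up more disk space.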
Some TTS engines and applications are starting to incorporate concatenative synthesis to create more natural speech. One technology pursuing this goal is Natural Voices from AT&T Labs, which uses sophisticated concatenative synthesis to record and enhance actual human speech. When you hear the TTS speech, you’re listening to prerecorded words that have been broken down into their phonemes. In the past, Natural Voices was available only for larger business systems, such as call centers, but AT&T has created a Natural Voices TTS engine that can be used by desktop application vendors. A demo can be found at www.naturalvoices.att.com. (For an in-depth look at Natural Voices, see the sidebar AT&T Labs’ Natural Voices on this page.)
We looked at seven different Windows-based TTS applications. Most Windows TTS applications work the same way: open your document, select and copy the text you wish to hear, and the software reads it aloud to you.
To test the products, we had each one read the same documents, e-mail, and Web pages. We created a test document to gauge each application’s capabilities with words, numbers, and other parts of speech. We intentionally placed complex words in the document to see how each program handled them.
Overall, the products were more alike than different. The distinguishing factor, at least in voice quality, proved not to be the application, but the TTS engine. Most TTS products featured here use either the Microsoft or L&H TTS engine and voices. Two of the products use their own engine and voices, which sounded more natural than any of the voices from Microsoft or L&H.
ByteCool CoolSpeech 4.2 $29
CoolSpeech is an effective TTS program with an interesting mix of features. It’s simple to use: just open your document, select and copy the text you wish to hear, and the software reads it aloud. The CoolSpeech window can display your text as it’s being read. A toolbar provides buttons to play, pause, stop, rewind, and forward through your selected text. The CoolSpeech window can be minimized to a system tray icon so it’s out of sight as your text is being read aloud.
You can save any spoken text as a separate file in text only (TXT), Rich Text Format (RTF), or WAV format. You can open TXT, RTF, and even HTML files directly within CoolSpeech to have them read aloud.
CoolSpeech installs the L&H TruVoice American English TTS engine, which comes with one default voice—American English male. You can install and add additional voices. We were able to install the L&H British English male and female voices and the Microsoft TTS engine with its three voices. These additional engines and voices can be downloaded for free from the CoolSpeech Web site.
You can change certain speech attributes of some voices. We were able to adjust the L&H voices for pitch, rate, and a few other factors, while the Microsoft voices could be adjusted for speed and pitch. Mispronounced words can be added to CoolSpeech’s built-in dictionary. You build the word by assembling it with its proper phonemes, an accurate way of zeroing in on the correct pronunciation.
CoolSpeech can automatically download and read your e-mail at specific intervals, which you schedule through the software. One of the handiest features of CoolSpeech is its ability to read news and other content from Web sites. The software offers access to several default sites, including AOL News, ESPN News, and ZDNet News, but you can add additional sites. CoolSpeech downloads the HTML source code from the Web site you specify, and extracts and reads the plain text. The site you choose doesn’t need to be compatible with CoolSpeech or TTS technology. As with e-mail, you can schedule CoolSpeech to automatically download the content of your selected Web sites. We tried this feature with a variety of different Web sites, and it worked quite well.
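The "extract the plain text" step can be approximated with Python's standard html.parser module. This is a minimal sketch of the general technique, not CoolSpeech's actual method: the TextExtractor and html_to_text names are made up, the sample page is invented, and downloading the page from the Web is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text and ignore script and style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><title>News</title><style>p {color: red}</style></head>"
    "<body><h1>Headline</h1><p>Story <b>text</b> here.</p></body></html>"
)
print(html_to_text(page))
```

Because this works on the raw HTML, the Web site needs no special support for TTS, which matches the behavior described above.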
A demo version of CoolSpeech is available for download at the company Web site. It provides you with 14 days of unlimited use.
Fonix iSpeak 2.0 $69
iSpeak offered the most natural-sounding voices of all the TTS products here. Only two voices are included, a male named Roger and a female named Jessica, but they’re enough. Both voices offer two levels of quality: standard and high. The standard voices sound slightly better than the typical synthesized TTS voice, but the high-quality voices almost achieve the sound of human speech. Though they sound somewhat mechanical, the high-quality voices have a smoother, lifelike quality that makes them more pleasing to hear.
To achieve the high voice quality, iSpeak utilizes its own proprietary TTS engine, called Fonix Text-to-Speak. The Fonix Text-to-Speak uses concatenative synthesis, which analyzes the text to be read and selects from a bank of prerecorded human voices to recreate the audio. These prerecorded sounds range from entire words and phrases to basic phonemes. The standard-quality voices take up a few megabytes of disk space each, while the high-quality voices use about 100MB. To install the full program and all voices, iSpeak requires almost 300MB of space.
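Choosing from a bank that holds phrases, words, and phonemes can be sketched as a greedy "longest match first" search. This is one plausible strategy, assumed here for illustration; the source does not say how the Fonix engine actually selects its units. The bank contents, file names, and select_units function are hypothetical, and single letters stand in for phonemes in the fallback.

```python
# Hypothetical bank: keys are text, values name a prerecorded clip.
UNIT_BANK = {
    "good morning": "phrase_good_morning.wav",
    "good": "word_good.wav",
    "morning": "word_morning.wav",
}


def select_units(text, bank, max_phrase_words=3):
    """Pick the longest recorded unit available at each position."""
    words = text.lower().split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest phrase first, then shorter ones, then one word.
        for size in range(min(max_phrase_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate in bank:
                units.append(bank[candidate])
                i += size
                break
        else:
            # No recording found: fall back to the smallest units.
            # Letters stand in for phonemes in this sketch.
            units.extend(f"phoneme_{ch}.wav" for ch in words[i])
            i += 1
    return units


print(select_units("Good morning Bob", UNIT_BANK))
```

Preferring longer units keeps more of the original speaker's natural flow, at the cost of a much larger bank of recordings, which fits the disk space figures quoted above.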
The iSpeak window automatically displays your selected text as it’s being read. You can stop, pause, rewind, and forward through the words and save your text as separate TXT, WAV, or MP3 files. iSpeak includes a powerful dictionary that lets you add any mispronounced word by building it from a list of proper phonemes. You can even add accents to the word so that iSpeak learns which syllables to emphasize.
Mindmaker TextAssist 4.0 $44
TextAssist is another solid TTS application with superior voice quality. Like iSpeak, TextAssist is a self-contained program. Mindmaker doesn’t rely on third-party engines or voices, using its own FlexVoice TTS engine with an array of unique voices. The FlexVoice engine is a hybrid technology that uses both formant and concatenative synthesis techniques. It sounds better than straight formant synthesis without requiring too much memory or disk space.
TextAssist was the easiest TTS product to use. Just open the document or file you wish to hear, and TextAssist can automatically read it from start to finish. The software smartly detects whatever application is running in the foreground, whether it’s a word processor, spreadsheet, Web browser, or e-mail client. You can select and copy specific text if you don’t want to hear the entire document.
Mindmaker supplies nine male and female voices with TextAssist, each of which can be altered by volume, pitch, tone, richness, smoothness, and intonation. You can associate a different voice for each application: program Julie to read Microsoft Word documents, Tom to read Excel spreadsheets, and Bill to read Web pages in Internet Explorer. A dictionary is provided to handle mispronounced words. You add the word by choosing its correct phonemes from a list so TextAssist knows how to pronounce it the next time around.
TextAssist provided the second-best voices, behind iSpeak, but, since it’s less memory- and disk-intensive, the software strikes a good balance. TextAssist 4.0 is available at Mindmaker’s Web site for $44 as a download. A CD-ROM version can be ordered for $49.
MoneyTree Software ReadPlease Plus 2002 $49
ReadPlease Plus 2002 uses Microsoft’s TTS engine. This program provides three voices, and each can be adjusted by speed and pitch.
ReadPlease works like most TTS applications, by reading selected text. Your text appears in the ReadPlease window as it’s being read, and you can pause, stop, and go back and forth through it. You can minimize ReadPlease to a system tray icon so it doesn’t take up screen space as your text is being read. Any text that appears in the window can be saved as a TXT file or a proprietary ReadPlease file.
The dictionary in ReadPlease is very effective at handling mispronounced words. You can add a word by spelling it or by choosing its phonemes from the dictionary’s list.
MoneyTree Software provides a free version called ReadPlease 2002. The free version lacks certain key features of ReadPlease Plus, such as the dictionary. Both versions of ReadPlease are available for download from the company’s Web site.
Shadisoft Speak & Mail $29
Speak & Mail is one of several TTS applications from Shadisoft with some interesting options. Speak & Mail reads your text via a Microsoft Agent cartoon character. This character can automatically announce the date and time, tell a joke, or simply read your text. You can download several additional characters from the Speak & Mail Web site.
Speak & Mail reads any text you select, and can open and read a text file directly. The software uses the L&H TTS engine and voices, and throws in a few of its own voices.
The program is named Speak & Mail because of the way it integrates with your e-mail account. It can automatically download and read your e-mail at any time you schedule. Speak & Mail can even serve as your own personal voice calendar. You can schedule the software to announce any appointment or other item at a specific time.
There’s no built-in dictionary for correcting mispronounced words, but you can download and install a free Microsoft Speech Control Panel with its own dictionary that lets you spell the correct word or build it using its proper phonemes.
Speak & Mail dabbles in the area of voice recognition. Using a microphone, you can ask it questions, such as “What is the time?” or “What is the date?” and Speak & Mail will answer you accordingly.
You can download a trial version from the Shadisoft Web site.
textHELP! ScreenReader $29
ScreenReader uses both the L&H and Microsoft TTS engines and voices. It appears as a small window that stays active on your screen and reads whatever text you copy to the clipboard. By default, the software displays a cartoon character and voice balloon that moves as the text is read. Thankfully, you can turn these options off. ScreenReader provides toolbar buttons so you can stop, pause, and go backward or forward through the text.
ScreenReader includes a pronunciation tool for changing the pronunciation of a word using the phonemes. There’s no option to build a word by selecting from a list of phonemes. Your only choice is to type the right phonemes on your own, which proved a difficult guessing game with certain words. ScreenReader is a simple, basic TTS product with enough options to make it worth considering. You can buy it directly from textHELP!’s Web site; it’s $29 for the downloaded version or $44 for the shipped version.
textHELP! makes other notable TTS products. Read & Write is a toolbar TTS application that works with your word processor, spreadsheet, Web browser, and other Windows applications. It can speak text that you’ve selected from a document or as you type it. It can read aloud any menus or icons that you click on. Read & Write provides sophisticated audible spell checking and a comprehensive thesaurus. It costs $249 for a single license.
Type & Talk is a word processor for people with reading and writing difficulties. Type & Talk can read text aloud as you type it and suggest corrections for misspelled words. You can use it as a traditional TTS application by selecting the specific text you wish to hear. The software comes with several default voices, and you can use the voices from the Microsoft and L&H TTS engines, as well. If you’re looking for a good combination word processor/TTS application or a strong tool to help someone learn English, Type & Talk is quite effective. The software costs $169 for a single-user license.
YesGoal CoolSpeaking $39
CoolSpeaking proved to be another reliable TTS application. It uses the L&H TTS engine with the American English male voice. You can install the L&H British English voice and other languages, as well as the Microsoft TTS engine.
Select and copy the text you wish to hear, and CoolSpeaking will read it to you. It can also read each word or sentence as you type it. You can save your text passages as separate TXT files or convert them into WAV files for playback. CoolSpeaking provides a few other handy features, such as the ability to announce the date and time.
You’ll find the usual interface with a window that can display text as it’s being read. The Control Panel lets you navigate through your spoken text. You can minimize CoolSpeaking so that it shrinks to the Windows system tray. From there, you can have it read the contents of the clipboard by choosing commands from a pop-up menu.
A demo version of CoolSpeaking is now available as a free download at http://www.yesgoal.com.
Copyright © Bedford Communications, Inc. 2000-2003