Convert Text-to-Speech in Java

Text-to-speech (TTS), or read-aloud, is a type of assistive technology (a term covering assistive, adaptive, and rehabilitative devices for people with disabilities) that reads digital text aloud. TTS is a feature of many smart devices and services such as ATMs, online translators, and text scanners. Adding text-to-speech to an application improves the user experience by making it more accessible. It is also widely used to make books audible, alongside audiobook platforms such as Audible that offer thousands of books in audio form. Most smart devices now ship with this feature.

In this section, we will discuss the Java Speech API and FreeTTS, and see how to convert text to speech in a Java program.

Java Speech API (JSAPI)

The Java Speech API (JSAPI) allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API to support command-and-control recognizers, dictation systems, and speech synthesizers. It is not a part of the JDK. Instead, it is a specification that third parties implement, which encourages the availability of multiple implementations. The architecture of the TTS system is shown in the following figure.

[Figure: Architecture of the TTS system]

JSAPI includes two companion specifications: JSML (Java Speech API Markup Language) and JSGF (Java Speech API Grammar Format). JSML defines the standard text format for marking up text that is input to a speech synthesizer, while JSGF defines the standard text format for providing a grammar to a speech recognizer. The following figure illustrates the block diagram of text-to-speech.

[Figure: Block diagram of text-to-speech]
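
To make the two formats more concrete, the following sketch builds one small JSML document and one small JSGF grammar as plain strings. It uses only the standard library and does not call any speech engine. The class name, method names, and sample sentences are made up for illustration, and the set of JSML elements an engine accepts depends on the JSML version it supports, so only the root element and a break are used here.

```java
public class SpeechFormatsSketch {

    // A small JSML document: marked-up text intended for a speech synthesizer.
    // Only the <jsml> root and a <break/> pause are used in this sketch.
    static String sampleJsml() {
        return String.join("\n",
            "<?xml version=\"1.0\"?>",
            "<jsml>",
            "  Welcome to text-to-speech in Java.",
            "  <break/>",
            "  This sentence follows a pause.",
            "</jsml>");
    }

    // A small JSGF grammar: the phrases a speech recognizer should accept.
    // The grammar name and the rule are illustrative only.
    static String sampleJsgf() {
        return String.join("\n",
            "#JSGF V1.0;",
            "grammar commands;",
            "public <command> = (open | close) (file | window);");
    }

    public static void main(String[] args) {
        System.out.println(sampleJsml());
        System.out.println();
        System.out.println(sampleJsgf());
    }
}
```

JSML is relevant to the synthesizer side discussed in this section, while JSGF only matters when a recognizer is used.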

JSAPI relies on the following four classes and interfaces to convert text to speech.

Engine

It is the parent interface for all speech engines and is defined in the javax.speech package. Speech engines include recognizers and synthesizers, so the interface covers both speech input and speech output.

The createRecognizer() and createSynthesizer() methods of the Central class are used to create speech engines. Both methods accept a single EngineModeDesc parameter that defines the required properties of the engine to be created.

The parameter may be an instance of one of its subclasses, i.e. RecognizerModeDesc or SynthesizerModeDesc.

A mode descriptor defines a set of required properties for an engine. For example, a SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice. Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for Japanese.

Central

It is a class that belongs to the javax.speech package. It is the initial access point for all speech input and output capabilities. It provides the ability to locate, select, and create speech recognizers and speech synthesizers.

SynthesizerModeDesc

It extends EngineModeDesc with properties that are specific to speech synthesizers. SynthesizerModeDesc adds two properties:

  1. The list of voices provided by the synthesizer
  2. The voice to be loaded when the synthesizer is started

Synthesizer

It is an interface that provides primary access to speech synthesis capabilities.

Third-Party Speech API

The following third-party implementations of the Java Speech API can be used to convert text to speech.

  1. FreeTTS
  2. IBM's Speech for Java
  3. The Cloud Garden
  4. Conversa Web 3.0
  5. Festival

In this section, we will discuss the widely used speech synthesis API called FreeTTS.

FreeTTS

FreeTTS is an open-source speech synthesis system written entirely in the Java programming language. It is based on Flite (festival-lite), a small, fast run-time open-source text-to-speech synthesis engine developed at CMU. By using the FreeTTS API, we can make our computer speak. In other words, it produces artificial human speech from ordinary text.

Before creating a Java program, we need to download and install the FreeTTS API. Follow the steps given below.

Step 1: Download the FreeTTS API in zip form.

Step 2: Extract the zip file. It contains two folders, as shown in the following image.

[Figure: Folders extracted from the FreeTTS zip file]

Step 3: Navigate to the directory C:\freetts-1.2.2-bin\freetts-1.2\lib, which contains the jsapi.exe file.

Step 4: Install the jsapi by double-clicking on the jsapi.exe file. Accept the License Agreement by clicking on the I Agree button.

Now click on the Close button. The above process generates a jar file named jsapi.jar in the same location where the jsapi.exe file resides. This jar file contains the Java Speech API (javax.speech) classes that FreeTTS requires to create a text-to-speech application.

We have installed JSAPI properly.

Step 5: Now, create a Java project in the IDE as usual. In our case, we have created a Java project named TTS. In this project, we have created a class named TextToSpeechExample1 to hold the program.

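Note: Before running the program, we must ensure that the required jar files from the FreeTTS lib directory, including the generated jsapi.jar, are included in our project, as listed in the following figure.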

[Figure: Jar files included in the project]
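
One quick way to confirm that the jar files are really visible to the project is to ask the class loader for a class from each library. The sketch below is only a diagnostic aid and uses the standard library alone. The class name ClasspathCheckSketch and the helper isOnClasspath are made up for illustration; the two class names it looks up are the JSAPI entry point (from jsapi.jar) and the FreeTTS voice manager (from freetts.jar).

```java
public class ClasspathCheckSketch {

    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "javax.speech.Central",                // expected in jsapi.jar
            "com.sun.speech.freetts.VoiceManager"  // expected in freetts.jar
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : MISSING"));
        }
    }
}
```

If either line reports MISSING, the corresponding jar has not been added to the project's build path.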

Step 6: Navigate to the directory C:\freetts-1.2.2-bin\freetts-1.2, copy the speech.properties file, and paste it into the user's home directory. In our case, the home directory is C:\Users\Anubhav.
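
The home directory differs from machine to machine, so it can help to let Java report where it expects the file. The following sketch resolves speech.properties against the user.home system property and prints whether the file is present. The class and method names are hypothetical, and the sketch only checks for the file; it does not read or validate its contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SpeechPropertiesCheckSketch {

    // Location where the copied file is expected: <user.home>/speech.properties
    static Path expectedLocation() {
        return Paths.get(System.getProperty("user.home"), "speech.properties");
    }

    public static void main(String[] args) {
        Path file = expectedLocation();
        if (Files.isRegularFile(file)) {
            System.out.println("Found " + file);
        } else {
            System.out.println("Not found: " + file
                    + " (copy speech.properties from the FreeTTS directory)");
        }
    }
}
```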

Let's create a Java program that converts text-to-speech.

Text-to-Speech Java Program

TextToSpeechExample1.java
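
A minimal sketch of such a program is given below. It assumes that the FreeTTS jar files, including jsapi.jar, are on the classpath and that the kevin voice bundled with FreeTTS is available. The spoken sentence is only a placeholder. The sketch registers the FreeTTS engine explicitly, which is one common way to wire FreeTTS into JSAPI; treat it as an illustration rather than the only possible form.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class TextToSpeechExample1 {
    public static void main(String[] args) {
        try {
            // Tell FreeTTS which voice directory to use (the bundled "kevin" voices).
            System.setProperty("freetts.voices",
                    "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

            // Register the FreeTTS engine with the JSAPI Central class.
            Central.registerEngineCentral(
                    "com.sun.speech.freetts.jsapi.FreeTTSEngineCentral");

            // Ask Central for a synthesizer that matches a US English mode descriptor.
            Synthesizer synthesizer =
                    Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
            if (synthesizer == null) {
                System.err.println("No matching synthesizer found; check the jar files.");
                return;
            }

            // Acquire engine resources and leave the paused state.
            synthesizer.allocate();
            synthesizer.resume();

            // Queue the text and block until the queue has been spoken.
            synthesizer.speakPlainText("Hello, this is text to speech in Java.", null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);

            // Release engine resources.
            synthesizer.deallocate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

The program follows the four building blocks described earlier: Central locates the engine, SynthesizerModeDesc describes what is required, and the returned Synthesizer (an Engine) does the speaking.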

Now run the above program. Its output cannot be shown here because it is audio only, so try it yourself.

TextToSpeechExample2.java

FreeTTS also allows us to set the rate, pitch, and volume of the voice by using the setRate(), setPitch(), and setVolume() methods, respectively. For example, consider the following Java program.

In the following program, note that instead of using the javax.speech package, we have used the com.sun.speech.freetts package.

TextToSpeechExample3.java
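
A minimal sketch of such a program follows. It assumes the FreeTTS jar files are on the classpath and that the bundled voice named kevin16 is available. The rate, pitch, and volume values, as well as the spoken sentence, are arbitrary illustrative choices and can be adjusted freely.

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class TextToSpeechExample3 {
    public static void main(String[] args) {
        // Tell FreeTTS which voice directory to use (the bundled "kevin" voices).
        System.setProperty("freetts.voices",
                "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

        // Look up a voice by name; getVoice returns null if it is not found.
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice kevin16 not found; check the FreeTTS jar files.");
            return;
        }

        // Load the resources the voice needs before speaking.
        voice.allocate();
        try {
            voice.setRate(150f);   // speaking rate in words per minute
            voice.setPitch(120f);  // baseline pitch in hertz
            voice.setVolume(0.9f); // volume from 0.0 (silent) to 1.0 (loudest)
            voice.speak("The rate, pitch, and volume of this voice were set in code.");
        } finally {
            // Always release the voice, even if speaking fails.
            voice.deallocate();
        }
    }
}
```

Changing one value at a time and listening to the result is an easy way to get a feel for each setting.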

Note: The output of the above program is audible.

