Convert Text-to-Speech in Java
Text-to-speech (TTS), also called read-aloud, is a type of assistive technology (a term that covers assistive, adaptive, and rehabilitative devices for people with disabilities) that reads digital text aloud. TTS is a feature of many smart devices and applications, such as ATMs, online translators, and text scanners. Implementing text-to-speech in an application improves the customer experience by making the application more accessible. It is now widely used to make books audible. Popular platforms such as Audible offer thousands of books in audio form, some of which are narrated using TTS technology. Most smart devices now come with this feature.
In this section, we will discuss the Java Speech API and FreeTTS, and how we can convert text to speech in a Java program.
Java Speech API (JSAPI)
The Java Speech API (JSAPI) allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API to support command-and-control recognizers, dictation systems, and speech synthesizers. It is not part of the JDK. It is a specification that third parties implement, which encourages the availability of multiple implementations. The architecture of the TTS system is shown in the following figure.
JSAPI includes two specifications: JSML (Java Speech API Markup Language) and JSGF (Java Speech API Grammar Format). JSML defines a standard text format for marking up text for input to a speech synthesizer, while JSGF defines a standard text format for providing grammars to a speech recognizer. The following figure illustrates the block diagram of text-to-speech.
JSAPI requires four things to convert text to speech: Engine, Central, SynthesizerModeDesc, and Synthesizer.
Engine: The parent interface for all speech engines, defined in the javax.speech package. Speech engines include recognizers and synthesizers, so the interface deals with both speech input and speech output.
The createRecognizer() and createSynthesizer() methods of the Central class are used to create speech engines. Both methods accept a single EngineModeDesc parameter that defines the required properties of the engine to be created. The argument may be an instance of one of its subclasses, RecognizerModeDesc or SynthesizerModeDesc.
A mode descriptor defines a set of required properties for an engine. For example, a SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice. Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for Japanese.
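The idea of selecting an engine by its required properties can be illustrated with a small, self-contained model. This is only a sketch and not the JSAPI classes themselves: the Desc class, its three fields, and the engine names are hypothetical, and it assumes the simplified rule that a null property in the requirement means "no requirement".

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class ModeDescSketch {

    /** Hypothetical stand-in for a mode descriptor (not a JSAPI class). */
    static final class Desc {
        final String engineName;
        final Locale locale;
        final String voiceGender; // illustrative property, e.g. "male"

        Desc(String engineName, Locale locale, String voiceGender) {
            this.engineName = engineName;
            this.locale = locale;
            this.voiceGender = voiceGender;
        }

        /** True when every non-null property of 'required' equals this descriptor's property. */
        boolean matches(Desc required) {
            return (required.engineName == null || required.engineName.equals(engineName))
                && (required.locale == null || required.locale.equals(locale))
                && (required.voiceGender == null || required.voiceGender.equals(voiceGender));
        }
    }

    /** Keeps only the available descriptors that satisfy the requirement. */
    static List<Desc> select(List<Desc> available, Desc required) {
        return available.stream()
                .filter(d -> d.matches(required))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Desc> available = List.of(
                new Desc("EngineA", Locale.forLanguageTag("de-CH"), "male"),
                new Desc("EngineB", Locale.JAPAN, "female"),
                new Desc("EngineC", Locale.US, "male"));

        // Requirement: any engine, Swiss German, male voice.
        Desc required = new Desc(null, Locale.forLanguageTag("de-CH"), "male");

        for (Desc d : select(available, required)) {
            System.out.println("Matching engine: " + d.engineName);
        }
    }
}
```

Leaving a property null keeps the requirement loose, so a caller can ask only for a language and accept whichever engine provides it.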
Central: A class in the javax.speech package. It is the initial access point for all speech input and output capabilities. It provides the ability to locate, select, and create speech recognizers and speech synthesizers.
SynthesizerModeDesc: Extends EngineModeDesc with properties that are specific to speech synthesizers. It adds two properties: the list of voices provided by the synthesizer, and the voice to be loaded when the synthesizer is started.
Synthesizer: An interface that provides primary access to speech synthesis capabilities.
Third-Party Speech API
Several third-party speech APIs can be used to convert text to speech in Java.
In this section, we will discuss the widely used speech synthesis API called FreeTTS.
FreeTTS is an open-source speech synthesis system written entirely in the Java programming language. It is based on Flite (festival-lite), a small, fast, run-time, open-source text-to-speech synthesis engine developed at Carnegie Mellon University. By using the FreeTTS API, we can make our computer speak. In other words, it produces artificial human speech by converting ordinary text to speech.
Before writing a Java program, we need to download and install the FreeTTS API. Follow the steps given below.
Step 1: Download the FreeTTS API in zip form.
Step 2: Extract the zip file. It contains two folders, as shown in the following image.
Step 3: Open the directory C:\freetts-1.2.2-bin\freetts-1.2\lib, which contains the jsapi.exe file.
Step 4: Install JSAPI by double-clicking the jsapi.exe file. Accept the License Agreement by clicking the I Agree button.
Now click the Close button. The above process generates a jar file named jsapi.jar in the same location as the jsapi.exe file. This jar file contains the Java Speech API classes (the javax.speech packages) that FreeTTS implements, and it is required to create a text-to-speech application.
We have installed JSAPI properly.
Step 5: Now, create a Java project in your IDE as usual. In our case, we have created a Java project named TTS. In this project, we have created a class named TextToSpeechExample1 to hold the program.
Note: Before running the program, we must ensure that the following jar files are included in our project.
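One way to confirm that the jar files are visible to the project is to try loading a well-known class from each library. The sketch below assumes javax.speech.Central as the marker class for the JSAPI jar and com.sun.speech.freetts.VoiceManager as the marker class for the FreeTTS library. The ClasspathCheck class itself is only an illustrative helper.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Marker classes assumed to identify each library.
        String[] required = {
            "javax.speech.Central",               // Java Speech API (jsapi.jar)
            "com.sun.speech.freetts.VoiceManager" // FreeTTS library
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If a class is reported as MISSING, the corresponding jar has probably not been added to the project's build path.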
Step 6: Navigate to the directory C:\freetts-1.2.2-bin\freetts-1.2, copy the speech.properties file, and paste it into the user home directory. In our case, the home directory is C:\Users\Anubhav.
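The home directory matters because JSAPI looks for speech.properties there when it searches for available engines. The following sketch checks whether the file is in place and prints its entries. It uses only the standard library, and the SpeechPropertiesCheck class and its method names are illustrative rather than part of FreeTTS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class SpeechPropertiesCheck {

    /** Builds the expected location of speech.properties inside the given home directory. */
    static Path speechPropertiesPath(String homeDir) {
        return Paths.get(homeDir, "speech.properties");
    }

    /** Loads the file if it exists; returns empty properties when it is missing. */
    static Properties loadIfPresent(Path file) {
        Properties props = new Properties();
        if (!Files.isRegularFile(file)) {
            return props;
        }
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Path file = speechPropertiesPath(System.getProperty("user.home"));
        Properties props = loadIfPresent(file);
        if (props.isEmpty()) {
            System.out.println("No entries found at " + file);
        } else {
            System.out.println("Entries in " + file + ":");
            props.forEach((key, value) -> System.out.println("  " + key + " = " + value));
        }
    }
}
```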
Let's create a Java program that converts text-to-speech.
Text-to-Speech Java Program
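A minimal sketch of such a program is given here. It assumes that the JSAPI and FreeTTS jars are on the classpath and that the standard FreeTTS distribution is used, which supplies the KevinVoiceDirectory voice directory and the FreeTTSEngineCentral engine class named in the code. The spoken sentence and the usEnglish() helper are illustrative choices.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class TextToSpeechExample1 {

    /** Mode descriptor that only requires a US English synthesizer. */
    static SynthesizerModeDesc usEnglish() {
        return new SynthesizerModeDesc(Locale.US);
    }

    public static void main(String[] args) {
        try {
            // Tell FreeTTS which voice directory to use (assumed: the bundled "kevin" voices).
            System.setProperty("freetts.voices",
                    "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

            // Register the FreeTTS engine with JSAPI.
            Central.registerEngineCentral("com.sun.speech.freetts.jsapi.FreeTTSEngineCentral");

            // Ask Central for a synthesizer that matches the descriptor.
            Synthesizer synthesizer = Central.createSynthesizer(usEnglish());
            if (synthesizer == null) {
                System.err.println("No synthesizer matches the requested mode.");
                return;
            }

            synthesizer.allocate(); // acquire engine resources
            synthesizer.resume();   // make sure the engine is not paused

            // Queue the text and wait until the queue has been spoken.
            synthesizer.speakPlainText("Welcome to the text to speech example.", null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);

            synthesizer.deallocate(); // release engine resources
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

The program follows the roles described earlier: Central locates and creates the engine, SynthesizerModeDesc states the required properties, and Synthesizer performs the speech output.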
Now run the above program. The output of the program cannot be shown here because it is only audible. So, try it yourself.
FreeTTS also allows us to set the rate, pitch, and volume of a voice by using the setRate(), setPitch(), and setVolume() methods of its Voice class, respectively. For example, consider the following Java program.
In the following program, note that instead of using the javax.speech package, we have used the FreeTTS classes in the com.sun.speech.freetts package.
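A sketch of such a program is given here, assuming the FreeTTS jars are on the classpath and that the "kevin16" voice from the standard distribution is available. The class name TextToSpeechExample2, the clamp() helper, and the chosen values are illustrative. The units in the comments (words per minute, hertz, and a 0.0 to 1.0 volume scale) follow the FreeTTS documentation as far as we know, so treat them as assumptions.

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class TextToSpeechExample2 {

    /** Keeps a value inside the range [min, max]. */
    static float clamp(float value, float min, float max) {
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) {
        // Tell FreeTTS which voice directory to use (assumed: the bundled "kevin" voices).
        System.setProperty("freetts.voices",
                "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice kevin16 was not found. Check the FreeTTS jars.");
            return;
        }

        voice.allocate(); // load the voice resources
        try {
            voice.setRate(190f);                    // speaking rate, words per minute
            voice.setPitch(150f);                   // baseline pitch, hertz
            voice.setVolume(clamp(0.8f, 0f, 1f));   // volume, assumed 0.0 to 1.0
            voice.speak("This sentence is spoken with a custom rate, pitch, and volume.");
        } finally {
            voice.deallocate(); // always release the voice
        }
    }
}
```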
Note: The output of the above program is audible.