Convert Text-to-Speech in Java

Text-to-speech (TTS), also called read-aloud, is a type of assistive technology that reads digital text aloud. (Assistive technology is an umbrella term for assistive, adaptive, and rehabilitative devices for people with disabilities.) TTS is built into many smart devices and services, such as ATMs, online translators, and text scanners. Adding text-to-speech to an application can improve the customer experience by making the application more accessible. Nowadays, it is also widely used to make books audible. For example, Audible is a popular platform that offers thousands of books in audio form. Most smart devices now come with this feature.

In this section, we will discuss the Java Speech API and FreeTTS, and how we can convert text to speech in a Java program.

Java Speech API (JSAPI)

The Java Speech API (JSAPI) allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API to support command-and-control recognizers, dictation systems, and speech synthesizers. It is not part of the JDK. It is a specification whose implementations are supplied by third parties, which encourages the availability of multiple implementations. The architecture of the TTS system is shown in the following figure.

JSAPI includes two specifications: JSML (Java Speech API Markup Language) and JSGF (Java Speech API Grammar Format). JSML defines the standard text format for marking up text for input to a speech synthesizer, while JSGF defines the standard text format for providing a grammar to a speech recognizer. The following figure illustrates the block diagram of text-to-speech.
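To make the two formats more concrete, the sketch below holds a small hand-written JSGF grammar and wraps plain text in a JSML root element. The class name SpeechFormatSamples, the method toJsml(), and the sample grammar are made up for illustration. The sketch assumes the root element name jsml from the published JSML specification, and it only builds strings; it does not pass them to any speech engine.

```java
// Illustrative only: hand-written samples of the two text formats, held as Java strings.
// SpeechFormatSamples and toJsml() are hypothetical names, not part of JSAPI.
final class SpeechFormatSamples {

    // A small JSGF grammar: a header line, a grammar name, and one public rule.
    static final String JSGF_GRAMMAR =
            "#JSGF V1.0;\n"
          + "grammar commands;\n"
          + "public <command> = (open | close) (window | door);\n";

    // Wraps plain text in a root <jsml> element (assumed root name),
    // escaping the XML special characters so the text stays well-formed.
    static String toJsml(String plainText) {
        String escaped = plainText
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        return "<jsml>" + escaped + "</jsml>";
    }

    public static void main(String[] args) {
        System.out.println(JSGF_GRAMMAR);
        System.out.println(toJsml("Tom & Jerry"));
    }
}
```

The grammar describes what a recognizer should listen for, while the marked-up text describes what a synthesizer should say.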

Four things are required for JSAPI to convert text to speech.

Engine

It is the parent interface for all speech engines and is defined in the javax.speech package. Speech engines include recognizers and synthesizers. Therefore, the interface deals with both speech input and speech output.

The createRecognizer() and createSynthesizer() methods of the Central class are used to create speech engines. Both methods accept a single EngineModeDesc parameter that defines the required properties of the engine to be created.

The parameter may be an instance of one of its subclasses, i.e. RecognizerModeDesc or SynthesizerModeDesc.

A mode descriptor defines a set of required properties for an engine. For example, a SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice. Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for Japanese.
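The idea of "required properties" can be sketched without the Speech API itself. The class below is a hypothetical, simplified stand-in for a mode descriptor (ModeFilter and its fields are made-up names, not the real javax.speech classes). It assumes that a null requirement means "don't care" and that a locale requirement without a country accepts any country for that language.

```java
import java.util.Locale;

// Hypothetical, simplified model of a mode descriptor; NOT the real javax.speech class.
final class ModeFilter {
    final String engineName;   // null means "any engine"
    final Locale locale;       // null means "any locale"
    final String voiceGender;  // null means "any gender"

    ModeFilter(String engineName, Locale locale, String voiceGender) {
        this.engineName = engineName;
        this.locale = locale;
        this.voiceGender = voiceGender;
    }

    // Returns true when every non-null requirement is met by this descriptor.
    boolean satisfies(ModeFilter required) {
        return matches(required.engineName, engineName)
            && matchesLocale(required.locale, locale)
            && matches(required.voiceGender, voiceGender);
    }

    private static boolean matches(String required, String actual) {
        return required == null || required.equals(actual);
    }

    // A requirement of "de" accepts "de-CH"; a requirement of "de-CH" needs the country too.
    private static boolean matchesLocale(Locale required, Locale actual) {
        if (required == null) {
            return true;
        }
        if (actual == null) {
            return false;
        }
        boolean languageOk = required.getLanguage().equals(actual.getLanguage());
        boolean countryOk = required.getCountry().isEmpty()
                || required.getCountry().equals(actual.getCountry());
        return languageOk && countryOk;
    }

    public static void main(String[] args) {
        // An engine offering a Swiss German male voice.
        ModeFilter swissGermanMale =
                new ModeFilter("demo-engine", Locale.forLanguageTag("de-CH"), "male");

        System.out.println(swissGermanMale.satisfies(new ModeFilter(null, Locale.GERMAN, null)));   // true
        System.out.println(swissGermanMale.satisfies(new ModeFilter(null, Locale.JAPANESE, null))); // false
    }
}
```

In this model, the caller fills in only the properties it cares about and leaves the rest null, which mirrors how a descriptor with just a locale is used later in this section.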

Central

It is a class that belongs to the javax.speech package. It is the initial access point for all speech input and output capabilities. It provides the ability to locate, select, and create speech recognizers and speech synthesizers.

SynthesizerModeDesc

It extends EngineModeDesc with properties that are specific to speech synthesizers. SynthesizerModeDesc adds two properties:

List of voices provided by the synthesizer
Voice to be loaded when the synthesizer is started

Synthesizer

It is an interface that provides primary access to speech synthesis capabilities.

Third-Party Speech API

The following third-party implementations of the Speech API can be used to convert text to speech.

FreeTTS
IBM's Speech for Java
The Cloud Garden
Conversa Web 3.0
Festival

In this section, we will discuss the widely used speech synthesis API called FreeTTS.

FreeTTS

FreeTTS is an open-source speech synthesis system written entirely in the Java programming language. It is based on Flite (festival-lite), a small, fast, run-time, open-source text-to-speech synthesis engine developed at CMU. By using the FreeTTS API, we can make our computer speak. In other words, it produces artificial human speech from ordinary text.

Before we create a Java program, we first need to download and install the FreeTTS API. Follow the steps given below.

Step 1: Download the FreeTTS API in zip form.

Step 2: Extract the zip file, which provides two folders, as shown in the following image.

Step 3: Navigate to the directory C:\freetts-1.2.2-bin\freetts-1.2\lib, which contains the jsapi.exe file.

Step 4: Install JSAPI by double-clicking on the jsapi.exe file. Accept the License Agreement by clicking on the I Agree button.

Now click on the Close button. The above process generates a jar file named jsapi.jar in the same location where the jsapi.exe file resides. This jar file contains the JSAPI classes (the javax.speech packages) that, together with the FreeTTS jar files, are required to create a text-to-speech application.

We have installed JSAPI properly.

Step 5: Now create a Java project in an IDE as usual. In our case, we have created a Java project named TTS. In this project, we have created a class named TextToSpeechExample1 and written the code given in the TextToSpeechExample1.java listing below.

Note: Before running the program, we must ensure that the following jar files are included in our project.

Step 6: Navigate to the directory C:\freetts-1.2.2-bin\freetts-1.2, copy the speech.properties file, and paste it into the user's home directory. In our case, the home directory is C:\Users\Anubhav.
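Whether the file from Step 6 landed in the right place can be checked with a few lines of standard Java. The following is a minimal sketch that uses only JDK classes. SpeechPropertiesCheck and load() are hypothetical names, and the sketch assumes that the home directory is the one reported by the user.home system property.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: reports whether speech.properties exists in a directory.
final class SpeechPropertiesCheck {

    // Loads speech.properties from the given directory, or returns null if it is absent.
    static Properties load(Path directory) throws IOException {
        Path file = directory.resolve("speech.properties");
        if (!Files.isRegularFile(file)) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the home directory is the one in the user.home system property.
        Path home = Paths.get(System.getProperty("user.home"));
        Properties props = load(home);
        if (props == null) {
            System.out.println("speech.properties not found in " + home);
        } else {
            System.out.println("Found speech.properties with entries: "
                    + props.stringPropertyNames());
        }
    }
}
```

Running it prints either the property names found in the file or a message saying that the file is missing from the home directory.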

Let's create a Java program that converts text-to-speech.

Text-to-Speech Java Program

TextToSpeechExample1.java

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;
public class TextToSpeechExample1
{
    public static void main(String args[])
    {
        try
        {
            //setting the voice directory property to the Kevin voice directory
            System.setProperty("freetts.voices", "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
            //registering the FreeTTS speech engine with Central
            Central.registerEngineCentral("com.sun.speech.freetts.jsapi.FreeTTSEngineCentral");
            //creating a Synthesizer for US English
            Synthesizer synthesizer = Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
            //allocating the resources required by the Synthesizer
            synthesizer.allocate();
            //resuming the Synthesizer so that it can produce audio
            synthesizer.resume();
            //queuing the text and waiting until the speech queue becomes empty
            synthesizer.speakPlainText("GeeksforGeeks", null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);
            //deallocating the Synthesizer
            synthesizer.deallocate();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Now run the above program. The output of the program cannot be shown here because it is only audible. So, try it yourself.
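If the program fails instead of speaking, one thing worth ruling out is a jar file that never made it onto the classpath. The sketch below is a hypothetical diagnostic (SpeechClasspathCheck and isOnClasspath() are made-up names) that uses only JDK reflection to test whether the two classes referenced by the program above can be found. It does not load or start any speech engine.

```java
// Hypothetical diagnostic: checks that the classes used by the example are visible.
final class SpeechClasspathCheck {

    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' skips static initialization; only visibility is tested.
            Class.forName(className, false, SpeechClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "javax.speech.Central",                              // JSAPI entry point
            "com.sun.speech.freetts.jsapi.FreeTTSEngineCentral"  // FreeTTS engine for JSAPI
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

A MISSING line suggests that the corresponding jar file has not been added to the project.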

TextToSpeechExample2.java

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;
import javax.speech.synthesis.Voice;
public class TextToSpeechExample2
{
    //speaks the given text using the voice with the given name
    public void dospeak(String speakText, String voiceName)
    {
        try
        {
            //describes the required engine: any engine name, "general" mode, US English locale
            SynthesizerModeDesc desc = new SynthesizerModeDesc(null, "general", Locale.US, null, null);
            //createSynthesizer() returns null if no registered engine matches the descriptor
            Synthesizer synthesizer = Central.createSynthesizer(desc);
            if (synthesizer == null)
            {
                System.out.println("No synthesizer found. Check that speech.properties exists in " + System.getProperty("user.home"));
                return;
            }
            //allocates the resources required by the Synthesizer
            synthesizer.allocate();
            //resumes the Synthesizer so that it can produce audio
            synthesizer.resume();
            //gets the descriptor of the engine that was actually created
            desc = (SynthesizerModeDesc) synthesizer.getEngineModeDesc();
            Voice[] voices = desc.getVoices();
            Voice voice = null;
            //searches the available voices for the one with the requested name
            for (int i = 0; i < voices.length; i++)
            {
                if (voices[i].getName().equals(voiceName))
                {
                    voice = voices[i];
                    break;
                }
            }
            //sets the voice only if it was found, otherwise the default voice is kept
            if (voice != null)
            {
                synthesizer.getSynthesizerProperties().setVoice(voice);
            }
            else
            {
                System.out.println("Voice not found: " + voiceName);
            }
            System.out.println("Speaking: " + speakText);
            synthesizer.speakPlainText(speakText, null);
            //blocks until the speech queue is empty
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);
            synthesizer.deallocate();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
    public static void main(String args[])
    {
        TextToSpeechExample2 obj = new TextToSpeechExample2();
        obj.dospeak("Don't limit yourself. Many people limit themselves to what they think they can do. You can go as far as your mind lets you. What you believe, remember, you can achieve.", "kevin16");
    }
}

FreeTTS also allows us to set the rate, pitch, and volume of the voice by using the setRate(), setPitch(), and setVolume() methods of its Voice class, respectively. For example, consider the following Java program.

In the following program, note that instead of using the javax.speech package, we have used the com.sun.speech.freetts package.

TextToSpeechExample3.java

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
public class TextToSpeechExample3
{
    public static void main(String args[])
    {
        //tells FreeTTS where to find the Kevin voices (same property as in TextToSpeechExample1)
        System.setProperty("freetts.voices", "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
        //getting the voice, here we have used the kevin (male) voice
        Voice voice = VoiceManager.getInstance().getVoice("kevin");
        //getVoice() returns null if the voice is not available
        if (voice == null)
        {
            System.out.println("Voice not found: kevin");
            return;
        }
        //the Voice class allocate() method loads the resources needed by this voice
        voice.allocate();
        try
        {
            //sets the rate (words per minute i.e. 190) of the speech
            voice.setRate(190);
            //sets the baseline pitch (150) in hertz
            voice.setPitch(150);
            //sets the volume of the voice (FreeTTS documents the range as 0 to 1)
            voice.setVolume(1.0f);
            //the speak() method speaks the specified text
            voice.speak("Don't limit yourself. Many people limit themselves to what they think they can do. You can go as far as your mind lets you. What you believe, remember, you can achieve.");
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            //releases the resources held by this voice
            voice.deallocate();
        }
    }
}
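Because the rate passed to setRate() is expressed in words per minute, a rough speaking time follows from simple arithmetic: words divided by words per minute, multiplied by 60. The sketch below illustrates this with JDK classes only. SpeechDuration and its methods are hypothetical names, and the result is only an estimate, since the actual duration also depends on pauses and on how the engine pronounces each word.

```java
// Hypothetical helper: estimates speaking time from a words-per-minute rate.
final class SpeechDuration {

    // Counts whitespace-separated words; a blank string has zero words.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Rough duration in seconds: words / (words per minute) * 60.
    static double estimateSeconds(String text, double wordsPerMinute) {
        if (wordsPerMinute <= 0) {
            throw new IllegalArgumentException("wordsPerMinute must be positive");
        }
        return countWords(text) * 60.0 / wordsPerMinute;
    }

    public static void main(String[] args) {
        String text = "You can go as far as your mind lets you.";
        System.out.printf("%d words, about %.1f seconds at 190 words per minute%n",
                countWords(text), estimateSeconds(text, 190));
    }
}
```

Raising the rate shortens the estimate proportionally, which is a quick way to judge whether a chosen rate is reasonable for a given passage.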