
Processing Speech in Java

The Java programming language can convert text to human-recognizable speech through the interfaces of the Java Speech API, which helps improve user experience and accessibility. The API is cross-platform and supports command-and-control recognizers and speech synthesizers. Text-to-speech (TTS), or read-aloud, is an assistive technology that makes digital text audible to users. Assistive technology is an umbrella term for assistive, adaptive, and rehabilitative devices built for people with disabilities.

Nowadays, speech processing is widely used in applications and kiosks. Examples include the text-to-speech accessibility option in smartphones and apps such as Domino's that read out options and menus to users.

Let's look at the Java Speech API in detail and see how we can convert text into speech.

Convert Text-to-Speech in Java

Java Speech API (JSAPI)

The Java Speech API allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API that supports command-and-control recognizers, dictation systems, and speech synthesizers. Java Speech is only a specification and has no implementation of its own. It is not part of the Java Development Kit, so we need a third-party implementation. Keeping the API as a specification is meant to encourage multiple implementations.

In this section, we will use the open-source implementation from FreeTTS, but other implementations, such as Cloudscape, are also available.

Consider the following JSAPI classes and interfaces, which FreeTTS implements and which can be used to convert text to speech.

javax.speech.Central Class

It is a singleton class in the "javax.speech" package and the main entry point to the speech engine facilities. It is the first access point for all speech input and output capabilities. It provides the ability to locate, select, and create speech recognizers and speech synthesizers through methods such as availableSynthesizers() and createSynthesizer().
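As a rough illustration of Central acting as the access point, the sketch below lists the synthesizer modes that Central can see. It assumes the FreeTTS JARs and jsapi.jar are on the classpath, that FreeTTS is registered through its FreeTTSEngineCentral class, and that the bundled "kevin" voice directory is selected through the freetts.voices system property. The class name ListSynthesizersSketch is made up for this example.

```java
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.EngineModeDesc;

public class ListSynthesizersSketch {
    public static void main(String[] args) throws Exception {
        // Assumption: use the US-English "kevin" voices bundled with FreeTTS.
        System.setProperty("freetts.voices",
                "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

        // Make the FreeTTS engine known to the JSAPI Central class.
        Central.registerEngineCentral("com.sun.speech.freetts.jsapi.FreeTTSEngineCentral");

        // Passing null asks for every known synthesizer mode, without filtering.
        EngineList modes = Central.availableSynthesizers(null);

        // EngineList is a Vector of mode descriptors; print the basic properties of each.
        for (Object entry : modes) {
            EngineModeDesc desc = (EngineModeDesc) entry;
            System.out.println(desc.getEngineName()
                    + " | " + desc.getModeName()
                    + " | " + desc.getLocale());
        }
    }
}
```

The printed entries depend on which voices are installed, so no particular output is shown here.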

javax.speech.synthesis.SynthesizerModeDesc Class

The class holds all the required properties of the Synthesizer. The list of properties includes the engine name, mode name, locale and running synthesizer.

The engine name identifies the engine used in the program. The mode name is engine-specific and identifies a particular mode of operation of that engine. The locale restricts the selection to synthesizers that speak a given language or regional variant. Lastly, the running property limits the synthesizers returned to those that are already loaded into memory.

Engine: It is defined inside the javax.speech package and is the parent interface for all speech engines. Its subinterfaces are Recognizer and Synthesizer, which cover speech input and speech output respectively.

The methods of Central used to create speech engines are createRecognizer( ) and createSynthesizer( ). Each of these methods accepts a single EngineModeDesc parameter that defines the properties required of the engine to be created. In practice, an instance of one of its subclasses, RecognizerModeDesc or SynthesizerModeDesc, is passed as the parameter.

A mode descriptor defines the set of properties required of an engine. For example, a SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice. Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for Japanese.
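To make the idea of "required properties" concrete, here is a small library-free model of how a descriptor can narrow down a list of engine modes. ModeSketch and its select method are hypothetical names invented for this illustration. They are not part of JSAPI, where the real matching happens inside Central. The sketch assumes that a property left as null means "don't care".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a mode descriptor; not a JSAPI class. */
final class ModeSketch {
    final String engineName;
    final String modeName;
    final Locale locale;
    final Boolean running;

    ModeSketch(String engineName, String modeName, Locale locale, Boolean running) {
        this.engineName = engineName;
        this.modeName = modeName;
        this.locale = locale;
        this.running = running;
    }

    /** True when this mode has every property that 'required' sets; null is a wildcard. */
    boolean matches(ModeSketch required) {
        return (required.engineName == null || required.engineName.equals(engineName))
                && (required.modeName == null || required.modeName.equals(modeName))
                && (required.locale == null || required.locale.equals(locale))
                && (required.running == null || required.running.equals(running));
    }

    /** Returns the available modes that satisfy the required properties, in order. */
    static List<ModeSketch> select(List<ModeSketch> available, ModeSketch required) {
        List<ModeSketch> result = new ArrayList<>();
        for (ModeSketch mode : available) {
            if (mode.matches(required)) {
                result.add(mode);
            }
        }
        return result;
    }
}
```

For instance, a required descriptor built as new ModeSketch(null, null, Locale.forLanguageTag("de-CH"), null) would keep only the Swiss German modes, whatever their engine name or running state.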

javax.speech.synthesis.Synthesizer Interface

It is an interface that provides primary access to speech synthesis capabilities. A synthesizer must be allocated before it can be used. In addition to the properties described above, SynthesizerModeDesc adds two more: the list of voices provided by the synthesizer, and the voice to be loaded when the synthesizer is started.

Third-Party Speech API

The following third-party speech engines and libraries can be used to convert text to speech from Java.

  1. FreeTTS
  2. IBM's Speech for Java
  3. The Cloud Garden
  4. Conversa Web 3.0
  5. Festival

Let's discuss FreeTTS in detail.


FreeTTS is an open-source speech synthesis system written entirely in the Java programming language. It is a small, fast, run-time text-to-speech synthesis engine that lets the computer actually speak. In layman's terms, it artificially produces human speech from ordinary text.

To implement speech synthesis in Java, follow the steps given below.

  1. Download FreeTTS as a zip archive from the project's download page.
  2. Extract the zip file and select freetts-1.2.2-bin/freetts-1.2/lib/jsapi.exe
  3. Run the jsapi.exe file and complete the installation.
  4. A JAR file named "jsapi.jar" will be created. It contains the javax.speech interfaces; the FreeTTS engine itself is in the other JARs in the same lib folder (such as freetts.jar).
  5. Create a new Java project in your IDE.
  6. Add jsapi.jar, together with the other JARs from the lib folder, to your project's classpath.
  7. Code the project as per your requirements.
  8. Finally, execute the project to obtain the desired output.

The packages popular for text to speech conversion in Java are as follows:

1. Package javax.speech

The "javax.speech" package contains the classes and interfaces that define the basic functionality of an engine. Speech synthesizers and speech recognizers are both speech engines. The "javax.speech.synthesis" and "javax.speech.recognition" packages extend this basic functionality with the specific capabilities of speech synthesizers and speech recognizers.

Let's have a look at the basic process for using a speech engine in an application:

  1. Identify the functional requirements of the application for an engine, for example, the language to be used.
  2. Locate and create an engine that meets these requirements.
  3. Allocate the resources for the chosen engine.
  4. Perform the required operations with the engine.
  5. Deallocate the resources of the engine once you are done.

Consider the following Java program that converts text to speech.

To get the output, execute the program and listen to the text specified in the program above.
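A minimal sketch of such a program is given here, following the five steps listed earlier. It assumes that jsapi.jar and the FreeTTS JARs are on the classpath, that the bundled US-English "kevin" voices are used, and that an audio output device is available. The class name TextToSpeechSketch and the spoken sentence are placeholders.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class TextToSpeechSketch {
    public static void main(String[] args) {
        try {
            // Assumption: use the US-English "kevin" voices bundled with FreeTTS.
            System.setProperty("freetts.voices",
                    "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

            // Register the FreeTTS engine with the JSAPI Central class.
            Central.registerEngineCentral("com.sun.speech.freetts.jsapi.FreeTTSEngineCentral");

            // Steps 1 and 2: describe the requirement (US English) and create a matching engine.
            Synthesizer synthesizer = Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
            if (synthesizer == null) {
                System.err.println("No synthesizer matches the requested mode.");
                return;
            }

            // Step 3: allocate the engine's resources and leave the paused state.
            synthesizer.allocate();
            synthesizer.resume();

            // Step 4: queue the text and block until the queue has been spoken.
            synthesizer.speakPlainText("Hello from the Java Speech API", null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);

            // Step 5: release the engine's resources.
            synthesizer.deallocate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```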

2. Package com.sun.speech

The com.sun.speech package hierarchy holds the implementation classes behind these interfaces. com.sun.speech.freetts contains the implementation of the FreeTTS synthesis engine. Most of the code that does not depend on a particular language or voice can be found here.

FreeTTS also allows us to set the rate, pitch, and volume of the voice by using the "setRate( )", "setPitch( )", and "setVolume( )" methods of its Voice class, respectively. The two classes involved are described below.


Voice: It is the central processing point for FreeTTS. It takes a FreeTTSSpeakable as input, translates the text associated with it into speech, and generates the audio corresponding to that speech. A voice accepts a FreeTTSSpeakable via the Voice.speak method.


VoiceManager: It is the central repository of voices available to FreeTTS. It is used to get a voice.

Consider the following Java program that imports classes from the com.sun.speech package and uses the above methods:

To get the output, execute the program and listen to the text specified in the program above.
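One way such a program could look is sketched here. It assumes the FreeTTS JARs are on the classpath and that the bundled voice named "kevin16" is available. The rate, pitch, and volume values are arbitrary examples, with the volume assumed to be on a 0 to 1 scale. The class name FreeTtsVoiceSketch is made up for this example.

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class FreeTtsVoiceSketch {
    public static void main(String[] args) {
        // Assumption: use the US-English "kevin" voices bundled with FreeTTS.
        System.setProperty("freetts.voices",
                "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");

        // Ask the central voice repository for a voice by name.
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice kevin16 was not found; check the classpath.");
            return;
        }

        // Load the voice's resources before using it.
        voice.allocate();
        try {
            voice.setRate(150f);    // speaking rate in words per minute
            voice.setPitch(100f);   // baseline pitch in hertz
            voice.setVolume(0.9f);  // assumed scale: 0.0 (silent) to 1.0 (full)

            // Convert the text to speech and play it.
            voice.speak("Hello from Free T T S");
        } finally {
            // Release the voice's resources even if speaking fails.
            voice.deallocate();
        }
    }
}
```

Unlike the JSAPI version, this variant talks to FreeTTS directly, so it needs neither Central nor a mode descriptor.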
