Processing Speech in Java

The Java programming language can convert text to human-recognizable speech through the interfaces of the Java Speech API. Speech output is used to improve the user experience and accessibility. The API is cross-platform and supports command-and-control recognizers and speech synthesizers. Text-to-speech (TTS), or read-aloud, is an assistive technology that makes digital text audible to users. Assistive technology covers assistive, adaptive, and rehabilitative devices built for people with disabilities.

Nowadays, speech processing is widely used in applications and kiosks. One example is the text-to-speech accessibility option in smartphones and in apps such as Domino's, which reads out the options and menus for users.

Let's look at the Java Speech API in detail and see how to convert text into speech.

Convert Text-to-Speech in Java

Java Speech API (JSAPI)

The Java Speech API allows Java applications to incorporate speech technology into their user interfaces. It defines a cross-platform API that supports command-and-control recognizers, dictation systems, and speech synthesizers. Java Speech is only a specification; it has no implementation of its own. It is not part of the Java Development Kit, so a third-party implementation is needed, and the specification is designed so that multiple implementations can be available.

In this section, we will use the open-source implementation FreeTTS, but other implementations exist as well, such as Cloud Garden.

Consider the following Java Speech API classes and interfaces, which FreeTTS implements and which can be used to convert text to speech.

javax.speech.Central Class

It is a class in the "javax.speech" package that is used through its static methods. It is the main interface to the speech engine facilities and the initial access point for all speech input and output capabilities. Methods such as availableSynthesizers and createSynthesizer belong to this class. It provides the ability to locate, select, and create speech recognizers and speech synthesizers.

javax.speech.synthesis.SynthesizerModeDesc Class

The class holds the properties that describe a Synthesizer. These include the engine name, the mode name, the locale, and the running state.

The engine name identifies the engine used in the program. The mode name is engine-specific and identifies an operating mode of that engine. The locale property restricts the selection to synthesizers that speak a given language. Lastly, the running property limits the synthesizers returned to those that are already loaded into memory.

Engine: This interface is defined in the javax.speech package and is the parent interface of all speech engines, including Recognizer and Synthesizer. Engines therefore cover both speech input and speech output.

The methods used to create speech engines are createRecognizer( ) and createSynthesizer( ), both defined in Central. Each accepts a single EngineModeDesc parameter that defines the properties required of the engine. In practice, an instance of one of its subclasses, RecognizerModeDesc or SynthesizerModeDesc, is passed as the argument.

A mode descriptor defines the set of properties required of an engine. For example, a SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice. Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for Japanese.
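
The way a descriptor narrows the choice of engines can be sketched in plain Java. The ModeRequest class below is a hypothetical stand-in, not the real javax.speech.EngineModeDesc, and it assumes only that a property left as null places no requirement on the engine. Real locale matching in JSAPI may be more lenient than the exact comparison used here.

```java
import java.util.Locale;

// Hypothetical stand-in for a mode descriptor; NOT the real javax.speech.EngineModeDesc.
public class ModeRequest {
    public final String engineName; // null means "any engine"
    public final String modeName;   // null means "any mode"
    public final Locale locale;     // null means "any locale"

    public ModeRequest(String engineName, String modeName, Locale locale) {
        this.engineName = engineName;
        this.modeName = modeName;
        this.locale = locale;
    }

    // True when this descriptor satisfies every non-null property of the request.
    public boolean satisfies(ModeRequest required) {
        return matches(required.engineName, engineName)
            && matches(required.modeName, modeName)
            && matches(required.locale, locale);
    }

    // A null requirement always matches; otherwise the values must be equal.
    private static boolean matches(Object required, Object actual) {
        return required == null || required.equals(actual);
    }

    public static void main(String[] args) {
        // "ExampleEngine" is an illustrative name, not a real engine.
        ModeRequest installed = new ModeRequest("ExampleEngine", "general", Locale.US);
        ModeRequest anyUsEngine = new ModeRequest(null, "general", Locale.US);
        ModeRequest japanese = new ModeRequest(null, null, Locale.JAPAN);
        System.out.println(installed.satisfies(anyUsEngine)); // true
        System.out.println(installed.satisfies(japanese));    // false
    }
}
```

The fewer properties a request sets, the more engines it can match, which is why the sample program later in this section passes null for the engine name.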

javax.speech.synthesis.Synthesizer Interface

It is an interface that provides primary access to speech synthesis capabilities. A synthesizer must be allocated before it is used. SynthesizerModeDesc adds two properties to those of EngineModeDesc: the list of voices provided by the synthesizer, and the voice to be loaded when the synthesizer is started.
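
Choosing one voice from that list usually comes down to comparing names. The sketch below uses plain strings instead of JSAPI Voice objects, so the class and method names are hypothetical. It assumes an exact comparison with equals, as the sample program later in this section uses, and shows why stray spaces in the requested name prevent a match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative helper; works on voice names only, not on JSAPI Voice objects.
public class VoiceLookupSketch {
    // Returns the first voice name equal to the requested one, or empty when none matches.
    public static Optional<String> findVoice(List<String> available, String wanted) {
        return available.stream().filter(name -> name.equals(wanted)).findFirst();
    }

    public static void main(String[] args) {
        // Voice names are illustrative; a real synthesizer reports its own list.
        List<String> voices = Arrays.asList("kevin", "kevin16");
        System.out.println(findVoice(voices, "kevin16").isPresent());   // true
        System.out.println(findVoice(voices, " kevin16 ").isPresent()); // false
    }
}
```

Returning an Optional makes the "voice not found" case explicit, so the caller can fall back to a default voice instead of passing null along.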

Third-Party Speech API

The following third-party speech APIs are available for converting text to speech in Java.

FreeTTS
IBM's Speech for Java
The Cloud Garden
Conversa Web 3.0
Festival

Let's discuss the first of these libraries, FreeTTS, in detail.

FreeTTS

FreeTTS is an open-source speech synthesis system written entirely in the Java programming language. It is a small, fast, run-time text-to-speech synthesis engine. With the FreeTTS API, the computer can actually speak. In layman's terms, it produces artificial human speech from ordinary text.

To implement speech synthesis in Java, follow the steps given below.

Download the FreeTTS binary distribution as a zip file.
Extract the zip file and locate freetts-1.2.2-bin/freetts-1.2/lib/jsapi.exe
Run the jsapi.exe file to install it.
A JAR file named "jsapi.jar" will be created. It contains the Java Speech API classes that your project compiles against.
Create a new Java project in your IDE.
Add jsapi.jar, together with the other JAR files from the same lib folder such as freetts.jar, to your project.
Code the project as per your requirement.
Finally, execute the project to obtain the desired output.
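
Before writing any speech code, it can help to confirm that the JAR files were actually added to the project. The following is a minimal sketch using only the standard library; the class name ClasspathCheck is hypothetical, and the two class names it probes are the ones used by the programs in this section.

```java
// Illustrative helper for checking the project setup; not part of JSAPI or FreeTTS.
public class ClasspathCheck {
    // Returns true when the named class can be found on the current classpath.
    public static boolean isOnClasspath(String className) {
        try {
            // "false" avoids running the class's static initializers during the check.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("JSAPI (javax.speech.Central): "
                + isOnClasspath("javax.speech.Central"));
        System.out.println("FreeTTS (com.sun.speech.freetts.Voice): "
                + isOnClasspath("com.sun.speech.freetts.Voice"));
    }
}
```

If either line prints false, the corresponding JAR is most likely missing from the project's build path.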

The packages commonly used for text-to-speech conversion in Java are as follows:

1. Package javax.speech

The "javax.speech" package contains the classes and interfaces that define the basic functionality of an engine. Speech synthesizers and speech recognizers are both speech engine instances. The "javax.speech.synthesis" and "javax.speech.recognition" packages extend this basic functionality with the specific capabilities of speech synthesizers and speech recognizers.

Let's have a look at the basic process for using a speech engine in an application:

Identify the functional requirements of the application for an engine, for example, the language to be used.
Locate and create an engine that meets those requirements.
Allocate the resources for the chosen engine.
Set up the engine and begin using it.
Deallocate the resources of the engine once you are done.
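
The allocate, use, and deallocate steps above can be sketched without any speech library. The SimpleEngine interface below is a hypothetical, much reduced stand-in for a JSAPI engine, and RecordingEngine only records the order of calls so the sketch runs without audio hardware. The point it illustrates is that placing deallocation in a finally block releases the engine even when speaking fails.

```java
import java.util.ArrayList;
import java.util.List;

public class EngineLifecycleSketch {
    // Hypothetical minimal engine; the real JSAPI Engine interface has more methods and states.
    public interface SimpleEngine {
        void allocate();
        void speak(String text);
        void deallocate();
    }

    // Runs allocate -> speak -> deallocate; deallocation happens even if speak throws.
    public static void speakOnce(SimpleEngine engine, String text) {
        engine.allocate();
        try {
            engine.speak(text);
        } finally {
            engine.deallocate();
        }
    }

    // Fake engine that records each call instead of producing audio.
    public static class RecordingEngine implements SimpleEngine {
        public final List<String> calls = new ArrayList<>();
        public void allocate() { calls.add("allocate"); }
        public void speak(String text) { calls.add("speak:" + text); }
        public void deallocate() { calls.add("deallocate"); }
    }

    public static void main(String[] args) {
        RecordingEngine engine = new RecordingEngine();
        speakOnce(engine, "hello");
        System.out.println(engine.calls); // [allocate, speak:hello, deallocate]
    }
}
```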

Consider the following Java program that converts text to speech.

TextToSpeechExample2.java

// importing the javax.speech classes and java.util.Locale
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;
import javax.speech.synthesis.Voice;

public class TextToSpeechExample2
{
    // variable to hold the text that will be audible
    String audible;

    // function that makes text audible
    public void audible(String speak, String voiceName)
    {
        // assigning the user entered text to the variable defined
        audible = speak;

        try
        {
            // setting the properties (engine name, mode name, locale, running and voices) of SynthesizerModeDesc
            SynthesizerModeDesc synth = new SynthesizerModeDesc(null, "general", Locale.US, null, null);
            // Synthesizer interface generates sound and the createSynthesizer() method creates the Synthesizer
            Synthesizer synthesizer = Central.createSynthesizer(synth);
            // allocating the Synthesizer
            synthesizer.allocate();
            // working on the operations of the engine
            synthesizer.resume();
            synth = (SynthesizerModeDesc) synthesizer.getEngineModeDesc();
            Voice[] voices = synth.getVoices();
            Voice voice = null;

            // loop over the voices until one with the requested name is found
            for (int i = 0; i < voices.length; i++)
            {
                if (voices[i].getName().equals(voiceName))
                {
                    voice = voices[i];
                    break;
                }
            }
            // select the requested voice only if it was found
            if (voice != null)
            {
                synthesizer.getSynthesizerProperties().setVoice(voice);
            }
            System.out.println("Speaking : " + audible);
            synthesizer.speakPlainText(audible, null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);

            // deallocating the resources of the engine
            synthesizer.deallocate();
        }
        catch (Exception e)
        {
            // a common cause of failure is a missing speech.properties file
            String message = "speech.properties may be missing in " + System.getProperty("user.home") + "\n";
            System.out.println("" + e);
            System.out.println(message);
        }
    }

    public static void main(String args[])
    {
        // creating an object of the class TextToSpeechExample2
        TextToSpeechExample2 txt = new TextToSpeechExample2();
        txt.audible("Learning one new idea won't make you a genius, but a commitment to lifelong learning can be transformative.", "kevin16");
    }
}

To get the output, execute the program and listen to the text that we have specified in the above program.
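
If the program prints the speech.properties message instead of speaking, a quick check of where that file exists can narrow the problem down. The sketch below uses only the standard library. The class name SpeechPropertiesCheck is hypothetical, and the two locations it inspects (the user's home directory and the lib folder under java.home) are an assumption based on how the JSAPI documentation describes the lookup, so treat the result as a hint rather than a definitive diagnosis.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative diagnostic helper; not part of JSAPI or FreeTTS.
public class SpeechPropertiesCheck {
    // Returns true when a regular file named speech.properties exists directly in the directory.
    public static boolean hasSpeechProperties(Path directory) {
        return Files.isRegularFile(directory.resolve("speech.properties"));
    }

    public static void main(String[] args) {
        // Assumed lookup locations: user.home and java.home/lib.
        Path userHome = Paths.get(System.getProperty("user.home"));
        Path javaLib = Paths.get(System.getProperty("java.home"), "lib");
        System.out.println(userHome + ": " + hasSpeechProperties(userHome));
        System.out.println(javaLib + ": " + hasSpeechProperties(javaLib));
    }
}
```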

2. Package com.sun.speech

The com.sun.speech package hierarchy holds the implementation classes supplied with FreeTTS. com.sun.speech.freetts contains the implementation of the FreeTTS synthesis engine. Most of the code that does not depend on a particular language or voice can be found here.

FreeTTS also allows us to set the rate, pitch, and volume of the voice through the "setRate( )", "setPitch( )", and "setVolume( )" methods of its Voice class, respectively.
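
To get a feel for what a rate value means, the speaking time of a sentence can be estimated with simple arithmetic. The sketch below assumes the rate is expressed in words per minute, as the comments in this section's FreeTTS program state, and it counts words by splitting on whitespace. The class and method names are hypothetical, and actual synthesis time will differ because of pauses and word length.

```java
// Illustrative arithmetic only; does not call JSAPI or FreeTTS.
public class SpeechRateEstimate {
    // Counts whitespace-separated words in the text.
    public static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Rough speaking time in seconds at the given rate in words per minute.
    public static double estimateSeconds(String text, double wordsPerMinute) {
        return countWords(text) * 60.0 / wordsPerMinute;
    }

    public static void main(String[] args) {
        String text = "Learning one new idea won't make you a genius, "
                + "but a commitment to lifelong learning can be transformative.";
        System.out.println(countWords(text) + " words");
        System.out.println(estimateSeconds(text, 100) + " seconds at 100 words per minute");
    }
}
```

For the 18-word sentence used in this section, a rate of 100 gives roughly 18 × 60 / 100 = 10.8 seconds, and doubling the rate halves that estimate.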

Voice

It is the central processing point for FreeTTS. It takes a FreeTTSSpeakable as input, translates the text associated with it into speech, and generates the audio corresponding to that speech. A voice accepts a FreeTTSSpeakable via the Voice.speak method.

VoiceManager

It is the central repository of voices available to FreeTTS. It is used to get a voice.

Consider the following Java program that imports classes from the com.sun.speech.freetts package and uses the above methods:

TextToSpeech.java

// importing the FreeTTS classes from the com.sun.speech.freetts package
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class TextToSpeech
{
    public static void main(String args[])
    {
        // getting the voice; here we have used the kevin (male) voice
        Voice voice = VoiceManager.getInstance().getVoice("kevin");
        // getVoice() returns null when no voice with that name is available
        if (voice == null)
        {
            System.out.println("The voice kevin is not available.");
            return;
        }
        // the Voice class allocate() method allocates this voice
        voice.allocate();
        try
        {
            // sets the rate (100 words per minute) of the speech
            voice.setRate(100);
            // sets the baseline pitch (150) in hertz
            voice.setPitch(150);
            // sets the volume (8) of the voice
            voice.setVolume(8);
            // the speak() method speaks the specified text
            voice.speak("Learning one new idea won't make you a genius, but a commitment to lifelong learning can be transformative.");
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // the deallocate() method releases the resources held by this voice
            voice.deallocate();
        }
    }
}

To get the output, execute the program and listen to the text that we have specified in the above program.
