UTF in Java

In this article on the Java programming language, we explain what the term "UTF" means and how to convert text into it. We look at the different UTF encodings, what they are used for, and how they behave when you write Java code.

What is Unicode?

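Unicode is an international character encoding standard that covers the characters of most of the world's languages and scripts. Unicode assigns every letter, digit, and symbol a unique number, called a code point and usually written as U+ followed by hexadecimal digits (for example, U+0041 for "A"). A code point stays the same on every platform and in every program. The name reflects this aim of one unique, universal encoding.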
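As a quick illustration of code points, the sketch below prints the code point of each character in a string in the usual U+XXXX notation. It relies on String.codePoints() (Java 8 or later); the class name CodePoints and the method describe are illustrative choices, not part of any library.

```java
public class CodePoints {

    // Returns every Unicode code point of s formatted as "U+XXXX", separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A", the euro sign (U+20AC) and an emoji outside the Basic Multilingual Plane (U+1F600)
        String text = "A\u20AC" + new String(Character.toChars(0x1F600));
        System.out.println(describe(text)); // U+0041 U+20AC U+1F600
    }
}
```

codePoints() is used instead of looping over chars so that the emoji is reported as one code point rather than as two separate UTF-16 units.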

What Does UTF mean?

The term UTF stands for "Unicode Transformation Format". A UTF is a rule for turning Unicode code points into sequences of fixed-size code units (bytes, in the case of UTF-8) that can be stored or transmitted. Several UTFs exist, and UTF-8 is by far the most widely used. UTF-8 is a variable-length encoding built on 8-bit code units: each character takes between one and four bytes. It was designed to be backward compatible with ASCII, so every ASCII character is encoded as the same single byte it has in ASCII. The main UTF encodings are:
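To see the variable length in practice, this sketch asks Java how many UTF-8 bytes four sample characters need, one from each length class, and checks that pure ASCII text encodes to exactly the same bytes as in US-ASCII. The class and method names are hypothetical; only String.getBytes, StandardCharsets and Character.toChars come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Lengths {

    // Number of bytes UTF-8 uses for the single code point cp.
    static int utf8Length(int cp) {
        String s = new String(Character.toChars(cp));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the UTF-8 bytes of s equal its US-ASCII bytes (holds for pure ASCII text).
    static boolean sameAsAscii(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_8),
                             s.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Latin capital A, e with acute accent, euro sign, grinning-face emoji
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
        System.out.println("ASCII-compatible: " + sameAsAscii("Hello, UTF-8"));
    }
}
```

The four samples should report 1, 2, 3 and 4 bytes respectively, which covers the whole one-to-four range of UTF-8.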

  1. UTF-1: The first of the Unicode Transformation Formats. It is no longer a part of the Unicode standard.
  2. UTF-7: Uses 7-bit units so that Unicode text can pass through systems that only carry 7-bit ASCII. It was designed mainly for email, and it is rarely used today.
  3. UTF-8: The most widely used format today. It encodes each character in one to four 8-bit units, so its width is variable.
  4. UTF-16: A variable-width encoding with 16-bit units. Characters in the Basic Multilingual Plane take one unit, and all other characters take two (a surrogate pair). Java's String and char types use UTF-16 internally.
  5. UTF-32: Uses one 32-bit unit for every character, so the width is fixed rather than variable.
  6. UTF-EBCDIC: Uses 8-bit units and is designed to be compatible with the Extended Binary Coded Decimal Interchange Code (EBCDIC).
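The difference between these formats shows up in the number of bytes they produce for the same text. UTF-7 and UTF-EBCDIC are not among the charsets every Java platform is guaranteed to support, so this sketch compares only UTF-8, UTF-16 and UTF-32. UTF-32BE is shipped with OpenJDK but, like UTF-7, is not on the guaranteed list, so treat its availability as an assumption about your JDK. SAMPLE and size are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // "A", the euro sign (U+20AC) and the emoji U+1F600, which lies outside the BMP.
    static final String SAMPLE = "A\u20AC" + new String(Character.toChars(0x1F600));

    // Number of bytes s occupies in charset cs.
    static int size(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Big-endian variants are used so that no byte-order mark is written.
        // UTF-32BE ships with OpenJDK but is not on the list of guaranteed charsets.
        Charset utf32 = Charset.forName("UTF-32BE");
        System.out.println("UTF-8:    " + size(SAMPLE, StandardCharsets.UTF_8) + " bytes");    // 1 + 3 + 4
        System.out.println("UTF-16BE: " + size(SAMPLE, StandardCharsets.UTF_16BE) + " bytes"); // 2 + 2 + 4
        System.out.println("UTF-32BE: " + size(SAMPLE, utf32) + " bytes");                     // 4 + 4 + 4

        // A Java String is UTF-16 internally, so U+1F600 takes two chars (a surrogate pair).
        System.out.println("chars: " + SAMPLE.length()
                + ", code points: " + SAMPLE.codePointCount(0, SAMPLE.length()));
    }
}
```

The plain "UTF-16" charset is avoided here because Java writes a two-byte byte-order mark when encoding with it, which would distort the comparison.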

To convert a Java string into its UTF-8 form, we use the String method getBytes(), passing "UTF-8" as the charset. getBytes() encodes the string into a sequence of bytes and returns them as a byte array.

Declaration of getBytes() method
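For reference, java.lang.String declares the overloads relevant here as `public byte[] getBytes()` (uses the platform's default charset), `public byte[] getBytes(String charsetName) throws UnsupportedEncodingException`, and `public byte[] getBytes(Charset charset)`. The sketch below calls the last two with UTF-8; the class and helper names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesOverloads {

    // Overload 1: getBytes(String charsetName). The charset is looked up by name,
    // and an unknown name throws the checked UnsupportedEncodingException.
    static byte[] utf8ByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every Java platform", e);
        }
    }

    // Overload 2: getBytes(Charset charset). No checked exception to handle.
    static byte[] utf8ByCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Java";
        System.out.println(Arrays.toString(utf8ByName(text)));    // [74, 97, 118, 97]
        System.out.println(Arrays.toString(utf8ByCharset(text))); // [74, 97, 118, 97]
    }
}
```

Both calls produce the same bytes; the Charset overload is usually preferred because it avoids the try/catch for an encoding that is always available.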

Now let us use this method in a complete Java program.

Program to Convert Unicode to UTF-8 in Java
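The following is a sketch of such a program. It encodes the characters U+1111 and U+FFFF with getBytes("UTF-8") and prints the signed byte values with an enhanced for loop, producing the same labels and byte values as the output below. The class name UnicodeToUtf8 and the helper utf8SignedBytes are illustrative.

```java
import java.io.UnsupportedEncodingException;

public class UnicodeToUtf8 {

    // Encodes s with getBytes("UTF-8") and joins the signed byte values,
    // e.g. the three bytes of U+1111 become "-31-124-111".
    static String utf8SignedBytes(String s) {
        byte[] bytes;
        try {
            bytes = s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {   // enhanced for loop over the byte array
            sb.append(b);        // byte is signed: values above 0x7F become negative
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash prints the escape text itself; the single one is the character.
        System.out.println("The UTF-8 form for \\u1111 is :");
        System.out.println(utf8SignedBytes("\u1111"));
        System.out.println("The UTF-8 form for \\uFFFF is :");
        System.out.println(utf8SignedBytes("\uFFFF"));
    }
}
```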

Output:

The UTF-8 form for \u1111 is :
-31-124-111
 The UTF-8 form for \uFFFF is :
-17-65-65

In the output, the UTF-8 form of U+1111 is the three bytes -31, -124 and -111 (printed without separators as "-31-124-111"), and the UTF-8 form of U+FFFF is -17, -65 and -65. Because Java's byte type is signed, every byte above 0x7F is printed as a negative number. In the same way, getBytes() can convert any Unicode string into its UTF-8 bytes.

Specifically, the program calls getBytes("UTF-8"), which converts the input string into an array of bytes holding its UTF-8 encoding. An enhanced for loop then prints each byte of that array.
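The signed values printed above are the ordinary UTF-8 bytes in another notation. This sketch prints the encoding of U+1111 as unsigned hexadecimal, the form usually used for UTF-8, and decodes the bytes back with new String(byte[], Charset) to show that nothing is lost. toHex and roundTrip are illustrative helper names, and the sketch passes StandardCharsets.UTF_8, which selects the same encoding as the name "UTF-8".

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Formats each byte as two unsigned hex digits, e.g. "E1 84 91".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to 0..255 before formatting
        }
        return sb.toString();
    }

    // Encodes s as UTF-8 and decodes the bytes again with the same charset.
    static String roundTrip(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u1111";
        System.out.println(toHex(original.getBytes(StandardCharsets.UTF_8))); // E1 84 91
        System.out.println(roundTrip(original).equals(original));            // true
    }
}
```

Here -31, -124 and -111 correspond to 0xE1, 0x84 and 0x91, since adding 256 to each negative value gives 225, 132 and 145.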

Conclusion:

In this article, we saw that a UTF (Unicode Transformation Format) defines how Unicode code points are encoded as bytes, and that Java performs this conversion with the String method getBytes(). We looked at the main UTF variants, of which UTF-8 is the most widely used. The program and its output show that each Unicode character has its own distinct UTF-8 byte sequence. That covers the Unicode Transformation Format (UTF) and its use in the Java programming language.