Java String Encoding

In Java, when we work with strings, we sometimes need to encode a string in a specific character set. Encoding is a way of converting data from one format to another. Java String objects represent text in UTF-16. A String is immutable, so its internal encoding cannot be changed. The only way to get the text in a different encoding is as a byte[] array (or a ByteBuffer that wraps one). This needs care when the input is unexpected: if the text contains characters that the target charset cannot represent, they are replaced, typically with '?'. In this section, we will learn how to encode a string in Java.

Note: A String object itself cannot hold UTF-8 encoded text. To get the UTF-8 form of a string, either use a ByteBuffer or call getBytes() on the string to get a byte[] array.

Before moving ahead, let's take a quick look at character encoding and why we need to encode a string.

Character encoding is a technique for converting text data into binary numbers. Each character is assigned a unique numeric value, and those numbers are stored in binary. Later, the binary numbers can be converted back to the original characters based on their values.
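As a small illustration of the idea above, the sketch below prints the numeric value (Unicode code point) of each character and the bytes that UTF-8 uses to store them. The class name CharacterCodes and its helper methods are hypothetical, chosen only for this example. Note that ü needs two bytes in UTF-8, while A needs one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharacterCodes {
    // Returns the Unicode code point assigned to each character of the text.
    static int[] codePoints(String text) {
        return text.codePoints().toArray();
    }

    // Returns the UTF-8 bytes as unsigned values 0-255 (Java bytes are signed).
    static int[] utf8Unsigned(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int[] result = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            result[i] = Byte.toUnsignedInt(bytes[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // "A" followed by u-umlaut, written as a Unicode escape
        String text = "A\u00fc";
        System.out.println(Arrays.toString(codePoints(text)));   // [65, 252]
        System.out.println(Arrays.toString(utf8Unsigned(text))); // [65, 195, 188]
    }
}
```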

Problem

Suppose we have the German string Tschüss and need to encode it.

If we encode the string using US_ASCII, we get Tsch?ss, because US-ASCII cannot represent the non-ASCII character ü and replaces it with ?. If we then decode those ASCII bytes as UTF-8, we get the same string, Tsch?ss: ASCII is a subset of UTF-8, so the bytes are valid UTF-8, but the lost ü cannot be recovered.
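The following sketch reproduces this behavior. The class name AsciiLossDemo and the helper asciiRoundTrip are hypothetical. It relies on the documented behavior of String.getBytes(Charset), which replaces characters the charset cannot map with that charset's replacement bytes ('?' for US-ASCII).

```java
import java.nio.charset.StandardCharsets;

public class AsciiLossDemo {
    // Encodes the text with US-ASCII, then decodes those bytes as UTF-8.
    static String asciiRoundTrip(String text) {
        // getBytes(Charset) replaces the unmappable u-umlaut with '?'
        byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
        // ASCII bytes are valid UTF-8, so decoding gives back the same lossy text
        return new String(asciiBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Tschuess" with u-umlaut, written as a Unicode escape
        String german = "Tsch\u00fcss";
        System.out.println(asciiRoundTrip(german)); // Tsch?ss
    }
}
```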

If a byte[] array contains text encoded in some charset, including non-Unicode charsets such as ISO-8859-1, we can decode it into a String with the String constructor that takes a charset. Conversely, we can encode a String into a byte[] array in a given charset with the String.getBytes() method. Let's encode the string using the getBytes() method.
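Here is a minimal sketch of both directions, assuming the bytes were written in ISO-8859-1 (Latin-1), a single-byte, non-Unicode charset in which ü is the byte 0xFC. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DecodeBytesExample {
    // Decodes ISO-8859-1 (Latin-1) bytes into a Java String.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes a String into ISO-8859-1 bytes.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Tschuess" in ISO-8859-1: the u-umlaut is the single byte 0xFC
        byte[] latin1 = {0x54, 0x73, 0x63, 0x68, (byte) 0xFC, 0x73, 0x73};
        String text = decodeLatin1(latin1);
        System.out.println(text.equals("Tsch\u00fcss")); // true
        System.out.println(encodeLatin1(text).length);   // 7
    }
}
```

The same string takes 8 bytes in UTF-8, because ü needs two bytes there.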

Using String.getBytes() Method

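The Java String class provides the getBytes() method, which encodes a string into a sequence of bytes using a charset and returns the result as a byte[] array. Passing "UTF-8" as the charset name produces the UTF-8 encoding. Calling getBytes() with no argument uses the platform's default charset, so it is safer to name the charset explicitly.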

Syntax:

public byte[] getBytes(String charsetName) throws UnsupportedEncodingException

It takes charsetName as a parameter and returns the encoded bytes. It throws UnsupportedEncodingException if the named charset is not supported.
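Because UnsupportedEncodingException is a checked exception, calls to getBytes(String) must handle it. The sketch below shows one way to do so; the class CharsetNameCheck, its isSupported helper, and the name "NO-SUCH-CHARSET" are hypothetical, chosen only for illustration.

```java
import java.io.UnsupportedEncodingException;

public class CharsetNameCheck {
    // Returns true if getBytes accepts the charset name, false if it throws.
    static boolean isSupported(String charsetName) {
        try {
            "test".getBytes(charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("UTF-8"));           // true
        System.out.println(isSupported("NO-SUCH-CHARSET")); // false
    }
}
```

The overload getBytes(Charset), used with constants such as StandardCharsets.UTF_8, does not declare this checked exception, which is why it is often preferred when the charset is known in advance.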

Let's create a Java program that converts a string into UTF-8 encoding.

StringEncodingExample.java
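A minimal sketch of what this program might contain, assuming the input string is "Google Cloud" (the byte values in the output below decode to exactly that text) and using a hypothetical helper named encodeToDecimal:

```java
import java.io.UnsupportedEncodingException;
import java.util.StringJoiner;

public class StringEncodingExample {
    // Encodes text with the named charset and returns the bytes as space-separated decimal values.
    // Java bytes are signed, so non-ASCII UTF-8 bytes would print as negative numbers.
    static String encodeToDecimal(String text, String charsetName) throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes(charsetName);
        StringJoiner joiner = new StringJoiner(" ");
        for (byte b : bytes) {
            joiner.add(Integer.toString(b));
        }
        return joiner.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("Encoded String: ");
        System.out.println(encodeToDecimal("Google Cloud", "UTF-8"));
    }
}
```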

Output:

Encoded String: 
71 111 111 103 108 101 32 67 108 111 117 100

Using StandardCharsets Class

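We can also use the StandardCharsets class to encode a string. Each constant in this class, such as StandardCharsets.UTF_8, is a Charset object with encode() and decode() methods. Encoding takes two steps: first, encode the string into UTF-8 bytes, which are returned in a ByteBuffer; second, if a String is needed again, decode the buffer back into text.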
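The two steps can be sketched with Charset.encode(String) and Charset.decode(ByteBuffer). The class name StandardCharsetsExample and the helpers toUtf8 and fromUtf8 are hypothetical names for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StandardCharsetsExample {
    // Step 1: encode the String into a ByteBuffer holding its UTF-8 bytes.
    static ByteBuffer toUtf8(String text) {
        return StandardCharsets.UTF_8.encode(text);
    }

    // Step 2: decode the UTF-8 bytes back into a String.
    static String fromUtf8(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    public static void main(String[] args) {
        // "Tschuess" with u-umlaut, written as a Unicode escape
        String raw = "Tsch\u00fcss";
        ByteBuffer buffer = toUtf8(raw);
        // 7 characters, but the u-umlaut takes two bytes in UTF-8
        System.out.println("UTF-8 bytes: " + buffer.remaining()); // UTF-8 bytes: 8
        System.out.println(fromUtf8(buffer).equals(raw));         // true
    }
}
```

Unlike the US_ASCII case, the round trip through UTF-8 is lossless, because UTF-8 can represent every Unicode character.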

Another way to encode a string is Base64 encoding, which represents bytes as ASCII text. We will discuss Base64 encoding and decoding in the coming section.