# Hashing Algorithm in Java

An algorithm that maps data of arbitrary size to a hash of fixed size is called a hashing algorithm. The hashing algorithms covered here are cryptographic hash functions. A cryptographic hash function is designed to behave like a one-way function: it cannot practically be inverted, i.e., the original value cannot be recovered from the hash.

## Characteristics of a Hashing Algorithm

A good hashing algorithm must have the following characteristics:

• The hashing algorithm must be quick enough to hash any kind of data.
• It must be infeasible to regenerate a message from its hash value (one-way).
• It must be infeasible to find two different messages that have the same hash. In other words, the hashing algorithm must be collision resistant.
• The hash value must change substantially when there is even a slight change to the message. This is known as the avalanche effect.
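
The avalanche effect can be observed directly. The following is a minimal sketch, assuming SHA-256 as the hash function; the class name `AvalancheDemo` and its helper methods are illustrative and not part of any library. It hashes two strings that differ in one character and counts how many of the 256 output bits differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AvalancheDemo {
    // Hashes the text with SHA-256 and returns the raw 32-byte digest.
    static byte[] sha256(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Counts the bit positions in which two equal-length digests differ.
    static int differingBits(byte[] a, byte[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] first = sha256("hello");
        byte[] second = sha256("hellp"); // differs from "hello" in one character
        System.out.println(differingBits(first, second)
                + " of " + (first.length * 8) + " bits differ");
    }
}
```

For a hash function with a good avalanche effect, roughly half of the output bits are expected to differ.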

## Types of Hashing Algorithms

The following are some types of hashing algorithms:

1. MD5 Algorithm
2. SHA Algorithms
3. PBKDF2WithHmacSHA1 Algorithm

### MD5 Algorithm

The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that generates a 16-byte (128-bit) hash value. It is simple and easy to understand. The main concept is to map data of variable length to data of a constant length.

To achieve this, padding is appended to the input message so that its length is a multiple of 512 bits, and the padded message is split into 512-bit blocks. The blocks are then processed by the MD5 algorithm, which operates on a 128-bit state, and the result is a 128-bit hash value. The generated hash is usually written as a 32-digit hexadecimal number. Here, the string to be hashed is usually called the "message", and the hash value generated from it is known as the "digest" or "message digest".

FileName: MD5.java
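
A minimal sketch of such a program is given here, using the standard `java.security.MessageDigest` API. The helper name `md5Hex`, the UTF-8 encoding of the input, and the exact print format are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MD5 {
    // Returns the MD5 digest of the message as a 32-digit lowercase hex string.
    static String md5Hex(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(message.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so that leading zero bytes are not dropped.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String message = "JavaTpoint";
        System.out.println("The HashCode Generated for '" + message + "' is: " + md5Hex(message));
    }
}
```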

Output:

```
The HashCode Generated for 'JavaTpoint' is: 9e0a53565bd3ad4997fe16d35e085ffc
```

### Weaknesses of the MD5 Algorithm

The MD5 algorithm is popular because it is quick to generate a hash and is easy to implement. However, the algorithm has security issues. The hashes it generates are fairly weak, and the algorithm is prone to collisions: two different inputs that produce the same hash can be constructed in practice.

When MD5 is used to hash passwords, it is recommended to add a salt to make the hashes harder to attack.

### MD5 Algorithm with Salt

A salt makes MD5 hashes harder to attack. A salt is randomly generated data that is added to the string before it is processed by the MD5 algorithm, so the same string hashed with two different salts produces two different hashes. Note that salt is not specific to the MD5 algorithm. It can be applied to other hashing algorithms too. The following example shows the usage of salt with the MD5 algorithm.

FileName: MD5SaltedExample.java
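
A minimal sketch of a salted MD5 program follows. The 16-byte salt length, the choice to feed the salt to the digest before the password, and the helper names `newSalt` and `saltedMd5Hex` are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class MD5SaltedExample {
    // Generates a random salt (16 bytes is an assumed length).
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Hashes the salt followed by the password with MD5 and returns lowercase hex.
    static String saltedMd5Hex(String password, byte[] salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(salt);
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        String password = "JavaTpoint";
        // Same password and same salt: both lines print the same hash.
        System.out.println("The HashCode Generated for " + password + " is: " + saltedMd5Hex(password, salt));
        System.out.println("The HashCode Generated for " + password + " is: " + saltedMd5Hex(password, salt));
    }
}
```

Because the salt is random, the printed hash changes from run to run. The salt must be stored alongside the hash so that the same password can be verified later.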

Output:

```
The HashCode Generated for JavaTpoint is: 3bd3b836b49f0b0b0b3bfd137768b6de
The HashCode Generated for JavaTpoint is: 3bd3b836b49f0b0b0b3bfd137768b6de
```

### SHA Algorithm

The Secure Hash Algorithm (SHA) is a family of cryptographic hash functions. The algorithms are similar to the MD5 algorithm, but they produce much stronger hashes. The hashes generated by SHA are not guaranteed to be unique, which means collisions are possible. However, collisions are much harder to find for SHA than for MD5. This section covers four SHA algorithms available in Java.

1. SHA-1 - It is the simplest SHA. It produces a hash of 20 bytes (160 bits).
2. SHA-256 - It is more secure than SHA-1. It produces a hash of 256 bits.
3. SHA-384 - It is one level higher than SHA-256 and generates a hash of 384 bits.
4. SHA-512 - It is the strongest of the SHA algorithms mentioned here. It generates a hash of 512 bits.

Thus, in general, the larger the hash, the stronger it is: a larger hash is harder to break.

The following example shows the implementation of SHA-256.

FileName: SHAExample.java
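
A minimal sketch of such a program follows, again based on `java.security.MessageDigest`. The helper names `sha256Hex` and `toHex`, the UTF-8 encoding of the input, and the print format are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SHAExample {
    // Converts a byte array to a lowercase hex string, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns the SHA-256 digest of the message as a 64-digit hex string.
    static String sha256Hex(String message) {
        try {
            MessageDigest msgDgst = MessageDigest.getInstance("SHA-256");
            return toHex(msgDgst.digest(message.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] inputs = { "JavaTpoint", "India is a great country." };
        System.out.println("The HashCode produced by SHA-256 algorithm for strings:");
        for (String input : inputs) {
            System.out.println();
            System.out.println(input + " : " + sha256Hex(input));
        }
    }
}
```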

Output:

```
The HashCode produced by SHA-256 algorithm for strings:

JavaTpoint : f9142e5ca706378c1c7f9daf6782dcff8197ef1ecfd4075b63dae2f40186afa6

India is a great country. : e28335e1124e47ebf4c0b6cb87a34737b70d539241641c60c1969602a7b46cf9
```

#### Note: To generate a hash with another SHA variant, change the algorithm name passed to MessageDigest.getInstance() in the SHA-256 example.

```
// Use exactly one of the following lines in place of the SHA-256 line.
MessageDigest msgDgst = MessageDigest.getInstance("SHA-1");   // for SHA-1
MessageDigest msgDgst = MessageDigest.getInstance("SHA-384"); // for SHA-384
MessageDigest msgDgst = MessageDigest.getInstance("SHA-512"); // for SHA-512
```

### PBKDF2WithHmacSHA1 Algorithm

We have learned about generating secure hashes and making them more secure with the help of salt. However, hardware is now so fast that many passwords can be cracked within a short span of time with a brute-force attack.

To address this problem, a common approach is to make brute-force attacks slower, which limits the damage. The PBKDF2WithHmacSHA1 algorithm works on this concept. The aim is to make the hash function slow enough to delay attacks while keeping it fast enough that generating a hash causes no significant delay for the user. The algorithm takes an iteration count (also known as the work factor or security factor) as an argument. This value determines how slow the hash function is. The faster the available hardware, the higher the iteration count should be.

FileName: SHAExample.java
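
A minimal sketch of a PBKDF2WithHmacSHA1 program follows, using the standard `SecretKeyFactory` and `PBEKeySpec` classes. The class name `PBKDF2Example`, the helper names, the 16-byte salt, the 512-bit output length, and the `iterations:salt:hash` print format are assumptions made for illustration. The iteration count of 500 is chosen only to mirror the sample output; real deployments typically use far higher values.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class PBKDF2Example {
    // Derives keyLengthBits of output from the password using PBKDF2 with HMAC-SHA1.
    static byte[] pbkdf2(char[] password, byte[] salt, int iterations, int keyLengthBits) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, keyLengthBits);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Converts a byte array to a lowercase hex string, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        int iterations = 500;       // assumed; higher values slow down brute-force attacks
        byte[] salt = new byte[16]; // assumed salt length
        new SecureRandom().nextBytes(salt);

        byte[] hash = pbkdf2("JavaTpoint".toCharArray(), salt, iterations, 512);
        // Store the iteration count and salt with the hash so it can be verified later.
        System.out.println(iterations + ":" + toHex(salt) + ":" + toHex(hash));
    }
}
```

Raising `iterations` increases the time needed for every guess an attacker makes, at the cost of a longer delay for each legitimate hash computation.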

Output:

```
500: d38932f8037b1a2740aad31de8765b25:c6c9c5005e0e421f47cd862bc612e9061fe7a393923af3a9984e8f5d8d4141113f8b8969f17f81bb1f47c81dbb100a071ce3218e509dc2b9371148fee0da7765
```

## Applications of Hashing

• Finding Duplicates: The basic rule of hashing is that the same input always generates the same hash. Thus, if two hashes are the same, the inputs can be assumed to be the same (provided the hashing method is collision resistant).
• Message Digest: Message digests are created to protect the integrity of data. For example, if we store our files on the cloud, we also have to ensure that the stored files are not tampered with by someone else. To do so, compute the hash of each file with a hashing algorithm before storing it on the cloud, and save the hash. When the file is downloaded again from the cloud, compute its hash again. If the new hash matches the hash that was saved earlier, the file has not been tampered with.
• Compiler Operation: Keywords (while, for, int, if, else, switch, etc.) are processed differently from other identifiers. The compiler distinguishes keywords from other identifiers by storing the keywords in a set, which is implemented with the help of a hash table.
• Data Structures: Hash tables are used extensively in data structures. Many data structures that support key-value pairs use hash tables. For example, HashMap and HashSet in Java and unordered_map in C++ use hash tables.
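
The first two applications can be sketched in a few lines. The following is a minimal illustration, assuming SHA-256 as the hashing algorithm and in-memory byte arrays in place of real files; the class name `HashingApplications` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashingApplications {
    // Returns the raw SHA-256 digest of the data.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Message digest: recomputes the hash of the downloaded data and
    // compares it with the digest saved before the upload.
    static boolean isUntampered(byte[] downloaded, byte[] savedDigest) {
        return MessageDigest.isEqual(sha256(downloaded), savedDigest);
    }

    // Finding duplicates: returns every item whose digest was already seen.
    static List<String> findDuplicates(List<String> items) {
        Set<String> seenDigests = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String item : items) {
            byte[] digest = sha256(item.getBytes(StandardCharsets.UTF_8));
            String key = Base64.getEncoder().encodeToString(digest);
            if (!seenDigests.add(key)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        byte[] original = "report contents".getBytes(StandardCharsets.UTF_8);
        byte[] saved = sha256(original);
        byte[] modified = "report contentz".getBytes(StandardCharsets.UTF_8);

        System.out.println("Original untampered: " + isUntampered(original, saved));
        System.out.println("Modified untampered: " + isUntampered(modified, saved));
        System.out.println("Duplicates: " + findDuplicates(Arrays.asList("a", "b", "a")));
    }
}
```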