K Most Frequent Elements in Java

We are given an integer array and a number K. Our task is to find the K most frequent elements in the given array.

Example: 1

Input:

int arr[] = {5, 5, 3, 7, 9, 7, 0, 1, 2, 7}, int k = 2

Output: {5, 7}

Explanation: The two elements that occur most often are 7 (3 times) and 5 (2 times). Every other element occurs only once. The same is shown in the output.

Example: 2

Input:

int arr[] = {9, 2, 0, 1, 4, 8, 6, 3, 0, 1, 5, 4, 4, 1, 7}, int k = 3

Output: {0, 1, 4}

Explanation: The three elements that occur most often are 1 (3 times), 4 (3 times), and 0 (2 times). Every other element occurs only once. The same is shown in the output.

Approach: Using Partial Sort

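In this approach, we will split the problem into smaller problems instead of sorting everything. Let n be the number of distinct elements. If the distinct elements are arranged in ascending order of frequency (using 0-based positions), the kth most frequent element sits at position (n - k). Hence, we do a partial sort from the least frequent element to the most frequent one, until the element that belongs at position (n - k) is placed there. Every element to its right is then at least as frequent, so the last k positions hold the answer. We achieve this using quick select. Observe the following steps.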

Step 1: Create a hash map, where the key is the element and the value is the frequency of occurrence of that element in the input array.

Step 2: Using a loop, iterate over the elements of the input array and, for each element, increase its count by 1 in the hash map created in the previous step.

Step 3: Set len to the size of the hash map, i.e., the number of distinct elements.

Step 4: Create an integer array temp and insert all of the keys of the hash map in it.

Step 5: Invoke the method quickSel(0, len - 1, len - K).

Step 6: Return the elements of the array temp from the index (len - K) to the index (len - 1).

In the method quickSel(lft, rght, kSml), do the following:

  • If lft is the same as rght, exit from the method.
  • Pick a pivot index pvt that lies between lft and rght.
  • Set pvt to the value returned by partition(lft, rght, pvt), i.e., the final position of the pivot.
  • If kSml is the same as pvt, exit from the method.
  • Otherwise, if kSml is less than pvt, invoke the method quickSel() on the left partition. Else, invoke the method quickSel() on the right partition.
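The steps above can be sketched roughly as follows. This is a minimal sketch and not the original listing: the class name KMostFreqSketch, the method topKFrequent(), the helper swap(), the Lomuto-style partition, and the choice of the middle index as the pivot are all assumptions made for illustration. It also assumes that K is between 1 and the number of distinct elements.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; a sketch of the quick select steps.
class KMostFreqSketch {

    // Swaps two positions of the array temp.
    private static void swap(int[] temp, int a, int b) {
        int t = temp[a];
        temp[a] = temp[b];
        temp[b] = t;
    }

    // Partitions temp[lft..rght] by frequency: elements that are less
    // frequent than the pivot end up on its left. Returns the final
    // index of the pivot.
    private static int partition(int[] temp, Map<Integer, Integer> freq,
                                 int lft, int rght, int pvt) {
        int pvtFreq = freq.get(temp[pvt]);
        swap(temp, pvt, rght);              // park the pivot at the right end
        int store = lft;
        for (int i = lft; i < rght; i++) {
            if (freq.get(temp[i]) < pvtFreq) {
                swap(temp, store, i);
                store++;
            }
        }
        swap(temp, store, rght);            // move the pivot to its final place
        return store;
    }

    private static void quickSel(int[] temp, Map<Integer, Integer> freq,
                                 int lft, int rght, int kSml) {
        if (lft >= rght) {
            return;
        }
        // Assumption: the middle index is used as the pivot; a random
        // index between lft and rght would work as well.
        int pvt = lft + (rght - lft) / 2;
        pvt = partition(temp, freq, lft, rght, pvt);
        if (kSml == pvt) {
            return;
        }
        if (kSml < pvt) {
            quickSel(temp, freq, lft, pvt - 1, kSml);
        } else {
            quickSel(temp, freq, pvt + 1, rght, kSml);
        }
    }

    static int[] topKFrequent(int[] arr, int k) {
        // Steps 1 and 2: count the frequency of every element.
        Map<Integer, Integer> freq = new HashMap<>();
        for (int v : arr) {
            freq.merge(v, 1, Integer::sum);
        }
        // Steps 3 and 4: copy the distinct elements into temp.
        int len = freq.size();
        int[] temp = new int[len];
        int idx = 0;
        for (int key : freq.keySet()) {
            temp[idx++] = key;
        }
        // Steps 5 and 6: partial sort, then take the last k positions.
        quickSel(temp, freq, 0, len - 1, len - k);
        return Arrays.copyOfRange(temp, len - k, len);
    }

    public static void main(String[] args) {
        int[] ans = topKFrequent(new int[]{5, 5, 3, 7, 9, 7, 0, 1, 2, 7}, 2);
        Arrays.sort(ans); // the order inside the answer is not fixed, so sort for display
        System.out.println(Arrays.toString(ans)); // prints [5, 7]
    }
}
```

Note that quick select only guarantees which elements land in the last K positions, not their relative order, which is why the sketch sorts the answer before printing it.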

Observe the following implementation based on the above steps.

FileName: KMostFreq.java

Output:

For the input array: 
5 5 3 7 9 7 0 1 2 7 
The first 2 frequent elements are:
5 7 

For the input array: 
9 2 0 1 4 8 6 3 0 1 5 4 4 1 7 
The first 3 frequent elements are:
0 1 4

Complexity Analysis: In the worst-case scenario, the pivot never divides the problem in half, which leads to a time complexity of O(n^2). However, such worst cases are unlikely in practice, so the average time complexity of the program is O(n). Because of the hash map, the space complexity of the program is O(n), where n is the total number of elements present in the input array.

Approach: Using Heap

We will order the distinct elements according to the number of times they occur in the array. We will be using a hash map where the key is the element itself, and the value is the number of times the element occurs in the input array. We will then use a heap to retrieve the elements in descending order of frequency. The steps involved are mentioned below.

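Step 1: If the value of K is the same as the size of the input array, then return the input array. Assuming K does not exceed the number of distinct elements, every element is then distinct and belongs to the answer.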

Step 2: Create a hash map, where the key is the element, and the value is the frequency of occurrence of that element in the input array.

Step 3: Using a loop, iterate over the elements of the input array and, for each element, increase its count by 1 in the hash map created in the previous step.

Step 4: Create a priority queue pq (a max heap) that orders the elements in descending order of their frequency.

Step 5: Add all of the keys of the hash map to the heap.

Step 6: Create an array temp[] for storing the answer.

Step 7: Remove the top element of the heap K times, adding each removed element to the array temp, and return the array temp.
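The steps above can be sketched roughly as follows. This is a minimal sketch and not the original listing: the class name KMostFreqHeapSketch and the method name topKFrequent() are assumptions made for illustration, and K is assumed to be between 1 and the number of distinct elements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical class name; a sketch of the heap-based steps.
class KMostFreqHeapSketch {

    static int[] topKFrequent(int[] arr, int k) {
        // Step 1: every element is part of the answer.
        if (k == arr.length) {
            return arr;
        }

        // Steps 2 and 3: count the frequency of every element.
        Map<Integer, Integer> freq = new HashMap<>();
        for (int v : arr) {
            freq.merge(v, 1, Integer::sum);
        }

        // Step 4: max heap ordered by frequency, so the most frequent
        // element is removed first.
        PriorityQueue<Integer> pq = new PriorityQueue<>(
                (a, b) -> Integer.compare(freq.get(b), freq.get(a)));

        // Step 5: add all of the distinct elements to the heap.
        pq.addAll(freq.keySet());

        // Steps 6 and 7: remove the top of the heap k times.
        int[] temp = new int[k];
        for (int i = 0; i < k; i++) {
            temp[i] = pq.poll();
        }
        return temp;
    }
}
```

For the first example, topKFrequent(new int[]{5, 5, 3, 7, 9, 7, 0, 1, 2, 7}, 2) returns {7, 5}. When two elements have the same frequency, as 1 and 4 do in the second example, the order in which they leave the heap is not specified.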

The following implementation uses the above-mentioned steps.

FileName: KMostFreq1.java

Output:

For the input array: 
5 5 3 7 9 7 0 1 2 7 
The first 2 frequent elements are:
7 5 

For the input array: 
9 2 0 1 4 8 6 3 0 1 5 4 4 1 7 
The first 3 frequent elements are:
1 4 0

Complexity Analysis: Creating the hash map takes O(n) time and, in the worst case, building the heap takes O(n x log(n)) time, since adding an element to the heap takes O(log(n)) time. Therefore, the total time complexity of the program is O(n x log(n)), where n is the total number of elements present in the input array. The space complexity of the program is the same as the previous program.

Let's optimize further in order to reduce the time complexity.

Approach: Using Bucket Sort

An element can occur at most n times and at least 1 time in the input array, where n is the size of the array. Therefore, we can make n + 1 buckets (indexed from 0 to n, with bucket 0 left unused) and put each element in the bucket that matches its frequency of occurrence. For example, if a number occurs t times, then it goes in the bucket bucketArr[t]. After putting all the elements in the buckets, the first k elements collected starting from the rightmost bucket are our solution.

Algorithm

Step 1: Create a hash map, where the key is the element, and the value is the frequency of occurrence of that element in the input array.

Step 2: Using a loop, iterate over the elements of the input array and, for each element, increase its count by 1 in the hash map created in the previous step.

Step 3: Create an array of buckets called bucketArr[] with n + 1 entries, where n is the size of the input array.

Step 4: Add each key of the hash map to the bucket in bucketArr[] whose index equals the key's frequency of occurrence.

Step 5: Create a temp[] array for storing the answer.

Step 6: Add K elements to the temp[] array, beginning from the rightmost bucket and moving left.

Step 7: Return the array temp.
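The steps above can be sketched roughly as follows. This is a minimal sketch and not the original listing: the class name KMostFreqBucketSketch, the method name topKFrequent(), and the use of a list of lists for bucketArr are assumptions made for illustration, and K is assumed to be between 1 and the number of distinct elements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the bucket-based steps.
class KMostFreqBucketSketch {

    static int[] topKFrequent(int[] arr, int k) {
        // Steps 1 and 2: count the frequency of every element.
        Map<Integer, Integer> freq = new HashMap<>();
        for (int v : arr) {
            freq.merge(v, 1, Integer::sum);
        }

        // Step 3: bucketArr.get(t) holds the elements that occur exactly
        // t times. Indices run from 0 to arr.length; index 0 stays empty.
        List<List<Integer>> bucketArr = new ArrayList<>();
        for (int i = 0; i <= arr.length; i++) {
            bucketArr.add(new ArrayList<>());
        }

        // Step 4: place every distinct element in its frequency bucket.
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            bucketArr.get(e.getValue()).add(e.getKey());
        }

        // Steps 5 to 7: collect k elements, starting from the rightmost
        // (highest frequency) bucket and moving left.
        int[] temp = new int[k];
        int filled = 0;
        for (int t = arr.length; t >= 1 && filled < k; t--) {
            for (int v : bucketArr.get(t)) {
                if (filled == k) {
                    break;
                }
                temp[filled++] = v;
            }
        }
        return temp;
    }
}
```

For the first example, topKFrequent(new int[]{5, 5, 3, 7, 9, 7, 0, 1, 2, 7}, 2) returns {7, 5}, because 7 sits in bucket 3 and 5 sits in bucket 2.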

FileName: KMostFreq2.java

Output:

For the input array: 
5 5 3 7 9 7 0 1 2 7 
The first 2 frequent elements are:
7 5 

For the input array: 
9 2 0 1 4 8 6 3 0 1 5 4 4 1 7 
The first 3 frequent elements are:
1 4 0

Complexity Analysis: The program traverses the input array once to build the hash map, and then makes one pass over the hash map and one pass over the buckets. Thus, the time complexity of the program is O(n), where n is the total number of elements present in the array. The space complexity of the program is the same as the previous program.