Find k numbers with most occurrences in the given array

Introduction

This article will examine an effective Python algorithm to identify the "k" numbers that appear the most frequently in an array. Finding the elements with the highest frequency is a common data analysis challenge with a variety of uses, including identifying popular e-commerce items, studying user behaviour, and processing large datasets.

Recognising the Issue

Let's define the issue clearly before getting into the technical details. Our goal is to identify the "k" numbers that appear in the array the most frequently given an array of integers. No matter what order the elements appear in the array, we must find the elements that occur most frequently.

Consider the following input array, for instance:

[1, 3, 4, 3, 2, 4, 4, 2, 5]

And let's assume k = 2, then the output should be:

[4, 3]

The highest frequency of 3 in the array is shared by both 4 and 3.

Hash Table

The hash map method involves iterating through the array and keeping track of a dictionary to determine how many times each element appears. After counting, we sort the dictionary in descending order based on frequency before extracting the top "k" elements. This method's simplicity stems from the fact that Python offers built-in data structures like dictionaries to handle this task successfully.

Heap

The "k" elements with the highest frequency are tracked using a min-heap in the heap approach. Every time we reach a new element in the array as we traverse it, the heap is updated appropriately. If the heap's size is greater than "k," the element with the lowest frequency is eliminated. When working with large datasets or when "k" is significantly less than the array size, the heap approach is particularly helpful because it is memory-efficient.

Application of the Algorithm

We will outline the Python code in detail for both the hash map and heap approaches.

Using Hash Map

Frequency Map Construction

First, we develop a Python function that uses the hash map strategy to identify the "k" numbers with the greatest frequency:

Python code for finding k numbers with most occurrences using hash map:

Output:

```[(1, 3), (2, 2)]
```

This code snippet creates the frequency_map dictionary from scratch. The next step is to iterate through the input array arr and retrieve the current frequency count for each element using the get() method. The get() method returns 0 when the element is encountered for the first time, and we add 1 to that value. This procedure is repeated for each element in the array, resulting in a dictionary with the elements' frequencies as keys and their values as values.

*The Frequency Map is sorted.

We can now extract the top "k" elements by sorting the frequency map in descending order based on frequency. We can add the following step to the find_k_most_frequent() function:

Python code for finding k numbers with most occurrences using hash map and sorting:

Output:

```[1, 2]
```

The dictionary's key-value pairs are returned as a list of tuples by the items() method, and we order this list according to the frequency of each tuple's second element. The result is then returned after we remove the top "k" elements from the sorted list.

Building the Min-Heap Using the Heap

To find the "k" numbers with the most occurrences using the heap approach, we must first create a Python function that is similar to the hash map approach:

Python code for finding k numbers with most occurrences using heap:

Output:

```[1, 2]
```

The heapq module, which offers functions to implement heaps in Python, is used in this snippet of code. To serve as the min-heap, we create an empty list called min_heap. Similar to the hash map method, we update the frequency_map as we iterate through the input array arr.

Dealing with New Elements

• We use the heappush() function to push a tuple of (frequency, element) into the min-heap for each element in the frequency map. The heap will be sorted according to the frequency of each tuple's first element thanks to this tuple structure.
• The heappop() function is used to remove the smallest element from the heap if the heap size is greater than "k". The "k" elements in the min-heap, which have the highest frequency, are effectively maintained by this process.
• Finally, we take the components out of the min-heap and return them. We reverse the list using slicing ([::-1]) to get the elements with the highest frequency first because the list of elements will be in ascending order based on frequency.

Analysis of Time and Space Complexity

Hash Map Method

The sorting step, which requires O(n log n) time, where "n" is the array size, dominates the time complexity of the hash map approach.

Since we must store the frequency map, which can contain every unique element from the array, the space complexity is O(n).

Massive Approach

The insertion and deletion operations in the heap take O(log k) time, and we perform them for "n" elements, so the heap approach's time complexity is O(n log k).

Since we only need to store "k" elements in the min-heap, the space complexity is O(k).

Comparing the Methodologies

After exploring the hash map and heap approaches, let's evaluate how well they perform in various scenarios.

Small "k" Value

The heap approach can outperform the hash map approach when the value of "k" is relatively small in comparison to the size of the array. This is because the heap approach uses less memory and could possibly result in faster sorting because it only maintains "k" elements in the min-heap.

Large "k" Value

On the other hand, the hash map approach might work better when the value of "k" is close to the size of the array or "k" is significantly larger than the number of distinct elements in the array. It may be more effective to sort a small frequency map rather than keep a big min-heap.

Array Size versus "k"

In general, the heap approach can be more memory-efficient and, therefore, better suited to handling big data scenarios if the array size "n" is significantly larger than "k."

Applications in the Real World

Real-world applications of the element frequency problem can be found in a number of fields, including:

Internet-based Platforms

Analysing customer purchase history can help e-commerce platforms identify popular products and enhance product recommendation systems. Businesses can concentrate on marketing and expanding the availability of the products with the highest purchase frequency by identifying those products.

Analysis of social media

Analysing trending topics and hashtags in social media analysis can reveal information about user interests and assist in the creation of interesting content. Social media platforms can increase user engagement and content discovery by identifying the topics that are discussed the most frequently.

Analyses of Customer Behaviour

Understanding customer behaviour and preferences is essential for businesses to make wise decisions. Frequency analysis can be used to help businesses identify the most popular features or services so they can better tailor their offerings to meet customer needs.