Efficiently sort an array with many duplicated values

Google Translate Icon

Given an integer array with many duplicated elements, write an algorithm to efficiently sort it in linear time, where the order of equal elements doesn’t matter.

For example,

Input:  { 4, 2, 40, 10, 10, 1, 4, 2, 1, 10, 40 }
 
Output: { 1, 1, 2, 2, 4, 4, 10, 10, 10, 40, 40 }

Practice this problem

A simple solution would be to use efficient sorting algorithms like Merge Sort, Quicksort, Heapsort, etc., that can solve this problem in O(n.log(n)) time, but those will not take advantage of the fact that there are many duplicated values in the array.

 
A better approach is to use a counting sort. This will bring down the time complexity to O(n + k), where n is the size of the input and k is the input range. Following is the C, Java, and Python program that demonstrates it:

C


Download  Run Code

Output:

1 1 2 2 4 4 10 10 10 40 40

Java


Download  Run Code

Output:

1 1 2 2 4 4 10 10 10 40 40

Python


Download  Run Code

Output:

1 1 2 2 4 4 10 10 10 40 40

The problem with the counting sort is its high space complexity O(k) when k >> n, i.e., variation in keys is significantly greater than the total number of items. Also, the program will only work for positive integers. Here is another implementation that calculates the exact range of array elements instead of making any assumption by calculating the difference between the maximum and minimum key values.

 
We can also use a hash table to solve this problem effectively. The idea is to:

  1. Iterate through the array once and store the number of occurrences of individual elements in a hash table.
  2. Sort the unique elements present in the hash table according to the natural ordering.
  3. Overwrite the input array with sorted elements depending on frequencies stored in the hash table.

The time complexity of this solution would be O(n.log(k)), where k is the total number of unique elements present in the input of size n, which can be much less than O(n.log(n)) if the range of input elements is small. The additional space used by the program is O(k). Following is the C++, Java, and Python program that demonstrates it:

C++


Download  Run Code

Output:

1 1 2 2 4 4 10 10 10 40 40

Java


Download  Run Code

Output:

[1, 1, 2, 2, 4, 4, 10, 10, 10, 40, 40]

Python


Download  Run Code

Output:

[1, 1, 2, 2, 4, 4, 10, 10, 10, 40, 40]

Rate this post

Average rating 4.78/5. Vote count: 147

No votes so far! Be the first to rate this post.

We are sorry that this post was not useful for you!

Tell us how we can improve this post?




Thanks for reading.

Please use our online compiler to post code in comments using C, C++, Java, Python, JavaScript, C#, PHP, and many more popular programming languages.

Like us? Refer us to your friends and help us grow. Happy coding :)



Subscribe
Notify of
guest
5 Comments
Most Voted
Newest Oldest
Inline Feedbacks
View all comments
Do NOT follow this link or you will be banned from the site!