External Merge Sort Algorithm

We can efficiently sort massive amounts of data using External Merge Sort Algorithm, when the data being sorted don’t fit into the main memory (which is usually RAM) and resides in the slower external memory (which is usually a hard disk).


External Merge Sort uses a hybrid sort-merge technique. The chunks of data small enough to fit in the RAM are read, sorted, and written out to a temporary file during the sorting phase. In the merge phase, the sorted sub-files are combined into a single larger file.
 

In other words, external External Merge Sort sorts the chunks of data that fits in the main memory, then merges the sorted chunks together. i.e.

  1. We first divide the file into runs such that the size of a run is small enough to fit into main memory.
     
  2. Then sort each run in main memory using standard Merge Sort sorting algorithm.
     
  3. Finally we merge the resulting runs together into successively bigger runs, until the file is sorted.

 
Below is C++ code to demonstrate External Merge Sort Algorithm:

 

 
Please note that this code doesn’t work on online compilers as it requires file creation permissions. When run locally, it will produce sample input file “input.txt” with 10000 random numbers. It sorts the numbers and puts the sorted numbers in a file “output.txt”. It also generates files with names 1, 2, .. to store sorted runs.

 
Author: Aditya Goel

 
References: https://en.wikipedia.org/wiki/External_sorting

 
Thanks for reading.




Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂
 





Leave a Reply

Notify of
avatar
Sort by:   newest | oldest | most voted
JD
Guest

Awesome guys

wpDiscuz