External merge sort

External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive).

External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted sub-files are combined into a single larger file.

The idea behind external sorting is to sort the chunks of data that fits in RAM, then merges the sorted chunks together. i.e.

1. We first divide the file into runs such that the size of a run is small enough to fit into main memory.

2. Then sort each run in main memory using standard sorting algorithm.

3. Finally we merge the resulting runs together into successively bigger runs, until the file is sorted.

C++ implementation –

This code won’t work on online compiler as it requires file creation permissions. When run local machine, it produces sample input file “input.txt” with 10000 random numbers. It sorts the numbers and puts the sorted numbers in a file “output.txt”. It also generates files with names 1, 2, .. to store sorted runs.

Author: Aditya Goel

Thanks for reading.

Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂