Index O'Stuff
Home Compression Arithmetic CodingBurrows-Wheeler Transform Huffman Coding LZSS Coding LZW Coding Run Length Encoding Misc. Programming School ProjectsThesis Project crypt(3) Source Hamming Codes Bit Manipulation Libraries Square Root Approximation Library Sort Library Trailing Space Trimmer and Tab Remover Command Line Option Parser Humor Dictionary O'Modern TermsThe Ten Commandments of C Style Other Stuff TOPS Success StoryFree Win32 Software External Links SPAN (Spay/Neuter Animal Network)Los Angeles Pet Memorial Park Mirrors
Pages at geocitiesPages at dipperstein.com Obligatory Links
Visits since I started counting: |
A Discussion of Sorting Methods and Implementationsby Michael DippersteinThe more basic stuff I publish source for, the more requests for source I seem to get. This time my Huffman code page landed me a request for C++ code which implements Huffman coding and merge sort. The request obviously came from a student trying to get me to do his/her homework. I didn't do the homework assignment, but it started me thinking about sorting algorithms, and off I went on a trip down sorting memory lane. I thought that I could put everything to rest after reading about some of the algorithms I remembered the names of. After all, there are people that have made a career out of studying sorting algorithms, what would I want to do that hadn't already been done? I thought wrong, there's a lot of discussion about the algorithms, and even some code. Most of the code I saw had errors. Other code snippets were just obfuscated by object oriented programming techniques. So I decided to write my own library of sorting algorithms. As time goes on, I continue to discover new algorithms and sometimes even add them to my library. I recently discovered the Radix Sort while reading about optimizations to the Burrows-Wheeler Transformation. I'm sure I'll find another sort to add in the future. Click here for instructions on downloading my sorting library. The rest of this page discusses some of the issues concerning the sorting algorithms that I chose to implement. Sorting AlgorithmsThere are several popular and well researched sorting algorithms. My sort library implements the traditional version of the following algorithms: Each of the algorithms has the following order of operations:
Insertion SortInsertion sort is probably the first sorting algorithm that is taught in programming classes. It is far from efficient in terms of the number of comparisons, but is very compact in terms of code space required. As such, the insertion sort algorithm may be useful for sorting small lists. Insertion sort is an iterative algorithm that makes n - 1 passes on a list of n items. On the first pass the second item in the unsorted list is inserted in the proper position for a sorted list containing the first two items. The next pass, the third item is inserted in a the proper position for a list sorted list containing the first three items. After the kth iteration the first k + 1 items in the list are in sorted order. The algorithm continues until the nth item is inserted into its position. The insertion sort algorithm is implemented by my sort library function InsertionSort. Bubble SortWhen I first learned the bubble sort algorithm, it was described as "cute". I believe that cuteness is the only thing this algorithm has going for it. It typically requires the same order of operations as insertion sort, and is not any more code efficient. Bubble sort is an iterative algorithm that makes n - 1 passes on a list of n items. On the first pass, bubble sort starts at the head of the list and compares the first two items, if the items are out of order, they are swapped, then the second and third items are compared and swapped if necessary. The comparison and swapping continue to the end of the list. After the first iteration, the largest item is in the nth position. The algorithm repeats itself, stopping one item earlier each pass. The algorithm completes when only the first two items are compared. This sort library includes an additional optimization which will halt the algorithm on any pass which finds the list to be already sorted (no swaps are performed). The bubble sort algorithm is implemented by my sort library function BubbleSort. Shell SortShell sort is an optimization of the insertion sort proposed by D. L. Shell. Shell observed that performing an insertion sort on sublist of items that are separated by some distance (an increment) may allow items to be moved further across the list. For a list of n items [x1 ... xn], select an increment,
i, and start by sorting the resulting n/i sub lists: Next decrease i and repeat the process until i equals 1. At this time, no optimal increment value is known. However, D.E. Knuth has demonstrated that using increments of the form (3k - 1) / 2 results in an O(n3/2) Shell sort. The Shell sort algorithm is implemented by my sort library function ShellSort. The Shell sort function provided in this library uses increments of the form (3k - 1) / 2. Quick SortANSI C provides a quick sort function, qsort in stdlib.h, yet I felt compelled to include a quick sort function in my library. Quick sort is a divide and conquer algorithm with two phases, a partition phase and a sort phase. In the partition phase all list items less than or equal to a pivot item are placed in one partition, while all items greater than a pivot are placed in another partition. For my implementation, I always designate the first item in the list as the pivot. To partition, I maintain a left and a right search pointer. The left pointer starts at the second item in the list and the right pointer starts at the last item in the list. I then move the left pointer to the right until it finds an item greater than the pivot or points to the same item as the right pointer. If the left pointer crosses the right pointer, the list is partitioned at the crossing point. Otherwise, the right pointer is moved left until it crosses the left pointer or finds an item less than or equal to the pivot. If the right pointer crosses the left pointer, the list is partitioned at the crossing point. Otherwise the items pointed to by the two pointers are swapped, and I begin the process of moving the left and right pointers again. The sort phase is really simple. Just call the quick sort function twice, once passing the left partition as the list to be sorted, then passing the right partition as the list to be sorted. Quick sort has an average order of operation of O(n × log(n)), but when the partitions are not anywhere near even, the order of operation approaches O(n2). For sorted lists and reverse sorted lists quick sort is an O(n2) algorithm. My sort library function QuickSort implements the quick sort algorithm. It is a recursive implementation of quick sort which makes it pretty inefficient with stack space. The stack space consumed may be equal to the size of the list. A web search should reveal several memory optimizations for quick sort. Merge SortLike quick sort, merge sort is a divide and conquer algorithm with two phases. In the case of merge sort, the phases are a partition phase, and a merge phase. In the partition phase, if the list of items to be sorted contains more than one item, the list is split in half and the merge sort algorithm is applied on each half of the list. At the end of the partition phase, the algorithm has two sorted list halves. The merge phase merges the two sorted halves into one list. For my implementation of this algorithm, I perform the merge by allocating a block of memory equal to the size of the combined lists, and merge the lists into the new memory block. Allocation of a new memory block means that this implementation requires additional storage space equal to the size of the original list. To merge two lists, compare the lowest value in one list with the lowest value in the other list. Remove the lowest of the two values from it's list and place it at the end of the merged list. Repeat this process until there are no more items in either list. Unlike quick sort, merge sort always partitions the list in half, so it is always an order O(n × log(n)) algorithm. The merge sort algorithm is implemented by my sort library function MergeSort. Heap SortHeap sort a popular in place O(n × log(n)) sort algorithm. Unlike quick sort and merge sort, heap sort does not require additional memory allocations to store pieces of the list being sorted. Heap sort is a two phase algorithm, it has a heap building phase and a promotion phase. A heap is a binary tree with the propriety that every node is greater than it's children. For a list of 1 .. N items this means that item i is greater than it's children, items (i / 2) and ((i / 2) + 1). Once the heap is built, the largest item will be at the root of the heap. To sort, simply swap the first item in the list with Nth item in the list. Next rebuild the heap with items 1 .. (N - 1). Repeat the process of swapping and rebuilding the heap with one less item, until only two items remain. The heap sort algorithm is implemented by my sort library function HeapSort. Radix SortRadix sort is an unusual sort algorithm. All the sort algorithms discussed so far compare the elements being sorted to eachother. The radix sort doesn't. A radix sort uses one or more keys to sort values. The key of a value indicates the sort grouping for a value on a particular pass. Successive passes with different keys may be used to sort a list of elements. On a radix sort pass, a list of items to be sorted is processed from beginning to end. One at a time, the items in the list are placed at the end of a list of items with the same key value. After all items have been processed, the lists of key values are reassembled smallest key to largest key. An somewhat intuitive example of how to use a radix sort might be alphabetizing strings of length N. Each pass of the radix sort should use a character in the string as it's key. The less intuitive part about alphabetizing is that radix sort passes should be from performed from the last character to the first character. The first pass will use the Nth character, the second pass will use the (N - 1)th character, ..., the Nth pass will use the first character. Example: Sorting 3 Characters (start with third character)
Sorting unsigned integers is similar to sorting strings. If we use a radix of 10, we can sort integers one digits at a time. Like strings, integers should be sorted least significant digit to most significant digit. Example: Sorting Integers (start with least significant digit)
Pierre Terdiman's article Radix Sort Revisited discusses radix sort techniques for sorting signed integers and floating point values. The radix sort algorithm is implemented by my sort library function RadixSort. Each time RadixSort is called, it makes two passes on the data being sorted. The first pass counts the number of values with each possible key, the second pass places the data in its sorted order. Counting the number of values for each key allows the data to be sorted into a fixed size array instead of a linked list or similar data structure. Function UsageThe library archive that I have provided includes a program file, sample.c, which demonstrates the usage of each sort function. I have also provided the function VerifySort which may be used to determine if a list is actually sorted. All of my sort functions share the same parameter list as the qsort function provided by stdlib.h:
By utilizing the same format, code which calls qsort may be modified to use use my functions simply by changing the function name. Conversely, code that uses my functions may be made to use qsort simply by changing the function name. It is possible to sacrifice flexibility for speed by rewriting the functions so that a fixed comparison is performed inline instead of calling the comparison function passed as a parameter. Greater speed may also be obtained by using arrays of a known type rather instead of void pointers with an item size determined at runtime. Actual SoftwareI am releasing my implementation of the sorting algorithms under the LGPL. All of the functions have been combined into a single library. The zipped archive sort.zip includes, ANSI C source, a sample program demonstrating the usage of each sort function, and a Makfile for building the library with GNU make and gcc. If you have any further questions or comments, you may contact me by e-mail. My e-mail address is: mdipper@alumni.engr.ucsb.edu |
Home
Last updated on August 30, 2007