This article contains information regarding various sorting techniques that may prove useful in various situations without regard to programming language.
Description Of Sorting?
Sorting is a term to describe the ordering of data upon some value. For instance, let’s say that we have a list of people’s names:
- Bob Smith
- Janet Q. Evans
- Mike A. Morris
- Jill Morris
- Zed Z. Zulu
This list of names contains a person’s first name, middle initial, and last name. However, if we wanted to list these names alphabetically for some application (like a phone book directory), then we would need to run the list of last names, then first names, then middle initials through one of several sorting algorithms in the hopes that our end result would look something like this:
- Janet Q. Evans
- Jill Morris
- Mike A. Morris
- Bob Smith
- Zed Z. Zulu
Consider Speed
There are several algorithms that shall be discussed. But you may be asking yourself why it is important to be familiar with various algorithms when all accomplish the same task. The reason is simply the “efficiency” of each. In some textbooks, this term may be described as “cost” or “speed”. Here, we will use all three terms interchangeably.
In short, some algorithms work well for sets of data that are already sorted in the majority, others work well on small datasets, and others work well for datasets of unknown size. In the code samples that follow, functions that appear in ALL CAPS are actually macros that are assumed to be written already. For instance, swaping the contents of two values or inserting into an array or a list.
Various Algorithms
A large part of the following descriptions have come from various sources including, but not limited to, wikipedia.com. The animation for the Quick Sort algorithm was written by Woi Ang. Costs are noted in “Big-Oh Notation” next to each sorting title.
Bubble Sort
Best Case: O(n2)
Worst Case: O(n2)
Description:
Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing two items at a time, swapping these two items if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which means the list is sorted. The algorithm gets its name from the way smaller elements “bubble” to the top of the list via the swaps.
Algorithm:
- Compare adjacent elements. If the first is greater than the second, swap them.
- Do this for each pair of adjacent elements, starting with the first two and ending with the last two. At this point the last element should be the greatest.
- Repeat the steps for all elements except the last one.
- Keep repeating for one fewer element each time, until you have no more pairs to compare.
Code:
[sourcecode language=”cpp”]
void BubbleSort( int aList[], int iNumElements )
{
int iNumSwaps;
int iNumElementsToCompare;
for ( i = 0; i < ( iNumElements – 1 ); i++ )
{
iNumSwaps = 0;
iNumElementsToCompare = iNumElements – ( i + 1 );
for ( j = 0; j < iNumElementsToCompare; j++ )
{
if ( aList[j] > aList[j + 1] )
SWAP( aList[j], aList[j + 1] );
}
// no swaps, so the list must be sorted now
if ( !iNumSwaps )
break;
}
}
[/sourcecode]
Insertion Sort
Best Case: O(n2)
Worst Case: O(n2)
Description:
Insertion sort is a simple sort algorithm in which the sorted array (or list) is built one entry at a time. It is much less efficient than the more advanced algorithms such as quicksort or merge sort, but its advantages are:
- Simple to implement
- Efficient on (quite) small data sets
- Efficient on data sets which are already substantially sorted
- Stable (i.e. does not change the order of already ordered elements)
- Does not suffer from poor “worst case input” performance
- Minimal memory requirements
Algorithm:
- Start with the result being the first element of the input.
- Loop over the input array until it is empty, “removing” the first remaining (leftmost) element.
- Compare the removed element against the current result, starting from the highest (rightmost) element, and working left towards the lowest element.
- If the removed input element is lower than the current result element, copy that value into the following element to make room for the new element below, and repeat with the next lowest result element.
- Otherwise, the new element is in the correct location; save it in the cell left by copying the last examined result up, and start again from step #2 with the next input element.
Code:
[sourcecode language=”cpp”]
void InsertionSort( int aUnsortedList[], int aSortedList[], int iNumElements )
{
int iNumInserts = 0;
for ( i = 0; i < iNumElements; i++ )
{
for ( j = 0; j < iNumInserts; j++ )
{
if ( aUnsortedList[i] < aSortedList[j] )
INSERT( aSortedList, aUnsortedList[i], j + 1 );
}
// put the element at the end of the sorted list array
if ( j == iNumInserts )
aSortList[iNumInserts] = aUnsortedList[i];
}
}
[/sourcecode]
Merge Sort
Best Case: O(n log n)
Worst Case: O(n log n)
Description:
Mergesort is a sort algorithm for rearranging lists (or any other data structure that can only be accessed sequentially, i.e. file streams) into a specified order. It is a particularly good example of the divide and conquer algorithmic paradigm.
Algorithm:
- Divide the unsorted list into two sublists of about half the size
- Sort each of the two sublists
- Merge the two sorted sublists back into one sorted list.
Code:
[sourcecode language=”cpp”]
void MergeSort( int aList[], int iBegin, int iEnd )
{
int iMid;
if ( begin == end )
return;
if ( begin == ( end – 1 ) )
return;
mid = ( begin + end ) >> 1;
MergeSort( aList, iBegin, iMid );
MergeSort( aList, iMid, iEnd );
APPEND( aList, iBegin, iMid, iEnd );
}
[/sourcecode]
Quick Sort
Best Case: O(log n)
Worst Case: O(n2)
Description:
Because of its good average performance and simple implementation, Quicksort is one of the most popular sorting algorithms in use. It is an unstable sort in that it does not preserve any ordering that is already between elements of the same value. Quicksort’s worst-case performance is O(n2); much worse than some other sorting algorithms such as merge sort. However, if pivots are chosen randomly, most bad choices of pivots are unlikely; the worst-case has only probability 1/n! of occurring.
The Quicksort algorithm uses a recursive divide and conquer strategy to sort a list.
Algorithm:
- Pick a pivot element from the list.
- Reorder the list so that all elements less than the pivot precede all elements greater than the pivot. This means that the pivot is in its final place; the algorithm puts at least one element in its final place on each pass over the list. This step is commonly referred to as “partitioning”.
- Recursively sort the sub-list of elements less than the pivot and the sub-list of elements greater than the pivot. If one of the sub-lists is empty or contains one element, it can be ignored.
Code:
[sourcecode language=”cpp”]
void QuickSort( int aList[], int iBegin, int iEnd )
{
if ( iEnd > iBegin )
{
int iPivot = aList[iBegin];
int l = iBegin + 1;
int r = iEnd;
while ( l < r )
{
if ( aList[l] <= iPivot )
{
l++;
}
else
{
SWAP( aList[l], aList[r] );
r–;
}
}
l–;
r++;
SWAP( aList[iBegin], aList[l] );
QuickSort( aList, iBegin, l );
QuickSort( aList, r, iEnd );
}
}
[/sourcecode]
Selection Sort
Best Case: O(n2)
Worst Case: O(n2)
Description:
If you had to invent a sort algorithm on your own, you’d probably write an algorithm similar to selection sort because it is probably the most intuitive and immediate to invent. I know that this was the first algorithm that I wrote on many occasions when presented with a need to sort in my code — well, at least early on in my career.
Algorithm:
- Find the minimum value in the list
- Swap it with the value in the first position
- Find the minimum value amongst the remaining values
- Swap it with the value in the second position
- Repeat until the list is sorted
Code:
[sourcecode language=”cpp”]
void SelectionSort( int aList[], int iBegin, int iEnd )
{
int i;
for ( i = 0 ; i < ( n – 1 ) ; i++ )
{
for ( j = i + 1 ; j < n; j++ )
{
if ( x[i] > x[j] )
{
temp = x[j];
x[j] = x[i];
x[i] = temp;
}
}
}
}
[/sourcecode]
Applications
If you are a DBA, the algorithms described today probably appear transparent to you because most databases (i.e. MySQL, MSSQLServer, Oracle, etc) determine the most efficient algorithms and simply present a graphical option, at times, for the user to switch between the various sorting techniques that the DB implements.
On the other hand, if you are a games programmer, there is much reason to familiarize yourself with the various concepts discussed here. For instance, you might apply these algorithms to a high-score list, a list of sprites, or maybe a BSP-tree.
You shall find that the sorting techniques described here will prove invaluable especially for the various searching techniques that have been developed.