Wednesday, April 2, 2014

Sorting and Efficiency


Sorting algorithms are a very useful tool in the programmers tool box. It is often the case that a large amount of data will be entered for the computer to work with. If this data isn't in any particular order, it is far less useful. This is where sorting algorithms come in. A sorting algorithm is essentially a method of manipulating data to put it in a desired order. There are a couple of factors that indicate how well a sorting algorithm will perform or, what it's efficiency will be.

When looking at an algorithms efficiency, we are generally only concerned with the 'Big – O' case. That is, the scenario that will take the greatest number of steps to complete. The function that models this behavior can be deduced by determining which operations are necessary to perform for a given algorithm. The most common operations to be looked at are the number of comparisons to be performed and the number of exchanges, or shifts of the data to be performed. Different sorting methods will produce different numbers for each so it is important when choosing a sorting method to consider what type of data and how much of it you're working with.

An algorithm like quick sort would easily outperform bubble sort in almost all cases, but what if the data in question was already very near to being sorted? In that case a bubble sort programmed to stop when it noticed that no exchanges were made during a given pass would be probably be more effective.

The necessity for a deeply fundamental understanding of how different sorting strategies operate is crucial, although easily overlooked. If you know what the algorithm does, why do you need to know how it does it? It will do what you want it to so why bother to mess with what works? You might not care one bit. Indeed, a short list of items to be sorted won't cause you any trouble. On the other hand, a massive list could take the rest of your life to sort. While I personally have never dealt with an overly large data set, I have programmed (unintentionally of course) inefficient algorithms for calculating things like the prime factors of a number. When no output has appeared even after checking back in half an hour, the importance of efficiency is apparent.

Efficiency is king when dealing with large sets of data. Most times that's why we are using a computer to handle the data in the first place. If you had a small amount of data chances are, you wouldn't even need one.

Image source: http://codingmash.com/tag/sorting/