Introduction
A lot of programmers who make some of the coolest and most useful software today, such as much of what we see on the Internet or use daily, don’t have a theoretical computer science background. They’re still pretty awesome and creative programmers, and we thank them for what they build.
However, theoretical computer science has its uses and applications and can turn out to be quite practical. In this article, targeted at programmers who know their art but who don’t have any theoretical computer science background, I will present one of the most pragmatic tools of computer science: Big O notation and algorithm complexity analysis. Having worked both in a computer science academic setting and in building production-level software in the industry, I have found this to be one of the truly useful tools in practice, so I hope that after reading this article you can apply it in your own code to make it better. You should also be able to understand all the common terms computer scientists use, such as “big O”, “asymptotic behavior” and “worst-case analysis”.
This text is also targeted at junior high school and high school students from Greece or anywhere else in the world who compete in the International Olympiad in Informatics, an algorithms competition for students, or in other similar competitions. As such, it does not have any mathematical prerequisites and will give you the background you need to continue studying algorithms with a firmer understanding of the theory behind them. As someone who used to compete in these student competitions, I highly advise you to read through this whole introductory material and try to fully understand it, because it will be necessary as you study algorithms and learn more advanced techniques.
I believe this text will be helpful for industry programmers who don’t have much experience with theoretical computer science (it is a fact that some of the most inspiring software engineers never went to college). But because it’s also for students, it may at times sound a little bit like a textbook. In addition, some of the topics in this text may seem too obvious to you; for example, you may have seen them during your high school years. If you feel you understand them, you can skip them. Other sections go into a bit more depth and become slightly theoretical, as the students competing in this competition need to know more about theoretical algorithms than the average practitioner. But these things are still good to know and not tremendously hard to follow, so it’s likely well worth your time. As the original text was targeted at high school students, no mathematical background is required, so anyone with some programming experience (for example, if you know what recursion is) will be able to follow through without any problem.
Throughout this article, you will find various pointers that link you to interesting material often outside the scope of the topic under discussion. If you’re an industry programmer, it’s likely that you’re familiar with most of these concepts. If you’re a junior student participating in competitions, following those links will give you clues about other areas of computer science or software engineering that you may not have yet explored which you can look at to broaden your interests.
Big O notation and algorithm complexity analysis is something a lot of industry programmers and junior students alike find hard to understand, fear, or avoid altogether as useless. But it’s not as hard or as theoretical as it may seem at first. Algorithm complexity is just a way to formally measure how fast a program or algorithm runs, so it really is quite pragmatic. Let’s start by motivating the topic a little bit.
Motivation
We already know there are tools to measure how fast a program runs. There are programs called profilers which measure running time in milliseconds and can help us optimize our code by spotting bottlenecks. While this is a useful tool, it isn’t really relevant to algorithm complexity. Algorithm complexity is something designed to compare two algorithms at the idea level, ignoring low-level details such as the implementation programming language, the hardware the algorithm runs on, or the instruction set of the given CPU. We want to compare algorithms in terms of just what they are: ideas of how something is computed. Counting milliseconds won’t help us in that. It’s quite possible that a bad algorithm written in a low-level programming language such as assembly runs much quicker than a good algorithm written in a high-level programming language such as Python or Ruby. So it’s time to define what a “better algorithm” really is.
As algorithms are programs that perform just a computation, and not other things computers often do such as networking tasks or user input and output, complexity analysis allows us to measure how fast a program is when it performs computations. Examples of operations that are purely computational include numerical floating-point operations such as addition and multiplication; searching within a database that fits in RAM for a given value; determining the path an artificial-intelligence character will walk through in a video game so that they only have to walk a short distance within their virtual world; or running a regular expression pattern match on a string. Clearly, computation is ubiquitous in computer programs.
Complexity analysis is also a tool that allows us to explain how an algorithm behaves as the input grows larger. If we feed it a different input, how will the algorithm behave? If our algorithm takes 1 second to run for an input of size 1000, how will it behave if I double the input size? Will it run just as fast, half as fast, or four times slower? In practical programming, this is important as it allows us to predict how our algorithm will behave when the input data becomes larger. For example, if we’ve made an algorithm for a web application that works well with 1000 users and measure its running time, using algorithm complexity analysis we can have a pretty good idea of what will happen once we get 2000 users instead. For algorithmic competitions, complexity analysis gives us insight into how long our code will run for the largest test cases that are used to test our program’s correctness. So if we’ve measured our program’s behavior for a small input, we can get a good idea of how it will behave for larger inputs. Let’s start with a simple example: finding the maximum element in an array.
Counting instructions
In this article, I’ll use various programming languages for the examples. However, don’t despair if you don’t know a particular programming language. Since you know programming, you should be able to read the examples without any problem even if you aren’t familiar with the programming language of choice, as they will be simple and I won’t use any esoteric language features. If you’re a student competing in algorithms competitions, you most likely work with C++, so you should have no problem following through. In that case I recommend working on the exercises using C++ for practice.
The maximum element in an array can be found with a simple piece of JavaScript code such as this one. Given an input array A of size n:
var M = A[ 0 ];
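For reference, the whole search can be sketched as a self-contained function. This is a minimal sketch assuming the loop that the instruction counting below walks through (a for loop over i with an if ( A[ i ] >= M ) test); the wrapper name findMax and reading n from A.length are our own choices, not part of the original snippet.

```javascript
// Sketch of the maximum-element search analyzed in this section.
// findMax is a hypothetical wrapper name; n is taken from A.length.
function findMax(A) {
    var n = A.length;            // assumes n >= 1, as the analysis does
    var M = A[ 0 ];
    for ( var i = 0; i < n; ++i ) {
        if ( A[ i ] >= M ) {
            M = A[ i ];          // found a value at least as large: keep it
        }
    }
    return M;
}

console.log(findMax([ 1, 2, 3, 4 ])); // 4
console.log(findMax([ 4, 3, 2, 1 ])); // 4
```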
Now, the first thing we’ll do is count how many fundamental instructions this piece of code executes. We will only do this once and it won’t be necessary as we develop our theory, so bear with me for a few moments as we do this. As we analyze this piece of code, we want to break it up into simple instructions; things that can be executed by the CPU directly, or close to that. We’ll assume our processor can execute the following operations as one instruction each:
 Assigning a value to a variable
 Looking up the value of a particular element in an array
 Comparing two values
 Incrementing a value
 Basic arithmetic operations such as addition and multiplication
We’ll assume branching (the choice between the if and else parts of code after the if condition has been evaluated) occurs instantly and won’t count these instructions. In the above code, the first line of code is:
var M = A[ 0 ];
This requires 2 instructions: one for looking up A[ 0 ] and one for assigning the value to M (we’re assuming that n is always at least 1). These two instructions are always required by the algorithm, regardless of the value of n. The for loop initialization code also always has to run. This gives us two more instructions, an assignment and a comparison:
i = 0;
i < n;
These will run before the first for loop iteration. After each for loop iteration, we need two more instructions to run, an increment of i and a comparison to check if we’ll stay in the loop:
++i;
i < n;
So, if we ignore the loop body, the number of instructions this algorithm needs is 4 + 2n: 4 instructions at the beginning of the for loop and 2 instructions at the end of each of the n iterations. We can now define a mathematical function f( n ) that, given an n, gives us the number of instructions the algorithm needs. For an empty for body, we have f( n ) = 4 + 2n.
Worst-case analysis
Now, looking at the for body, we have an array lookup operation and a comparison that always happen:
if ( A[ i ] >= M ) { ...
That’s two instructions right there. But the if body may or may not run, depending on what the array values actually are. If it happens that A[ i ] >= M, then we’ll run these two additional instructions, an array lookup and an assignment:
M = A[ i ];
But now we can’t define an f( n ) as easily, because our number of instructions doesn’t depend solely on n but also on our input. For example, for A = [ 1, 2, 3, 4 ] the algorithm will need more instructions than for A = [ 4, 3, 2, 1 ]. When analyzing algorithms, we often consider the worst-case scenario. What’s the worst that can happen for our algorithm? When does our algorithm need the most instructions to complete? In this case, it is when we have an array in increasing order such as A = [ 1, 2, 3, 4 ]. In that case, M needs to be replaced every single time, and so that yields the most instructions. Computer scientists have a fancy name for that: they call it worst-case analysis, which is nothing more than considering the case when we’re the most unlucky. So, in the worst case, we have 4 instructions to run within the for body, so we have f( n ) = 4 + 2n + 4n = 6n + 4. This function f, given a problem size n, gives us the number of instructions that would be needed in the worst case.
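To check the count above, the same loop can be instrumented to tally instructions under the cost model listed earlier (one unit per assignment, lookup, comparison and increment; branching free). This is a sketch of that bookkeeping only; countInstructions and its counter are hypothetical names, and real CPUs will not execute exactly these counts.

```javascript
// Sketch: tally "fundamental instructions" of the maximum-element loop
// under the article's cost model. Names are hypothetical.
function countInstructions(A) {
    var n = A.length;              // assumes n >= 1, as in the text
    var count = 0;

    var M = A[ 0 ];
    count += 2;                    // lookup A[ 0 ] + assignment to M
    count += 2;                    // loop setup: i = 0 and the first i < n

    for ( var i = 0; i < n; ++i ) {
        count += 2;                // lookup A[ i ] + comparison with M
        if ( A[ i ] >= M ) {
            M = A[ i ];
            count += 2;            // lookup A[ i ] + assignment to M
        }
        count += 2;                // ++i and the next i < n check
    }
    return count;
}

console.log(countInstructions([ 1, 2, 3, 4 ])); // 28, i.e. 6n + 4 for n = 4
console.log(countInstructions([ 4, 3, 2, 1 ])); // 22
```

The increasing array pays for the if body on every iteration and hits 6n + 4 exactly. The decreasing array pays for it only once, at i = 0, where A[ 0 ] compares equal to itself.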
Asymptotic behavior
Given such a function, we have a pretty good idea of how fast an algorithm is. However, as I promised, we won’t need to go through the tedious task of counting instructions in our program. Besides, the number of actual CPU instructions needed for each programming language statement depends on the compiler of our programming language and on the available CPU instruction set (e.g. whether it’s an AMD or an Intel Pentium on your PC, or a MIPS processor on your PlayStation 2), and we said we’d be ignoring that. We’ll now run our “f” function through a “filter” which will help us get rid of those minor details that computer scientists prefer to ignore.
In our function, 6n + 4, we have two terms: 6n and 4. In complexity analysis we only care about what happens to the instruction-counting function as the program input (n) grows large. This really goes along with the previous ideas of “worst-case scenario” behavior: we’re interested in how our algorithm behaves when treated badly, when it’s challenged to do something hard. Notice that this is really useful when comparing algorithms. If an algorithm beats another algorithm for a large input, it’s most probably true that the faster algorithm remains faster when given an easier, smaller input. From the terms that we are considering, we’ll drop all the terms that grow slowly and only keep the ones that grow fast as n becomes larger. Clearly 4 remains a 4 as n grows larger, but 6n grows larger and larger, so it tends to matter more and more for larger problems. Therefore, the first thing we will do is drop the 4 and keep the function as f( n ) = 6n.
This makes sense if you think about it, as the 4 is simply an “initialization constant”. Different programming languages may require a different time to set up. For example, Java needs some time to initialize its virtual machine. Since we’re ignoring programming language differences, it only makes sense to ignore this value.
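A quick numeric check shows why the constant can be dropped: the share of f( n ) = 6n + 4 that comes from the 4 shrinks toward zero as n grows. The helper name constantShare is our own, used only for this illustration.

```javascript
// Sketch: fraction of f( n ) = 6n + 4 contributed by the constant term 4.
// constantShare is a hypothetical helper name.
function constantShare(n) {
    var f = 6 * n + 4;
    return 4 / f;
}

[ 1, 10, 1000, 1000000 ].forEach(function (n) {
    console.log(n, constantShare(n));
});
```

The share drops from 0.4 at n = 1 to 0.0625 at n = 10, and is already below 0.001 at n = 1000.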
The second thing we’ll ignore is the constant multiplier in front of n, and so our function will become f( n ) = n. As you can see, this simplifies things quite a lot. Again, it makes some sense to drop this multiplicative constant if we think about how different programming languages compile. The same array lookup may compile to different instructions in different programming languages. For example, in C, doing A[ i ] does not include a check that i is within the declared array size, while in Pascal it does. So, the following Pascal code:
M := A[ i ]
is the equivalent of the following in C:
if ( i >= 0 && i < n ) { M = A[ i ]; }
So it’s reasonable to expect that different programming languages will yield different factors when we count their instructions. In our example, in which we are using a dumb Pascal compiler that is oblivious to possible optimizations, Pascal requires 3 instructions for each array access instead of the 1 instruction C requires. Dropping this factor goes along the lines of ignoring the differences between particular programming languages and compilers and only analyzing the idea of the algorithm itself.
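The Pascal-style check can be imitated by hand to see that it changes only a constant factor. This is a sketch under the assumption that we want Pascal-like behavior; JavaScript arrays do not check bounds themselves (an out-of-range index simply yields undefined), so checkedGet and sumChecked are hypothetical helpers, not built-ins.

```javascript
// Sketch: a bounds-checked array read in the spirit of the Pascal example.
// checkedGet and sumChecked are hypothetical helpers.
function checkedGet(A, i) {
    var n = A.length;
    if ( i >= 0 && i < n ) {      // two extra comparisons per access
        return A[ i ];            // the lookup itself
    }
    throw new RangeError("index " + i + " out of bounds");
}

// Every element is read once, each read costing a fixed few extra
// instructions, so the total still grows linearly with the array length.
function sumChecked(A) {
    var s = 0;
    for ( var i = 0; i < A.length; ++i ) {
        s += checkedGet(A, i);
    }
    return s;
}

console.log(sumChecked([ 1, 2, 3 ])); // 6
```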
This filter of “dropping all factors” and of “keeping the largest growing term” as described above is what we call asymptotic behavior. So the asymptotic behavior of f( n ) = 2n + 8 is described by the function f( n ) = n. Mathematically speaking, what we’re saying here is that we’re interested in the limit of function f as n tends to infinity; but if you don’t understand what that phrase formally means, don’t worry, because this is all you need to know. (On a side note, in a strict mathematical setting, we would not be able to drop the constants in the limit; but for computer science purposes, we want to do that for the reasons described above.) Let’s work a couple of examples to familiarize ourselves with the concept.
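As a small worked example, not a formal definition, the “limit” phrasing can be made concrete by dividing f( n ) = 2n + 8 by the n we keep:

$$
\lim_{n \to \infty} \frac{2n + 8}{n} = \lim_{n \to \infty} \left( 2 + \frac{8}{n} \right) = 2
$$

The ratio settles at the constant 2, which is the factor we choose to drop, while the 8 vanishes entirely as n grows.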
Let us find the asymptotic behavior of the following example functions by dropping the constant factors and by keeping the terms that grow the fastest.
f( n ) = 5n + 12 gives f( n ) = n.
By using the exact same reasoning as above.
f( n ) = 109 gives f( n ) = 1.
We’re dropping the multiplier 109 * 1, but we still have to put a 1 here to indicate that this function has a nonzero value.
f( n ) = n^2 + 3n + 112 gives f( n ) = n^2.
Here, n^2 grows larger than 3n for sufficiently large n, so we’re keeping that.
f( n ) = n^3 + 1999n + 1337 gives f( n ) = n^3.
Even though the factor in front of n is quite large, we can still find a large enough n so that n^3 is bigger than 1999n. As we’re interested in the behavior for very large values of n, we only keep n^3 (see Figure 2).
f( n ) = n + (n)^(1/2) gives f( n ) = n.
This is so because n grows faster than (n)^(1/2) as we increase n.
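The claim that a large enough n always exists can be checked by brute force for the n^3 example. This sketch simply walks n upward until the first function exceeds the second; firstExceeding is a hypothetical helper, and it would loop forever if f never overtook g, so it is only meant for pairs like these.

```javascript
// Sketch: smallest positive n with f( n ) > g( n ), found by brute force.
// firstExceeding is hypothetical; it assumes f eventually overtakes g.
function firstExceeding(f, g) {
    var n = 1;
    while ( f(n) <= g(n) ) {
        ++n;
    }
    return n;
}

var cube = function (n) { return n * n * n; };
console.log(firstExceeding(cube, function (n) { return 1999 * n; }));        // 45
console.log(firstExceeding(cube, function (n) { return 1999 * n + 1337; })); // 46
```

Past n = 46 the cubic term stays ahead of the rest of the function, which is why only n^3 is kept.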
Complexity
So what this is telling us is that since we can drop all these decorative constants, it’s pretty easy to tell the asymptotic behavior of the instruction-counting function of a program. In fact, any program that doesn’t have any loops will have f( n ) = 1, since the number of instructions it needs is just a constant (unless it uses recursion; see below). Any program with a single loop which goes from 1 to n will have f( n ) = n, since it will do a constant number of instructions before the loop, a constant number of instructions after the loop, and a constant number of instructions within the loop which all run n times.
This should now be much easier and less tedious than counting individual instructions, so let’s take a look at a couple of examples to get familiar with this. The following PHP program checks to see if a particular value exists within an array A of size n:
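A linear search of this kind might look like the following sketch. It is written in JavaScript, like the maximum-element example, rather than in PHP, and the names linearSearch and value are our own; the break is what allows the early exit discussed next.

```javascript
// Sketch of linear search: does value occur somewhere in A?
// linearSearch and value are hypothetical names.
function linearSearch(A, value) {
    var n = A.length;
    var exists = false;
    for ( var i = 0; i < n; ++i ) {
        if ( A[ i ] === value ) {
            exists = true;
            break;            // stop early once the value is found
        }
    }
    return exists;
}

console.log(linearSearch([ 3, 1, 4 ], 4)); // true
console.log(linearSearch([ 3, 1, 4 ], 5)); // false
```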

This method of searching for a value within an array is called linear search. This is a reasonable name, as this program has f( n ) = n (we’ll define exactly what “linear” means in the next section). You may notice that there’s a “break” statement here that may make the program terminate sooner, even after a single iteration. But recall that we’re interested in the worst-case scenario, which for this program is for the array A to not contain the value. So we still have f( n ) = n.
Rule of thumb
 Simple programs can be analyzed by counting the nested loops of the program. A single loop over n items yields f( n ) = n. A loop within a loop yields f( n ) = n^2. A loop within a loop within a loop yields f( n ) = n^3.
 Given a series of for loops that are sequential, the slowest of them determines the asymptotic behavior of the program. Two nested loops followed by a single loop is asymptotically the same as the nested loops alone, because the nested loops dominate the simple loop.
 Programs with a bigger Θ run slower than programs with a smaller Θ.
 It’s easier to figure out the O-complexity of an algorithm than its Θ-complexity.
 While all the symbols O, o, Ω, ω and Θ are useful at times, O is the one used more commonly, as it’s easier to determine than Θ and more practically useful than Ω.
 For competition algorithms implemented in C++, once you’ve analyzed your complexity, you can get a rough estimate of how fast your program will run by expecting it to perform about 1,000,000 operations per second, where the operations you count are given by the asymptotic behavior function describing your algorithm. For example, a Θ( n ) algorithm takes about a second to process the input for n = 1,000,000.
 Improving the asymptotic running time of a program often tremendously increases its performance, much more than any smaller “technical” optimizations such as using a faster programming language.
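The loop-counting rules above can be tried out directly by counting how often the innermost statement runs. This is a sketch for illustration; singleLoop, nestedLoops and nestedThenSingle are hypothetical names, and the counts stand in for the asymptotic behavior function rather than for real CPU instructions.

```javascript
// Sketch: count innermost-statement executions for the loop shapes in
// the rules of thumb. Function names are hypothetical.
function singleLoop(n) {
    var ops = 0;
    for ( var i = 0; i < n; ++i ) {
        ops++;
    }
    return ops;                                 // n
}

function nestedLoops(n) {
    var ops = 0;
    for ( var i = 0; i < n; ++i ) {
        for ( var j = 0; j < n; ++j ) {
            ops++;
        }
    }
    return ops;                                 // n^2
}

// Two nested loops followed by a single loop: n^2 + n, dominated by n^2.
function nestedThenSingle(n) {
    return nestedLoops(n) + singleLoop(n);
}

[ 10, 100, 1000 ].forEach(function (n) {
    console.log(n, singleLoop(n), nestedLoops(n), nestedThenSingle(n));
});
```

For n = 1000 the last line prints 1000 1000 1000000 1001000: the trailing single loop adds only 0.1% to the nested loops. Using the rough figure of about 1,000,000 operations per second from the list, the nested loops alone would be estimated at roughly one second for n = 1000.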