
Software Optimisation

Time and resources cost money, and there’s no way around it: people don’t want to use slow services, nobody wants to wait any longer than they feel they have to, and nobody wants to pay more than they have to.

With that in mind, software developers and architects are looking to handle as much work as possible, with as little cost in time (and resources) as possible.

Note: I use big O notation to show the time complexity of algorithms. If that’s new to you, I strongly recommend brushing up on the topic, perhaps by reading my super, awesome big O notation post.

There’s a maxim on how to optimise algorithms and software services in general:

  1. Do Less
  2. Do it less often
  3. Buy a faster computer

The largest gains come from 1, but we spend all our time on 3. — Michael Fromberger

Sounds easy, doesn’t it :)

Do Less

An example of the first principle is searching data. If the data is stored in an unsorted contiguous array, or a linked list, then to find what is being looked for the program has to iterate from one end until it finds the value (if it exists at all). This is an O(n) algorithm: in the worst case the search reads every item in the list, so the longer the list, the longer the search takes.
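As a rough sketch (in Python, since the post isn’t tied to any particular language), a linear search looks something like this:

    # The O(n) case: with no ordering to exploit, the only option is to walk the
    # list from one end until the value turns up (or doesn't).
    from typing import Optional, Sequence

    def linear_search(values: Sequence[int], target: int) -> Optional[int]:
        for index, value in enumerate(values):
            if value == target:
                return index      # found it, return its position
        return None               # read every item and it isn't there

    print(linear_search([9, 3, 41, 7, 25], 7))   # 3
    print(linear_search([9, 3, 41, 7, 25], 8))   # None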

If the data is stored in a sorted list or array, that search becomes O(log2 n) by using binary search: check the midpoint of the array; if the value being searched for is smaller, keep only the bottom half, and if it’s bigger, keep only the top half; repeat until the value is found or it’s impossible for it to be there. The number of steps is the logarithm of the number of values held in the list (log2, because the size of the search space halves at each step).
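A sketch of the sorted version, again in Python; the halving at each step is what makes it O(log2 n):

    # Binary search: the list must already be sorted, and every comparison
    # discards half of what is left to search.
    from typing import Optional, Sequence

    def binary_search(values: Sequence[int], target: int) -> Optional[int]:
        low, high = 0, len(values) - 1
        while low <= high:
            mid = (low + high) // 2
            if values[mid] == target:
                return mid            # found it
            if values[mid] < target:
                low = mid + 1         # it can only be in the top half
            else:
                high = mid - 1        # it can only be in the bottom half
        return None                   # the range has collapsed, it isn't there

    print(binary_search([3, 7, 9, 25, 41], 25))  # 3
    print(binary_search([3, 7, 9, 25, 41], 8))   # None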

That means that, compared to the first search, there is a lot less work being done. However, the data has to be sorted first.

There’s an even faster way to do the search, and that is knowing which slot the value is stored in. That’s what a hash map (also known as an associative array, or a dictionary) gives you. A lookup in a hash map is O(1) on average: no matter how many items have been stored, and no matter whether they are sorted or not, a lookup takes (roughly) the same amount of time.
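A minimal sketch of what that buys in practice (the record fields here are made up for illustration): build a dictionary keyed by whatever is being searched on, and every lookup after that skips the scan entirely.

    # Instead of scanning a list of records for a matching id (O(n)), build a
    # dict keyed by id once; every lookup after that is O(1) on average.
    users = [
        {"id": 7, "name": "Ada"},
        {"id": 42, "name": "Grace"},
        {"id": 1701, "name": "Linus"},
    ]

    users_by_id = {user["id"]: user for user in users}

    print(users_by_id[42]["name"])   # Grace, found without walking the list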

There are a couple of catches, though. The first is that a hashing algorithm needs to calculate where the data should go, and the ‘key’ used to store and retrieve the data must be hashable, that is, it must always produce the same hash value for the same input; that rules out things that can change, like other hash maps or arrays.
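In Python, for example, immutable values such as numbers, strings, and tuples of immutables make valid keys, while mutable containers are rejected outright:

    colours = {}
    colours[("red", 255)] = "#ff0000"      # a tuple works as a key

    try:
        colours[["red", 255]] = "#ff0000"  # a list does not: its hash could change
    except TypeError as err:
        print(err)                         # unhashable type: 'list'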

The second is that the hashing algorithm, even though it runs consistently for each operation, has a time and memory cost of its own. That cost depends on implementation decisions, for example how collisions (where two different keys produce the same hash) are handled, but it is measurable and can be significant (a jump table created by a switch statement of 2000 entries can be faster than a hash lookup by orders of magnitude).

So, hashing is only worthwhile once the number of items being stored exceeds some threshold, but, once that threshold has been crossed, the hash map becomes the tool that does the least work per search.
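Where that threshold sits depends on the language, the runtime, and the data, so it is worth measuring rather than guessing. A rough sketch of how one might do that (the sizes and keys here are arbitrary, and the numbers will vary from machine to machine):

    import timeit

    keys = [f"key-{i}" for i in range(8)]
    table = {key: i for i, key in enumerate(keys)}
    wanted = keys[-1]   # worst case for the scan: the last element

    scan_time = timeit.timeit(lambda: keys.index(wanted), number=1_000_000)
    hash_time = timeit.timeit(lambda: table[wanted], number=1_000_000)

    # Repeat with sizes closer to the real workload before deciding.
    print(f"linear scan: {scan_time:.3f}s   dict lookup: {hash_time:.3f}s")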

Do it less often

The index of a SQL database is usually described as a binary search tree (BST). That gives CRUD operations a time complexity of O(log2 n): a search for a row with a given key takes at most O(log2 n) search operations.

There’s something important to know, though: each of those operations, the search for the next key, requires a disk operation (the CPU asks the disk to find the sector that contains the next key).

There’s a lot of waste in that: a key only uses a tiny amount of on-disk space (a few bytes), but a disk retrieval operation brings back a complete sector each time (maybe 4 KB).

So most databases take advantage of this to reduce the number of disk reads that have to be performed: they store 50 or 100 keys in each node, instead of the 1 key per node of the BST. Instead of retrieving one key each time, they retrieve 50, or 100, keys.

This changes the number of times the disk needs to be read: the tree is now log50 n or log100 n levels deep, so there is a worst case of log50 n or log100 n disk reads, instead of log2 n.
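A back-of-the-envelope sketch that reproduces the numbers worked through below: the worst case is simply the smallest depth at which keys-per-node, multiplied by itself that many times, covers every key (in other words, log n to the base of keys-per-node, rounded up).

    def max_node_reads(total_keys: int, keys_per_node: int) -> int:
        # Smallest depth d such that keys_per_node ** d >= total_keys.
        reads, reachable = 0, 1
        while reachable < total_keys:
            reachable *= keys_per_node
            reads += 1
        return reads

    for per_node in (2, 50, 100):
        print(per_node, [max_node_reads(n, per_node) for n in (100, 1_000, 10_000)])
    # 2 [7, 10, 14]
    # 50 [2, 2, 3]
    # 100 [1, 2, 2]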

To put slightly more concrete numbers on the maths, take a database with 100 keys:

  • The base 2 logarithm of 100 is roughly 6.64.
  • The base 50 logarithm of 100 is roughly 1.18.
  • The base 100 logarithm of 100 is exactly 1.

So a database with 100 keys, using 1 key per node, will need to make at most 7 disk reads; with 50 keys per node, at most 2; and with 100 keys per node, at most 1. There’s no way to do a partial disk read, so the logarithm is rounded up to the nearest whole number.

For 1000 keys:

  • The base 2 logarithm of 1000 is roughly 9.97.
  • The base 50 logarithm of 1000 is roughly 1.77.
  • The base 100 logarithm of 1000 is exactly 1.5.

So a database with 1000 keys, using 1 key per node, will need to make at most 10 disk reads; with 50 keys per node, at most 2; and with 100 keys per node, at most 2.

And, finally, for 10000 keys:

  • The base 2 logarithm of 10000 is roughly 13.29.
  • The base 50 logarithm of 10000 is roughly 2.35.
  • The base 100 logarithm of 10000 is exactly 2.

So a database with 10000 keys, using 1 key per node, will need to make at most 14 disk reads; with 50 keys per node, at most 3; and with 100 keys per node, at most 2.

Note: What should also be clear from the examples is that there’s not a massive difference between log50 and log100, which is why, for smaller tables, 50 keys per node is fine; in fact, 50 allows the keys themselves to be bigger (the amount of data available per read doesn’t change, so fewer keys per node leaves more space for each key).

Note 2: Although the number of keys per node changes the number of disk reads required, the search is still O(log2 n): the database pulls a node from the disk, performs a binary search within that node, and then, if required, pulls the next node down, until it has either found the key it is looking for or realised that it’s impossible for the key to exist.
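A minimal sketch of that, assuming each node is nothing more than a sorted list of keys plus child pointers: the node is fetched in one read, searched in memory with a binary search (bisect), and only then is the next node fetched.

    import bisect
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Node:
        keys: List[int]
        children: List["Node"] = field(default_factory=list)   # empty list => leaf

    def contains(node: Optional[Node], target: int) -> bool:
        while node is not None:                        # each loop turn ~ one disk read
            i = bisect.bisect_left(node.keys, target)  # binary search inside the node
            if i < len(node.keys) and node.keys[i] == target:
                return True
            node = node.children[i] if node.children else None  # descend to the next node
        return False

    root = Node(keys=[50, 100],
                children=[Node([10, 20, 30]), Node([60, 70, 80]), Node([110, 120])])
    print(contains(root, 70))   # True: two node reads, a binary search in each
    print(contains(root, 55))   # False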

Buy a faster computer

The only other option left for optimisation is to increase the amount of physical CPU and memory available to your algorithm. This is a short note on the subject, as I intend to (one day) write a full post or two on it.

There are two ways that hardware can be used to improve the speed of an algorithm. Vertically, or Horizontally.

Vertical means to take a single machine and increase the CPU speed or the RAM installed. There are physical limits on how much of each can be added at any given point in time (the technology of the day will only allow a CPU with a given clock speed, or a certain amount of addressable RAM, to be installed on a motherboard).

Horizontal means to add multiple machines and spread the work across them all.

Both have strengths and weaknesses, and both require some knowledge on how to address specific shortcomings.
