
When should caches be used

Caches are a tool in the software engineer’s toolbox that often causes confusion: just when should someone reach for a cache as a solution to their problem?

Note: This is about when to use a cache. No solutions are proposed here to the much older issue of how to solve the cache invalidation problem.

We use caches to allow a producer and a consumer to operate at different speeds. That is, a producer might provide a given piece of information n times per second, and the consumer might use that piece of information m times per second. The cache decouples the two so that the slower one does not hinder the faster.

There are some limitations to this, though. If the producer is creating unique information each time a consumer needs it, and the producer is slower than the consumer, then the cache offers no help at all.

Also, if the producer is faster than the consumer, then once the cache fills the producer has to slow to the speed at which the consumer reads (and removes) information from the cache, since only those reads make room for new information.
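A minimal sketch of that limit, assuming the cache is modelled as a bounded queue from Python's standard library. The names `buffer` and `try_produce` are made up for illustration, and the capacity of 3 is arbitrary.

```python
import queue

# Hypothetical bounded buffer sitting between a fast producer and a slow consumer.
buffer = queue.Queue(maxsize=3)

def try_produce(item):
    """Try to add an item without blocking; return True if it fitted."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False

# The producer runs ahead until the buffer is full, then gets refused.
accepted = [try_produce(i) for i in range(5)]

# The consumer removes one item, which frees exactly one slot.
consumed = buffer.get_nowait()
accepted_after_read = try_produce(99)
```

Once the buffer is full, the producer can only make progress as fast as the consumer frees slots, which is the slowdown described above.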

Whether a cache fits can therefore be decided by working through the following questions.

First and foremost, is there a difference between the speed at which the producer creates information and the speed at which the consumer reads it?

If the producer is slower than the consumer, the next question is whether the data can be cached, that is, whether there is a way to have the information in the cache before the consumer needs it.

This can work in several ways. If a producer retrieves a piece of information and there is a high probability that information near it will also be used, then grabbing the nearby data and holding it in a buffer (cache), ready for the consumer, makes sense.
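As a rough sketch of fetching nearby data, the class below pulls a whole block from a slow source whenever one item in that block is requested. `ReadAheadCache`, `BLOCK_SIZE` and the list standing in for the slow producer are all assumptions made for the example.

```python
BLOCK_SIZE = 4  # hypothetical number of neighbouring items fetched per request

class ReadAheadCache:
    def __init__(self, source):
        self.source = source  # the slow producer, modelled here as a list
        self.cache = {}       # index -> value
        self.fetches = 0      # how many times the slow producer was visited

    def _fetch_block(self, index):
        # Fetch the whole aligned block that contains the requested index.
        start = (index // BLOCK_SIZE) * BLOCK_SIZE
        self.fetches += 1
        for i, value in enumerate(self.source[start:start + BLOCK_SIZE], start):
            self.cache[i] = value

    def get(self, index):
        if index not in self.cache:
            self._fetch_block(index)
        return self.cache[index]

data = [n * n for n in range(12)]
reader = ReadAheadCache(data)
values = [reader.get(i) for i in range(8)]  # sequential reads of 8 items
```

Reading eight consecutive items here visits the slow source only twice (`reader.fetches == 2`), because each visit also brings in the three neighbours that are requested next.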

A related idea is the speculative execution that CPUs perform on the back of “branch prediction”: the CPU guesses which way a branch will go and computes ahead of what has been requested, so that results are ready before the consuming process asks for them.

If the information that the consumer wants is going to be read time and time again, and the information will not change, then the producer only needs to create it once and populate the cache, and the consumer(s) can read it many, many times.
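The create-once, read-many case can be sketched with `functools.lru_cache` from Python's standard library. `slow_lookup` and the `calls` counter are hypothetical stand-ins for a slow producer. The second half shows the earlier limitation: when every request is unique, the cache never gets a hit.

```python
from functools import lru_cache

calls = 0  # counts how often the slow producer actually runs

@lru_cache(maxsize=None)
def slow_lookup(key):
    """Stand-in for an expensive producer; the result never changes for a key."""
    global calls
    calls += 1
    return key.upper()

# The same unchanging information is read many times.
results = [slow_lookup("config") for _ in range(1000)]
repeat_info = slow_lookup.cache_info()   # 999 hits, 1 miss
calls_for_repeats = calls                # the producer ran only once

# Unique information on every request: the cache cannot help.
slow_lookup.cache_clear()                # also resets the hit/miss counters
unique = [slow_lookup(f"request-{i}") for i in range(100)]
unique_info = slow_lookup.cache_info()   # 0 hits, 100 misses
```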

When the consumer is slower than the producer, a cache may still be helpful, but only if the producer is “bursty”. That is, the producer might create information faster than the consumer can process it, but only for a short period of time, after which it goes dormant or otherwise becomes slower than the consumer.
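A small simulation may make the difference between a bursty and a sustained producer concrete. The function `peak_backlog`, the tick-based model and the specific rates are all illustrative assumptions, not measurements.

```python
def peak_backlog(produced_per_tick, consumer_rate):
    """Return (largest backlog seen, backlog left at the end)."""
    backlog = 0
    peak = 0
    for produced in produced_per_tick:
        backlog += produced
        backlog -= min(backlog, consumer_rate)  # the consumer drains what it can
        peak = max(peak, backlog)
    return peak, backlog

consumer_rate = 3  # items the consumer handles per tick

# Bursty: two busy ticks, then dormant. Average rate is below the consumer's.
bursty = [10, 10, 0, 0, 0, 0, 0, 0, 0, 0]
# Sustained: always faster than the consumer.
steady = [5] * 10

bursty_peak, bursty_left = peak_backlog(bursty, consumer_rate)
steady_peak, steady_left = peak_backlog(steady, consumer_rate)
```

In this model the bursty producer needs room for 14 items at its peak and the backlog then drains to zero, so a cache of that size absorbs the burst. The sustained producer's backlog grows by two items every tick and never drains, so any finite cache eventually fills.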

It should now be clear that caches are useful tools that help with processing when used correctly, and how to determine when one is the right choice.
