Computer Science · Data Races

Benign Data Races in Go

Every so often, in one forum, or another, the subject of benign data races comes up. People want to know if a data race absolutely must have synchronisation guards applied (mutexes, semaphores, etc).

For those not familiar with the concept, a data race is when two or more goroutines/threads/processes/etc have access to a mutable resource, and at least one of them mutates that resource. A benign data race is when such a data race is claimed to have no ill effect on the application.

Benign Data races have been the subject of discussion in a number of technologies for some time, and are contentious.

Java documentation talks about benign data races not only being possible, but also advantageous, because of improved performance (no need to lock/unlock synchronisation tools). Note, from the documentation

In addition to benign data-races, a large class of applications allow data-races because they rely on lock-free and wait-free algorithms which are difficult to design correctly. 

(My blog post isn’t going to delve into lock free programming, beyond Benign data races)

Hans-J. Boehm wrote a paper entitled How to miscompile programs with “benign” data races which concluded that

C or C++ data races are always unsafe in portable code

and

Although “benign” races in Java are not clearly incorrect, they nonetheless put the programmer into an uncertain and dangerous territory that can be avoided by avoiding data races.

Boehm’s findings largely rest on the fact that C and C++ compilers may be compiling for targets other than x86, which do not, necessarily, have atomic writes.

Further Boehm’s paper points to a number of usecases where data races can occur because of adjacency, that is, the compiler is treating two variables as one because they will lie so close together in memory.

For these reasons, it’s important to be aware of more than just the “correctness” of the code that the given language is written in, but, also the code that is emitted for a given CPU architecture, and that CPU’s atomicity when performing writes (it’s writes that cause the problem, not reads).

But we have a “benign data race” example, in computing, that is used all day, every day - we even have a name for it “Eventual consistency” - the state of the data may or may not be reflected accurately by the reader, but, we account for that by saying that, given some time, the reader will accurately reflect the actual state of the data.

Eventual consistency exists because of the CAP theorem. Once data is partitioned, that is, more than one copy exists, or multiple compute instances (threads, etc) access that data, then the end result will be consistent, or available, but never both.

For this reason, we have to accept that benign data races are both possible, and, genuinely safe for some applications. The principle is the same, the data is being shared by multiple threads/processes, and a mutation of that data can occur at any time. The readers will eventually be showing that new state, and as long as that is recognised, then that is an acceptable outcome.

For non-technical readers, eventual consistency has existed in the real world forever. If I say that I am going to pay a bill, and it takes some time for that payment to show up, even though the bill has been paid, it will show, for some time, as unpaid (the cheque is in the mail! :-)

Published:
comments powered by Disqus