Unit testing code is validity testing. The developer is ensuring that the function being tested handles the supplied input in an expected way, either producing output or returning error information that is considered appropriate.
Benchmark testing code is performance testing. The developer is discovering how long their function takes to run on a given architecture.
Benchmarking in Go is incredibly simple (as it should be!), but the devil lies in the details, as always.
For a simple function
package foo

func Adder(a, b int) int {
    return a + b
}
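For contrast, a unit test for this function asserts on its output. A minimal sketch:

package foo

import "testing"

func TestAdder(t *testing.T) {
    // Validity testing: check that a known input produces
    // the expected output.
    if got := Adder(2, 3); got != 5 {
        t.Errorf("Adder(2, 3) = %d, want 5", got)
    }
}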
A simple benchmark test might be
package foo

import "testing"

func BenchmarkAdder(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Adder(i, i)
    }
}
Which will produce information like
$ go test -bench .
goos: linux
goarch: amd64
pkg: github.com/shanehowearth/foo
cpu: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
BenchmarkAdder-4 1000000000 0.2826 ns/op
PASS
ok github.com/shanehowearth/foo 0.319s
Some things to note:
- the benchmark test doesn’t care about the output of the function; these are not validity tests.
- the output of the test provides OS, architecture, and CPU information, which is vital for allowing like for like comparisons to take place.
- the number of iterations that were run (1000000000)
- the length of time per iteration (0.2826 ns/op)
- the length of time it took to run all the tests (0.319s)
This shows the bare minimum though; there are multiple facets of the benchmark tests to investigate.
Inputs
The first thing that stands out, for me at least, is the input provided to the function being tested. In the above case i has been provided to the Adder function twice. The question that has to be asked is, “Is the input provided representative of normal use?”, that is, in normal everyday usage of the function, what would it be fair to say the input will be for a and b.
A paper “Selecting Representative Benchmark Inputs for Exploring Microprocessor Design Spaces” [pdf] provides an excellent argument on using randomised input to aid the discovery of optimal design of a given function within a given problem set. (Put simply, use randomised input to give a more representative sample of the likely data your function will face in the wild).
An amended benchmark test that produces randomised input might look like
package foo

import (
    "math/rand"
    "testing"
    "time"
)

func BenchmarkAdder(b *testing.B) {
    rand.Seed(time.Now().UTC().UnixNano())
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        // Create random input values
        alpha := rand.Int()
        beta := rand.Int()
        b.StartTimer()
        Adder(alpha, beta)
    }
}
In this piece of code the input passed to Adder() is randomised, with the timer stopped and started on each loop iteration so that the cost of generating the randomised values doesn’t pollute the measurement.
When running this test the time taken is noticeably longer; stopping and starting the timer and generating the randomised values aren’t free.
On my local machine, the first example took less than half a second to execute, while the second took a few minutes (525.365s, or 8 minutes and 45 seconds), whilst running only one thirtieth of the iterations.
$ go test -bench .
goos: linux
goarch: amd64
pkg: github.com/shanehowearth/foo
cpu: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
BenchmarkAdder-4 33319171 35.96 ns/op
PASS
ok github.com/shanehowearth/foo 525.365s
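If that overhead matters, an alternative is to generate a pool of random inputs before the timed loop and discard the setup cost with ResetTimer(). A sketch (the name BenchmarkAdderPooled and the pool size are arbitrary choices):

package foo

import (
    "math/rand"
    "testing"
    "time"
)

func BenchmarkAdderPooled(b *testing.B) {
    rand.Seed(time.Now().UTC().UnixNano())
    // Build a pool of random inputs before the timed loop.
    const poolSize = 1 << 16
    alphas := make([]int, poolSize)
    betas := make([]int, poolSize)
    for i := range alphas {
        alphas[i] = rand.Int()
        betas[i] = rand.Int()
    }
    // Discard the setup cost from the measurements.
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        Adder(alphas[i%poolSize], betas[i%poolSize])
    }
}

The modulo indexing costs a little per iteration, but far less than a StopTimer()/StartTimer() pair, which reads the runtime’s memory statistics on every call.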
Test iterations
The go test tool has a number of arguments that it will accept that help create “like with like” tests. The first is -benchtime=Nx, where the x suffix signifies that the value is the number of iterations the test must run,
eg.
$ go test -bench . -benchtime 1000000000x
Where N is the number of iterations that the test must run. Note, the -benchtime flag was originally, and still is, used to tell go test how long to run tests for; that is, a duration can be supplied instead.
eg.
$ go test -bench . -benchtime 10m
However, setting a large number of iterations like this may trigger a timeout: by default, if the tests take longer than 10 minutes, the go test tool will stop running them and panic.
Timeout
In order to run tests that take longer, a command line argument can be supplied that sets the timeout, for example to 20 minutes
$ go test -bench . -timeout 20m -benchtime 1000000000x
Or, the test can be set to never time out
$ go test -bench . -timeout 0 -benchtime 1000000000x
Memory benchmarks
Thus far the benchmark test output has been focused on the time that a given function takes to execute, but often there is another dimension of cost that needs to be checked: how much memory is being used. The -benchmem flag prints the “memory allocation statistics”.
Using the first benchmark example with the -benchmem switch gives
$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/shanehowearth/foo
cpu: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
BenchmarkAdder-4 1000000000 0.2856 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/shanehowearth/foo 0.319s
Obviously the example function allocates no memory, and therefore there are no allocations per op.
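For contrast, a function that does allocate will report non-zero figures. A minimal sketch (Repeat is a made-up example, not part of the package above):

package foo

import "testing"

// Repeat is a hypothetical function that allocates:
// it returns a freshly made slice on every call.
func Repeat(v, n int) []int {
    s := make([]int, n)
    for i := range s {
        s[i] = v
    }
    return s
}

func BenchmarkRepeat(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Repeat(i, 10)
    }
}

Run with -benchmem, this reports something like 80 B/op and 1 allocs/op: one ten-element slice of 8-byte ints per iteration.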
For completeness, the B/op metric means “Number of Bytes used per test iteration” and allocs/op is the number of “Allocations of memory”, both on the heap. These measurements follow the timer: StopTimer() and StartTimer() pause the allocation accounting along with the clock, and ResetTimer() zeroes both the timer and the memory metrics. So building a tree for a search function test with the timer running will add to the memory ‘cost’ of the function, even though the search itself may allocate nothing; such setup belongs before a ResetTimer() call. In the search example, the only information held by the function itself might be a search key and the pointer to the leaf currently being inspected, although retrieving the information held at that pointer would be considered legitimate memory cost for the function.
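A sketch of that idea, using a sorted slice and the standard library’s sort.SearchInts as a stand-in for a search tree (the names and sizes here are arbitrary):

package foo

import (
    "sort"
    "testing"
)

func BenchmarkSearch(b *testing.B) {
    // Setup: build the structure being searched. Everything
    // before ResetTimer is excluded from both the time and
    // the memory metrics.
    data := make([]int, 1<<20)
    for i := range data {
        data[i] = i * 2
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        sort.SearchInts(data, i)
    }
}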
Notes
The benchmark metrics are held in a struct named testing.B, with the memory statistics read from runtime.MemStats.
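Relatedly, a benchmark can opt in to the memory statistics itself by calling ReportAllocs(), which is equivalent to passing -benchmem for just that benchmark. Reusing the Adder benchmark from earlier:

func BenchmarkAdder(b *testing.B) {
    b.ReportAllocs() // report B/op and allocs/op even without -benchmem
    for i := 0; i < b.N; i++ {
        Adder(i, i)
    }
}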
In order to run a single benchmark test use the following magic
$ go test -bench Deletion -benchtime 100000x -run notgoingtomatch -benchmem
Where Deletion is the name of the Benchmark to run. Using -run notgoingtomatch excludes the unit tests from being run, because none will match that pattern. (Astute readers might see “notgoingtomatch” as far too much to type; other patterns can be used, eg. ‘ ’ or ‘\b\B’ or ^$, because those patterns are impossible to match, as no test can be named ‘ ’, nor match the regular expression ‘\b\B’, but the long form is clearer to the reader.)
For reference see the authoritative documentation.