Goroutines are touted as lightweight and cheap in terms of memory usage. Let’s take a look at that claim and see if it withstands scrutiny.
Before we look at Goroutine memory usage itself, we need to understand how computer applications use memory in the first place.
The format that this post will focus on is ELF, rather than the PE or Mach-O formats.
This first section isn’t Go specific.
When an application is started, it is first given an address space, that is, a section of memory that it can use. The application is loaded into that address space, with parts of the address space reserved for different things.
A block at the start of the address space, called the text segment, holds the executable code. This is the running application.
The next blocks hold the program’s static data: the .rodata section, which is populated with constants (when you declare a constant in your code, this is where it will live), along with .data and .bss for initialized and uninitialized global variables respectively.
The remaining space is used by two different consumers: the heap, whose pointer starts just after those data sections and grows upward, and the stack, whose pointer starts at the end of the address space the application was allocated and works its way back down, toward the heap pointer that is working its way up toward the stack.
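To make the layout concrete, here is a small sketch in Go (my own illustration; Go’s runtime manages its own heap and Goroutine stacks, so the addresses show distinct regions rather than the exact classical layout just described):

```go
package main

import (
	"fmt"
	"unsafe"
)

// A package-level variable lives in the program's data segment,
// not on any function's stack.
var global int64 = 42

func main() {
	// new returns a pointer into the managed heap (the compiler keeps
	// it there because the pointer is handed to Printf below).
	onHeap := new(int64)

	// A plain local variable; converting its address to uintptr before
	// printing typically keeps it from escaping off main's stack.
	onStack := 0
	stackAddr := uintptr(unsafe.Pointer(&onStack))

	fmt.Printf("text  (code of main): %p\n", main)
	fmt.Printf("data  (global var):   %p\n", &global)
	fmt.Printf("heap  (new):          %p\n", onHeap)
	fmt.Printf("stack (local var):    %#x\n", stackAddr)
}
```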
The heap is shared memory; data stored on the heap can be reached from more than one function. The stack is the function stack for the main function.
As the application proceeds in the execution of its tasks it uses memory: every function call requires a stack frame to be created (which is why unbounded recursion causes stack overflows), variables that escape from a function’s stack are stored on the heap, and so on.
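Go decides between the stack and the heap using escape analysis, and the compiler will report its decisions if asked. Here is a minimal sketch (my own illustration, not from the original post):

```go
package main

// Build with:  go build -gcflags=-m
// The compiler reports which variables are "moved to heap".

//go:noinline
func stackOnly() int {
	x := 1 // never outlives the call, so it can live on the stack
	return x
}

//go:noinline
func escapes() *int {
	x := 1
	return &x // x must outlive the call, so it is moved to the heap
}

func main() {
	_ = stackOnly()
	_ = escapes()
}
```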
There is one more interesting memory consumer to be aware of. Threads. When a thread is created, it uses memory within the application’s assigned address space. But it’s very much like the main function stack, with some special caveats.
When the thread is assigned memory, it is given enough to last for its whole lifetime (which is a lot), and that memory is ‘protected’ from other threads (including the main thread) with guard pages: access-protected pages of memory, one at each ‘end’ of the thread’s assigned block, that make the program fault rather than silently trample a neighbouring stack.
A thread is therefore very expensive in terms of memory usage. Assigning enough memory to last the thread for its presumed lifetime means memory goes to waste, and the guard pages are yet more memory that is unavailable to the rest of the process. In practice this means that POSIX threads are allocated between 1 and 8 MB of memory each, depending on the platform’s default stack size!
Goroutines don’t suffer this expense.
A Goroutine starts with a 2,048 byte (2 KB) stack. That stack can shrink, and it can grow effectively without bound; in practice the runtime enforces an architecture-dependent limit written into the code (by default about 1 GB on 64-bit platforms). There is guard memory here too, but some of it is available to the stack in special circumstances. And there is no need to pre-allocate a lifetime’s worth of memory up front, because the stack can grow on demand (older Go runtimes split the stack into segments; current ones copy it to a larger allocation), which lets the stacks of different Goroutines interleave within the address space, much like heap allocations do.
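As a rough, hedged sketch of what this means in practice (my own illustration; the numbers will vary by Go version and platform), the following program parks a large number of idle Goroutines and divides the extra memory the runtime obtained from the OS by their count. On a 64-bit machine the per-Goroutine figure typically comes out in the low kilobytes, not megabytes:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	const n = 100_000

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var wg sync.WaitGroup
	stop := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			wg.Done()
			<-stop // park the Goroutine so its stack stays allocated
		}()
	}
	wg.Wait()

	runtime.ReadMemStats(&after)
	perG := float64(after.Sys-before.Sys) / n
	fmt.Printf("approximate memory per Goroutine: %.0f bytes\n", perG)

	close(stop) // let the Goroutines exit
}
```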
Rather than pre-allocating memory for every function the Goroutine might call, function stack space is claimed lazily. That is, the memory is only allocated when a function is actually called, and only if that function needs more stack space than the Goroutine already has free.
The same is true on the ‘main’ thread: there is no need to allocate memory for a function’s stack until it is actually needed. Goroutines, however, have one more fancy trick up their sleeves. They reclaim the memory a function’s stack was using when that function exits, which means that later function calls aren’t slowed down by fresh requests for memory for the Goroutine’s stack.
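One hedged way to watch a Goroutine’s stack grow on demand (my own illustration; it relies on the runtime relocating the stack when it grows, so the exact output is implementation-specific) is to print the address of a local variable before and after deep recursion has forced the stack past its initial 2 KB:

```go
package main

import (
	"fmt"
	"unsafe"
)

// grow recurses with a sizeable local array, forcing the Goroutine's
// stack past its initial allocation. The marker pointer is adjusted by
// the runtime whenever the stack is relocated.
func grow(depth int, marker *byte) byte {
	var pad [1024]byte
	pad[0] = byte(depth)
	if depth == 0 {
		fmt.Printf("marker now appears at %#x\n", uintptr(unsafe.Pointer(marker)))
		return pad[0]
	}
	return pad[0] + grow(depth-1, marker)
}

func main() {
	var marker byte
	fmt.Printf("marker starts at      %#x\n", uintptr(unsafe.Pointer(&marker)))
	_ = grow(64, &marker) // 64 frames of ~1 KB easily outgrows the initial stack
}
```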
Further reading: T Paschalis’ Goroutine’s size.
Dave Cheney’s Why is a Goroutine’s stack infinite.