Go Memory Management Fundamentals
Introduction
After understanding how operating systems manage memory, we can now explore how Go leverages these underlying mechanisms to optimize memory usage. Go's memory management is largely inspired by tcmalloc, with minor adjustments tailored to its specific requirements.
Go handles memory automatically, allowing developers to define variables and use them without worrying about allocation or deallocation. This article aims to clarify what Go does behind the scenes to simplify memory handling while maintaining high performance.
This article focuses solely on Go’s memory management model. Related topics like escape analysis and garbage collection will be covered separately due to space constraints.
Caching: Trading Space for Time
Dynamic memory allocation requires system calls, such as mmap on Linux. For large-scale services, frequent use of mmap incurs overhead:
- System calls switch the program from user mode to kernel mode, which introduces latency.
- Allocating small chunks repeatedly can lead to fragmentation, increasing OS cleanup costs.
- Ensuring good locality of reference demands significant optimization effort from developers.
The solution is to implement a resource pool (also known as caching).
Consider a scenario where a service frequently allocates memory for arrays like [10]int. Instead of requesting memory each time, pre-allocate hundreds or thousands of such arrays during startup. This approach resolves the issues above:
- Eliminates repeated system calls, avoiding kernel transitions.
- Pre-allocated blocks reduce fragmentation through reuse.
- Frequent accesses occur within a limited memory region, enhancing locality.
While some memory may go unused, pools can be monitored and resized appropriately.
Similar strategies apply to connection pools, memory pools, and other resource-intensive operations.
Go's Memory Management Architecture
Go’s memory management operates as a memory pool, enhanced with optimizations like dynamic resizing and efficient block division.
Level-3 Cache: mheap
Upon startup, Go acquires a large chunk of memory from the OS, managed by a structure called mheap. This structure divides the memory into regions and allocates segments based on object sizes.
Key concepts:
- page: An 8KB unit used for all allocation and deallocation between Go and the OS.
- span: A contiguous run of one or more pages. Think of pages as workers and spans as teams.
- sizeclass: Defines how a span's pages are divided into equal-sized objects.
- object: A fixed-size memory block used to store data. A span is divided into multiple equal-sized objects.
Example: If an object is 16 bytes and a span is 8KB, it results in 512 objects per span.
The initial heap size is approximately 64MB in Go 1.11.5.
```go
package main

import "runtime"

var stat runtime.MemStats

func main() {
	runtime.ReadMemStats(&stat)
	println(stat.HeapSys)
}
```
The internal layout includes:
- mheap.spans: Tracks page-to-span mappings.
- mheap.bitmap: Marks objects for garbage collection.
- mheap.arena_start: Start of the virtual address range available to the application.
Note: These are virtual addresses reserved by the OS, not physical memory.
Level-2 Cache: mcentral
Spans of the same sizeclass are linked together. There are 67 sizeclasses in Go 1.5 (subject to change), each representing a specific object size.
When allocating memory, the system determines the appropriate sizeclass and retrieves a span from mcentral. If no suitable spans exist, new ones are carved from mheap.freelarge or requested from the OS.
```go
type mheap struct {
	lock      mutex
	free      [_MaxMHeapList]mspan
	freelarge mspan
	busy      [_MaxMHeapList]mspan
	busylarge mspan
	central   [_NumSizeClasses]struct {
		mcentral mcentral
	}
}

type mcentral struct {
	lock      mutex
	sizeclass int32
	nonempty  mspan
	empty     mspan
}
```
This avoids external fragmentation by using fixed-size allocations and deallocations.
Level-1 Cache: mcache
To avoid contention in concurrent scenarios, each processor (P) has its own mcache. Goroutines allocate from their P’s mcache first; if unavailable, it fetches from mcentral.
This eliminates locking during allocation since only one thread operates on a P at a time, improving performance.
Additional Optimizations
Zero-Sized Objects
Zero-sized objects such as struct{} or [0]int do not require actual memory. Go returns a predefined address instead.
```go
func mallocgc(size uintptr, typ *_type, flags uint32) unsafe.Pointer {
	if size == 0 {
		return unsafe.Pointer(&zerobase)
	}
	// ...
}
```
A quick test confirms that distinct zero-sized variables end up at the same address.
Tiny Objects
Objects smaller than 16 bytes are treated as tiny objects. They share a single 16-byte object from sizeclass=2, reusing space until full.
This improves utilization from 45.83% to 68.75% in some cases.
Large Objects
Objects exceeding 32KB bypass mcache and mcentral, directly using mheap.freelarge.
Summary
Memory release follows the reverse process: freed spans return to mcentral, then to mheap, and finally to the OS.
Advantages:
- Most allocations happen in user mode, reducing kernel switches.
- Each P has isolated caches, improving CPU cache hit rates.
- Internal fragmentation is minimized through user-level management.
- Lock-free allocation via mcache enhances concurrency.
Trade-offs:
Pre-allocation uses more memory, but modern RAM makes this negligible.
The design mirrors common layered architectures like client-server-db stacks, optimizing resource access based on data hotness.
Key performance bottlenecks:
- Frequent large allocations – consider custom pools for byte slices.
- Overuse of pointers – impacts both memory usage and GC efficiency.
Memory Fragmentation
Fragmentation occurs when memory becomes unusable due to allocation patterns:
- Internal Fragmentation: Waste from alignment padding (e.g., 28B request gets 32B).
- External Fragmentation: Small gaps left after freeing, preventing reuse.
Go's model minimizes external fragmentation through internal management.
runtime.MemStats Explained
```go
type MemStats struct {
	Alloc        uint64 // Currently allocated heap bytes
	TotalAlloc   uint64 // Cumulative bytes allocated (never decreases)
	Sys          uint64 // Total OS memory requested
	Mallocs      uint64 // Total allocations
	Frees        uint64 // Total deallocations
	HeapAlloc    uint64 // Bytes in heap
	HeapSys      uint64 // Heap bytes requested from OS
	HeapIdle     uint64 // Unused heap bytes
	HeapInuse    uint64 // Used heap bytes
	HeapReleased uint64 // Bytes returned to OS
	HeapObjects  uint64 // Live objects
	NextGC       uint64 // Target heap size for next GC
	LastGC       uint64 // Last GC timestamp
	PauseTotalNs uint64 // Total GC pause time
	PauseNs      [256]uint64 // Recent GC pauses (circular buffer)
	PauseEnd     [256]uint64 // GC pause end times (circular buffer)
	NumGC        uint32 // Number of GC cycles
	NumForcedGC  uint32 // Forced GC count
	BySize       [61]struct {
		Size    uint32 // Object size class
		Mallocs uint64 // Allocated objects in this class
		Frees   uint64 // Freed objects in this class
	}
}
```