Go Memory Management Fundamentals
Introduction
After understanding how operating systems manage memory, we can now explore how Go leverages these underlying mechanisms to optimize memory usage. Go's memory management is largely inspired by tcmalloc, with minor adjustments tailored to its specific requirements.
Go handles memory automatically, allowing developers to define variables and use them without worrying about allocation or deallocation. This article aims to clarify what Go does behind the scenes to simplify memory handling while maintaining high performance.
This article focuses solely on Go’s memory management model. Related topics like escape analysis and garbage collection will be covered separately due to space constraints.
Caching: Trading Space for Time
Dynamic memory allocation requires system calls, such as mmap on Linux. For large-scale services, frequent use of mmap incurs overhead:
- System calls switch the program from user mode to kernel mode, which introduces latency.
- Allocating small chunks repeatedly can lead to fragmentation, increasing OS cleanup costs.
- Ensuring good locality of reference demands significant optimization effort from developers.
The solution is to implement a resource pool (also known as caching).
Consider a scenario where a service frequently allocates memory for arrays like [10]int. Instead of requesting memory each time, pre-allocate hundreds or thousands of such arrays during startup. This approach resolves the issues above:
- Eliminates repeated system calls, avoiding kernel transitions.
- Pre-allocated blocks reduce fragmentation through reuse.
- Frequent accesses occur within a limited memory region, enhancing locality.
While some memory may go unused, pools can be monitored and resized appropriately.
Similar strategies apply to connection pools, memory pools, and other resource-intensive operations.
Go's Memory Management Architecture
Go’s memory management operates as a memory pool, enhanced with optimizations like dynamic resizing and efficient block division.
Level-3 Cache: mheap
Upon startup, Go acquires a large chunk of memory from the OS, managed by a structure called mheap. This structure divides the memory into regions and allocates segments based on object sizes.
Key concepts:
- page: An 8KB unit used for all allocation and deallocation between Go and the OS.
- span: A contiguous run of one or more pages. Think of pages as workers and spans as teams.
- sizeclass: Defines how a span's pages are divided into equal-sized objects.
- object: A fixed-size memory block used to store data. A span is divided into multiple equal-sized objects.
Example: If an object is 16 bytes and a span is 8KB, it results in 512 objects per span.
The initial heap size is approximately 64MB in Go 1.11.5.
```go
package main

import "runtime"

var stat runtime.MemStats

func main() {
	runtime.ReadMemStats(&stat)
	println(stat.HeapSys)
}
```
The internal layout includes:
- mheap.spans: Tracks page-to-span mappings.
- mheap.bitmap: Marks objects for garbage collection.
- mheap.arena_start: Start of the virtual address range available to the application.
Note: These are virtual addresses reserved by the OS, not physical memory.
Level-2 Cache: mcentral
Spans of the same sizeclass are linked together. There are 67 sizeclasses in Go 1.5 (subject to change), each representing a specific object size.
When allocating memory, the system determines the appropriate sizeclass and retrieves a span from mcentral. If no suitable spans exist, new ones are carved from mheap.freelarge or requested from the OS.
```go
type mheap struct {
	lock      mutex
	free      [_MaxMHeapList]mspan
	freelarge mspan
	busy      [_MaxMHeapList]mspan
	busylarge mspan
	central   [_NumSizeClasses]struct {
		mcentral mcentral
	}
}

type mcentral struct {
	lock      mutex
	sizeclass int32
	nonempty  mspan
	empty     mspan
}
```
This avoids external fragmentation by using fixed-size allocations and deallocations.
Level-1 Cache: mcache
To avoid contention in concurrent scenarios, each processor (P) has its own mcache. Goroutines allocate from their P’s mcache first; if unavailable, it fetches from mcentral.
This eliminates locking during allocation since only one thread operates on a P at a time, improving performance.
Additional Optimizations
Zero-Sized Objects
Zero-sized objects such as struct{} or [0]int do not require actual memory. Go returns a predefined address instead.
```go
func mallocgc(size uintptr, typ *_type, flags uint32) unsafe.Pointer {
	if size == 0 {
		return unsafe.Pointer(&zerobase)
	}
	// ...
}
```
A quick test confirms that distinct zero-sized variables end up at the same address.
Tiny Objects
Objects smaller than 16 bytes are treated as tiny objects. They share a single 16-byte object from sizeclass=2, reusing space until full.
This improves utilization from 45.83% to 68.75% in some cases.
Large Objects
Objects exceeding 32KB bypass mcache and mcentral, directly using mheap.freelarge.
Summary
Memory release follows the reverse process: freed spans return to mcentral, then to mheap, and finally to the OS.
Advantages:
- Most allocations happen in user mode, reducing kernel switches.
- Each P has isolated caches, improving CPU cache hit rates.
- Internal fragmentation is minimized through user-level management.
- Lock-free allocation via mcache enhances concurrency.
Trade-offs:
Pre-allocation uses more memory, but modern RAM makes this negligible.
The design mirrors common layered architectures like client-server-db stacks, optimizing resource access based on data hotness.
Key performance bottlenecks:
- Frequent large allocations – consider custom pools for byte slices.
- Overuse of pointers – impacts both memory usage and GC efficiency.
Memory Fragmentation
Fragmentation occurs when memory becomes unusable due to allocation patterns:
- Internal Fragmentation: Waste from alignment padding (e.g., 28B request gets 32B).
- External Fragmentation: Small gaps left after freeing, preventing reuse.
Go's model minimizes external fragmentation through internal management.
runtime.MemStats Explained
```go
type MemStats struct {
	Alloc        uint64 // Currently allocated heap bytes
	TotalAlloc   uint64 // Cumulative bytes allocated (never decreases)
	Sys          uint64 // Total OS memory requested
	Mallocs      uint64 // Total allocations
	Frees        uint64 // Total deallocations
	HeapAlloc    uint64 // Bytes in heap
	HeapSys      uint64 // Heap bytes requested from OS
	HeapIdle     uint64 // Unused heap bytes
	HeapInuse    uint64 // Used heap bytes
	HeapReleased uint64 // Bytes returned to OS
	HeapObjects  uint64 // Live objects
	NextGC       uint64 // Target heap size for next GC
	LastGC       uint64 // Last GC timestamp
	PauseTotalNs uint64 // Total GC pause time
	PauseNs      [256]uint64 // Recent GC pauses (circular buffer)
	PauseEnd     [256]uint64 // GC pause end times (circular buffer)
	NumGC        uint32 // Number of GC cycles
	NumForcedGC  uint32 // Forced GC count
	BySize       [61]struct {
		Size    uint32 // Object size class
		Mallocs uint64 // Allocated objects in this class
		Frees   uint64 // Freed objects in this class
	}
}
```