Linux Process Virtual Address Space: Mapping and Management
Memory Regions of a Process
When studying dynamic memory management in C/C++, we often categorize the address space into several logical regions: stack, heap, data segment (BSS, initialized data), and text (code). However, these addresses are not physical memory addresses. Consider the following example:
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int global_counter = 100;

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork failed");
            return 1;
        } else if (pid == 0) {
            /* Child: overwrite the global on the fourth iteration. */
            int loop = 0;
            while (1) {
                if (loop == 3) {
                    global_counter = 200;
                    printf("Child changed global_counter...\n");
                }
                loop++;
                printf("[Child] PID:%d, PPID:%d, value:%d, addr:%p\n",
                       getpid(), getppid(), global_counter, (void *)&global_counter);
                sleep(1);
            }
        } else {
            /* Parent: never writes, only reports its own view. */
            while (1) {
                printf("[Parent] PID:%d, PPID:%d, value:%d, addr:%p\n",
                       getpid(), getppid(), global_counter, (void *)&global_counter);
                sleep(1);
            }
        }
        return 0;
    }
After the child modifies global_counter, the parent and child print different values for the same address. The printed address therefore cannot be a physical address: a single physical memory location cannot hold two distinct values at once.
In reality, the Linux kernel assigns each process its own virtual address space. A per‑process page table maps virtual addresses to physical frames. User‑space code only sees virtual addresses; the actual physical memory is accessed transparently through the mapping. When a process reads or writes a virtual address, the CPU's memory management unit (MMU) looks up the corresponding physical frame in the page table and performs the access there. The kernel builds and maintains the page tables and steps in only when an access faults.
The above behaviour is explained by copy‑on‑write. Initially, the child’s address space (and page table) is a copy of the parent’s, so global_counter maps to the same physical page in both processes, and the kernel marks that shared page read‑only in both page tables. When the child attempts to write, the access faults. The kernel recognizes the page as copy‑on‑write, allocates a new physical page, copies the original data, updates the child’s page table to point to the new page, and lets the write proceed. Thus, although the virtual addresses are identical, they refer to different physical pages, providing process isolation.
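Strictly speaking, on x86 a logical address is a segment:offset pair and a linear address is what segmentation produces from it. Because Linux uses a flat segment model, the two coincide in practice, and the terms virtual address, linear address, and logical address are often used interchangeably. For this discussion, they all mean the virtual address presented to user space.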
Managing the Address Space: mm_struct
The OS must manage the address space of every process. Like process control blocks (task_struct), the kernel uses a data structure to describe the virtual memory layout. In Linux, this structure is mm_struct. Each process’s task_struct contains a pointer to its mm_struct, linking process management with memory management.
A simplified representation of how mm_struct defines regions:
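    /* Simplified: field names are illustrative. The kernel's own fields
     * include start_code, end_code, start_data, end_data, start_brk, brk,
     * and start_stack. */
    struct mm_struct {
        unsigned long code_start, code_end;
        unsigned long data_start, data_end;
        unsigned long heap_start, heap_end;
        unsigned long stack_start, stack_end;
        /* ... */
    };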
Each logical area (code, data, heap, stack) is delimited by a start and an end address within the virtual address space. Growth (e.g., brk for heap, automatic expansion for stack) is managed by adjusting these boundaries.
Why Virtual Address Spaces Exist
Directly accessing physical memory would expose the system to several problems. The virtual address space abstraction provides three major benefits:
- Protection and safety – Every memory access is checked against the page table and the permissions of the address space. Illegal accesses (out‑of‑bounds, write to read‑only regions) can be trapped immediately, preventing a faulty process from corrupting other data.
- Process isolation and decoupling – Each process operates in its own virtual environment. Unrelated processes map to separate physical pages. For parent–child scenarios, copy‑on‑write ensures that modifications by one do not affect the other, preserving independence.
- Uniform view for the CPU and compiler – Processes see a consistent layout (text at low addresses, stack at high addresses, etc.) regardless of the actual fragmented physical allocation. Compilers generate code using this logical layout, and the CPU fetches instructions using virtual addresses. The page table translates every access, so the hardware operates seamlessly with the same abstraction.
Detailed Address Space Layout
The typical 32‑bit Linux process address space is partitioned as follows (total 4 GB):
- User space (0x00000000 – 0xBFFFFFFF, 3 GB) contains:
  - Text (code)
  - Initialized and uninitialized data (BSS)
  - Heap (grows upward)
  - Memory mappings (shared libraries, mmap region)
  - Stack (grows downward)
  - Command‑line arguments and environment variables
- Kernel space (0xC0000000 – 0xFFFFFFFF, 1 GB) is reserved for the kernel and is shared among all processes (though user code cannot access it directly).
This strict division allows the kernel to reside in every process’s address space while keeping its data protected.