Fading Coder


ARM64 Translation Lookaside Buffer Architecture and Management

The Memory Management Unit (MMU) translates virtual addresses to physical addresses. To avoid the latency of repeated page table walks in main memory, processors integrate a Translation Lookaside Buffer (TLB). This hardware cache stores recently used translations. Contemporary microarchitectures typically implement a split L1 TLB, with separate instruction and data TLBs so that instruction fetches and load/store operations can be translated concurrently, backed by a unified L2 TLB that services both streams.

TLB Entry Composition

TLB entry layouts vary across instruction set architectures. In AArch64, a cached entry extends beyond simple address mapping. It retains memory type classifications, cacheability attributes, access control flags, an Address Space Identifier (ASID), and a Virtual Machine Identifier (VMID). The ASID isolates translations belonging to distinct user processes, whereas the VMID segregates address spaces across different guest operating systems in virtualized environments.
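
To make the role of the two identifiers concrete, here is a minimal software model of a cached entry and the match rule applied on lookup. The struct layout, field names, and the 4 KiB page size are assumptions for illustration only; real entry formats are implementation-defined and not visible to software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one cached translation (not a real hardware layout). */
struct tlb_entry_model {
    uint64_t vpn;        /* virtual page number: VA >> 12, assuming 4 KiB pages */
    uint64_t pfn;        /* physical frame number */
    uint16_t asid;       /* address space tag, ignored for global entries */
    uint16_t vmid;       /* virtual machine tag */
    uint8_t  attr_index; /* memory type / cacheability selector */
    uint8_t  perms;      /* access control flags */
    bool     global;     /* true when the page table entry had nG clear */
    bool     valid;
};

/* Returns true when the entry may be used to translate va in the given context. */
static bool tlb_entry_matches(const struct tlb_entry_model *e,
                              uint64_t va, uint16_t asid, uint16_t vmid)
{
    assert(e != NULL);
    if (!e->valid || e->vmid != vmid)
        return false;               /* entry belongs to another guest */
    if (e->vpn != (va >> 12))
        return false;               /* different virtual page */
    return e->global || e->asid == asid;
}
```

In this model, an entry tagged with ASID 7 is skipped when the lookup carries ASID 8, so two processes can map the same virtual page without flushing between them.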

Invalidation Interfaces

Modifying page tables requires explicit invalidation of the corresponding cached entries; otherwise a core may keep translating through a stale mapping. The kernel requires each architecture to provide routines that clear stale mappings either locally or by broadcast. Common architectural hooks include:

| Kernel Hook | Purpose |
| --- | --- |
| void invalidate_tlb_global(void); | Purges every cached entry across all processor cores. |
| void invalidate_tlb_address_space(struct mm_struct *target_mm); | Clears entries associated with a specific user memory descriptor. |
| void invalidate_tlb_virtual_range(struct vm_area_struct *region, unsigned long base, unsigned long limit); | Evicts entries falling within a defined virtual memory boundary. base is inclusive, limit is exclusive. |
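
The half-open range convention (base inclusive, limit exclusive) is easy to get wrong by one page. The sketch below counts how many per-page invalidations such a range covers. The helper name and the 4 KiB page size are assumptions, not part of any kernel API, and the rounding does not guard against overflow near the top of the address space.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumed page size for this sketch */

/* Hypothetical helper: number of pages touched by the range [base, limit). */
static size_t pages_in_range(unsigned long base, unsigned long limit)
{
    assert(base <= limit);
    /* Round base down and limit up to page boundaries. */
    unsigned long first = base & ~(MODEL_PAGE_SIZE - 1);
    unsigned long end = (limit + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
    return (end - first) / MODEL_PAGE_SIZE;
}
```

With these definitions, the range 0x1000 to 0x3000 covers two pages: the page starting at 0x3000 is not touched because limit is exclusive.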

AArch64 Invalidation Syntax

Hardware populates the TLB automatically: on a translation miss, the page table walker fetches the mapping and caches it. Consequently, AArch64 provides no instructions for manual TLB writes. Software management relies exclusively on the TLBI (TLB Invalidate) instruction:

TLBI <operation><exception_level>{IS} [, <Xn>]

  1. Operation Scope: Defines the target entries. vmalle1 targets all stage-1 entries for the active VMID. vmalls12e1 covers both stage-1 and stage-2 translations for the current virtual machine.
  2. Exception Level: Specifies the privilege tier (E1, E2, or E3).
  3. Inner Shareable (IS): Dictates SMP visibility. Omitting IS restricts invalidation to the executing core. Appending IS broadcasts the operation across the inner shareable domain, ensuring multi-core coherence.
  4. Register Operand: An optional general-purpose register (X0-X30) providing address or ASID/VMID context for targeted invalidation.
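
For address-based stage-1 operations such as TLBI VAE1IS, the register operand packs the virtual page number into bits [43:0] (VA[55:12]) and the ASID into bits [63:48]. The helper below is a sketch of that packing; the function name is hypothetical, and bits [47:44] (used as a level hint on cores that implement it) are simply left at zero.

```c
#include <stdint.h>

/* Hypothetical helper: builds the Xn operand for an address-based,
 * ASID-tagged invalidation. Bits [47:44] are left as zero here. */
static uint64_t tlbi_va_operand(uint64_t va, uint16_t asid)
{
    /* VA[55:12] goes into bits [43:0]. */
    uint64_t operand = (va >> 12) & ((UINT64_C(1) << 44) - 1);
    /* ASID goes into bits [63:48]. */
    operand |= (uint64_t)asid << 48;
    return operand;
}
```

Kernel code would typically pass the result to the instruction through an inline assembly register constraint; that part needs privileged execution and is omitted here.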

Kernel Implementation Details

Global invalidation requires strict memory ordering to guarantee that prior page table updates are visible before the TLB purge, and that subsequent instructions fetch updated translations.

static inline void invalidate_tlb_global(void)
{
    /* Ensure all prior memory stores complete before invalidation */
    asm volatile("dsb ishst" ::: "memory");
    
    /* Broadcast invalidation for all EL1 stage-1 entries across shareable cores */
    asm volatile("tlbi vmalle1is");
    
    /* Wait for the TLB operation to finish globally */
    asm volatile("dsb ish" ::: "memory");
    
    /* Synchronize the instruction pipeline to fetch new translations */
    asm volatile("isb" ::: "memory");
}

For single-core operations, the shareable domain modifiers are removed, targeting only the local processor:

static inline void invalidate_tlb_local(void)
{
    asm volatile("dsb nshst" ::: "memory");
    asm volatile("tlbi vmalle1");
    asm volatile("dsb nsh" ::: "memory");
    asm volatile("isb" ::: "memory");
}

The architectural distinction lies in the ish (inner shareable) versus nsh (non-shareable) barrier domains and the is suffix on the tlbi operation.

Address Space Identifier (ASID) Management

Frequent context switches would normally mandate full TLB flushes, severely degrading performance. AArch64 mitigates this using the non-Global (nG) bit to separate kernel mappings from user mappings, and ASIDs to tag user-space translations. Hardware ASIDs are configurable at 8 or 16 bits. In an 8-bit configuration, only 255 values are available for allocation (0 is reserved). Since the number of active processes often exceeds this limit, the kernel implements a generation-based allocation scheme:

  1. Each task maintains a 64-bit software ASID. The lower bits hold the hardware tag, while the upper bits store a generation counter.
  2. A global 64-bit atomic variable tracks the current system generation.
  3. During scheduling, the kernel compares the task's generation against the global counter. A match allows reuse of the existing hardware ASID. A mismatch triggers reallocation.
  4. If free hardware slots exist, one is assigned. If the pool is exhausted, the global generation increments, and hardware allocation resets. Because the TLB may still hold entries tagged with a recycled hardware ASID on behalf of its previous owner, a full TLB flush across all cores is mandatory at this rollover point.

This versioning strategy confines expensive global TLB purges to rare generation rollovers rather than every context switch.
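
The steps above can be sketched as a small user-space simulation. All names here are hypothetical, the sketch assumes 8-bit hardware ASIDs, and the flush is represented by a counter rather than a real TLBI sequence. It also omits details a production allocator needs, such as locking and preserving the ASIDs of tasks that are running on other cores at rollover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ASID_BITS       8u
#define NUM_HW_ASIDS    (1u << ASID_BITS)
#define ASID_MASK       ((uint64_t)NUM_HW_ASIDS - 1)
#define GENERATION_STEP ((uint64_t)NUM_HW_ASIDS)

/* Hypothetical allocator state for this sketch. */
struct asid_allocator {
    uint64_t generation;         /* held in the bits above ASID_BITS */
    bool in_use[NUM_HW_ASIDS];   /* hardware tags handed out this generation */
    unsigned int full_flushes;   /* stand-in for full TLB flushes */
};

static void asid_allocator_init(struct asid_allocator *a)
{
    assert(a != NULL);
    memset(a, 0, sizeof(*a));
    a->generation = GENERATION_STEP; /* a task value of 0 means "never assigned" */
    a->in_use[0] = true;             /* hardware ASID 0 is reserved */
}

/* Returns the hardware ASID to use for the task being scheduled in.
 * *task_asid is the task's 64-bit software ASID (generation | hardware tag). */
static uint16_t asid_for_task(struct asid_allocator *a, uint64_t *task_asid)
{
    assert(a != NULL && task_asid != NULL);

    /* Generation matches: reuse the existing hardware tag. */
    if ((*task_asid & ~ASID_MASK) == a->generation)
        return (uint16_t)(*task_asid & ASID_MASK);

    /* Mismatch: take a free hardware slot if one exists. */
    for (unsigned int hw = 1; hw < NUM_HW_ASIDS; hw++) {
        if (!a->in_use[hw]) {
            a->in_use[hw] = true;
            *task_asid = a->generation | hw;
            return (uint16_t)hw;
        }
    }

    /* Pool exhausted: bump the generation, reset the pool, flush everything. */
    a->generation += GENERATION_STEP;
    memset(a->in_use, 0, sizeof(a->in_use));
    a->in_use[0] = true;
    a->full_flushes++;
    a->in_use[1] = true;
    *task_asid = a->generation | 1;
    return 1;
}
```

In this model, the first 255 tasks receive hardware ASIDs 1 through 255 without any flush. Scheduling a 256th task exhausts the pool and triggers the single rollover flush, after which every older task fails the generation check and is assigned a fresh tag the next time it runs.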

Virtual Machine Identifier (VMID)

Virtualized environments apply the same tagging principle using VMIDs. Each guest OS receives a unique VMID, allowing the TLB to cache translations from multiple virtual machines simultaneously without interference. Context switches between guests skip TLB invalidation unless the VMID space wraps around, at which point a complete TLB flush is executed to prevent translation leakage across guest boundaries.
