Fading Coder

One Final Commit for the Last Sprint

Understanding GPU Scheduling Architecture and CUDA Execution Mechanisms

Modern GPUs from various manufacturers share similar core architectures despite differences in hardware and software implementations. Because the NVIDIA ecosystem is closed-source, its GPU scheduling strategies are difficult to study. This analysis covers three key aspects: the CUDA programmin...

CUDA Thread Hierarchy: Organizing Grids, Blocks, and Threads for Parallel Execution

When a kernel is invoked from the host, the CUDA runtime instantiates a collection of threads on the device to execute the kernel code in parallel. These threads are arranged in a hierarchical structure that facilitates both scalability and cooperation: the grid, the thread block, and the individual...

Setting Up a VTK Development Environment with Backend Parallel Acceleration on Ubuntu Linux

VTK Parallel Acceleration Methods: VTK (Visualization Toolkit) supports multiple parallel computing backends for performance optimization. Distributed Memory Parallelism: Enable VTK_USE_MPI during compilation to use vtkMultiProcessController for multi-node rendering and data partitioning. Shared Memo...