
Windows Installation Guide for VMamba with CUDA Acceleration

Tech Apr 23 17

Repository Setup

Begin by retrieving the official VMamba source code using Git:

git clone https://github.com/MzeroMiko/VMamba.git
cd VMamba

Environment Configuration

Create a dedicated Conda environment. The configuration below uses Python 3.10 and CUDA 11.8, matching the cu118 builds of torch 2.1.1, torchvision 0.16.1, and torchaudio 2.1.1 that are installed in the same step.

conda create -n vmamba_win python=3.10
conda activate vmamba_win
conda install cudatoolkit==11.8
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install setuptools==68.2.2
conda install nvidia/label/cuda-11.8.0::cuda-nvcc_win-64
conda install packaging
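
Before moving on, it can help to confirm that pip actually resolved the CUDA 11.8 build of torch. The following is a minimal sketch, assuming that wheels from the cu118 index carry a `+cu118` local version suffix (as in `2.1.1+cu118`); the helper names `cuda_tag` and `describe_torch_build` are hypothetical and not part of VMamba or PyTorch.

```python
import importlib


def cuda_tag(version_string):
    """Return the local build tag of a torch version string, e.g. 'cu118'.

    Returns None when the version has no '+tag' suffix.
    """
    _, sep, tag = version_string.partition("+")
    return tag if sep else None


def describe_torch_build():
    """Report the installed torch build, or None if torch is not importable."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    return {
        "version": torch.__version__,
        "build_tag": cuda_tag(torch.__version__),
        "cuda_runtime": torch.version.cuda,          # CUDA version torch was built against
        "cuda_available": torch.cuda.is_available(),  # False if no usable GPU/driver
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If the reported build tag is `cpu` or missing, the wheel probably came from the default index rather than the cu118 one, and the kernel build later in this guide is unlikely to succeed.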

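For Triton support on Windows, install a wheel built for Python 3.10. The command below refers to a local file, so pip expects the wheel to be present in the current directory; download it first: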

pip install triton-2.0.0-cp310-cp310-win_amd64.whl

Note that Triton functionality on Windows may be limited compared to Linux environments. The installation primarily satisfies dependency requirements for compilation.

Dependency Installation

Install the remaining Python dependencies listed in the requirements file, but pause before compiling the selective scan kernel:

pip install -r requirements.txt
cd kernels/selective_scan

Windows-Specific Code Modifications

To successfully compile the selective_scan kernel on Windows, specific source code adjustments are required to handle compiler differences.

Macro Adjustment

Modify the BOOL_SWITCH macro located in kernels/selective_scan/csrc/selective_scan/static_switch.h. Update the definition so that CONST_NAME is declared static constexpr in both branches:

#define BOOL_SWITCH(COND, CONST_NAME, ...)                                           \
    [&] {                                                                            \
        if (COND) {                                                                  \
            static constexpr bool CONST_NAME = true;                                 \
            return __VA_ARGS__();                                                    \
        } else {                                                                     \
            static constexpr bool CONST_NAME = false;                                \
            return __VA_ARGS__();                                                    \
        }                                                                            \
    }()

Mathematical Constant Definition

Several CUDA kernel files require an explicit definition for M_LOG2E, because MSVC's math.h only exposes this constant when _USE_MATH_DEFINES is defined before the header is included. The following files are affected:

kernels/selective_scan/csrc/selective_scan/cus/selective_scan_bwd_kernel.cuh
kernels/selective_scan/csrc/selective_scan/cus/selective_scan_fwd_kernel.cuh
kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_bwd_kernel_oflex.cuh
kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_fwd_kernel_oflex.cuh

Insert this code block at the beginning of each file:

#ifndef M_LOG2E
#define M_LOG2E 1.4426950408889634074
#endif
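
Editing four files by hand is easy to get wrong, so the step can also be scripted. The following is a minimal sketch, assuming it is run from the repository root and that the four paths listed above exist in the checkout; the function names `add_guard` and `patch_all` are hypothetical. It works on raw bytes so that existing line endings are left untouched, and it skips files that already contain the guard.

```python
from pathlib import Path

# The guard shown above, as bytes so line endings in the file are not rewritten.
GUARD = b"#ifndef M_LOG2E\n#define M_LOG2E 1.4426950408889634074\n#endif\n"

# Paths from the list above, relative to the repository root.
KERNEL_FILES = [
    "kernels/selective_scan/csrc/selective_scan/cus/selective_scan_bwd_kernel.cuh",
    "kernels/selective_scan/csrc/selective_scan/cus/selective_scan_fwd_kernel.cuh",
    "kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_bwd_kernel_oflex.cuh",
    "kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_fwd_kernel_oflex.cuh",
]


def add_guard(path):
    """Prepend the M_LOG2E guard to one file unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(path)
    data = path.read_bytes()
    if b"#ifndef M_LOG2E" in data:
        return False
    path.write_bytes(GUARD + b"\n" + data)
    return True


def patch_all(repo_root="."):
    """Apply add_guard to every listed kernel file that exists under repo_root."""
    changed = []
    for rel in KERNEL_FILES:
        target = Path(repo_root) / rel
        if target.is_file() and add_guard(target):
            changed.append(rel)
    return changed


if __name__ == "__main__":
    for rel in patch_all("."):
        print("patched", rel)
```

Running the script a second time should report nothing, since every file already carries the guard.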

Compilation Process

Once the modifications are complete, proceed with the installation:

pip install .

If additional kernel variants are required, adjust the build configuration in setup.py before running the install command (or rerun it afterwards). Locate the MODES variable and update it to include the core implementation:

# Original
# MODES = ["oflex"]

# Modified
MODES = ["core", "oflex"]

Troubleshooting Common Errors

Missing CUDA Headers

Errors indicating missing files like cuda_runtime.h, cusparse.h, or cublas_v2.h suggest incomplete CUDA toolkit installations within the environment. Resolve this by installing the specific development packages via Conda:

conda install nvidia/label/cuda-11.8.0::cuda-cudart-dev
conda install nvidia/label/cuda-11.8.0::libcusparse-dev

Visual Studio Compatibility

If the build stops with fatal error C1189 about an unsupported Microsoft Visual Studio version, locate the host_config.h file within the CUDA include directory. Adjust the bounds of the _MSC_VER check so that they cover the installed compiler version:

// Example adjustment in host_config.h
#if _MSC_VER < 1910 || _MSC_VER > 2929
#error -- unsupported Microsoft Visual Studio version!
#endif
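
Where host_config.h lives depends on how the toolkit was installed, so searching for it is often quicker than guessing. The following is a minimal sketch; the helper name `find_host_config` is hypothetical, and the environment variables it consults (CUDA_HOME, CUDA_PATH, CONDA_PREFIX) are assumptions about a typical setup rather than something VMamba requires.

```python
import os
from pathlib import Path


def find_host_config(root):
    """Return every host_config.h found anywhere under root, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("host_config.h"))


if __name__ == "__main__":
    # Candidate roots: an explicit toolkit path first, then the active Conda env.
    for var in ("CUDA_HOME", "CUDA_PATH", "CONDA_PREFIX"):
        value = os.environ.get(var)
        if not value:
            continue
        for hit in find_host_config(value):
            print(f"{var}: {hit}")
```

If several copies turn up, the one to edit is the copy under the toolkit that nvcc actually uses for the build.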

Triton Runtime Issues

Windows environments often struggle with Triton's runtime compilation, leading to errors such as RuntimeError: Failed to find C compiler. To bypass this, disable Triton usage in the model definition. Open VMamba/classification/models/csm_triton.py and modify the configuration flag:

# Change from True to False
TRITON_ENABLED = False

DLL Load Failures

If ImportError: DLL load failed occurs when importing selective scan modules, verify that the Python, Torch, and CUDA versions match the compiled binaries exactly. Mismatched versions frequently cause dynamic library loading failures on Windows. Ensure the CUDA_HOME environment variable points to the correct toolkit path used during compilation:

python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)"

Adjust system environment variables if the detected path does not align with the active Conda environment.
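
When narrowing down a DLL load failure, it helps to collect the relevant facts in one place. The following is a minimal sketch, assuming the compiled extension is importable under the name `selective_scan_cuda_oflex` (an assumption; substitute whatever module name the build produced). The helpers `try_import` and `report` are hypothetical. torch is imported first so that its own DLLs are already loaded when the extension is imported.

```python
import importlib
import os
import sys


def try_import(module_name):
    """Import a module by name and return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # "DLL load failed" surfaces as an ImportError on Windows.
        return False, str(exc)
    return True, "ok"


def report(extension_names):
    """Return human-readable lines describing the interpreter and each import."""
    lines = [
        f"python    : {sys.version.split()[0]}",
        f"CUDA_HOME : {os.environ.get('CUDA_HOME', '<unset>')}",
    ]
    # torch goes first; the extensions link against its libraries.
    for name in ["torch", *extension_names]:
        ok, detail = try_import(name)
        lines.append(f"{name}: {'ok' if ok else 'FAILED: ' + detail}")
    return lines


if __name__ == "__main__":
    # Assumed module name; adjust to match the modes that were compiled.
    for line in report(["selective_scan_cuda_oflex"]):
        print(line)
```

A failure on torch itself points at the environment, while a failure only on the extension points at a mismatch between the compiled binary and the installed torch or CUDA version.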

Tags: VMamba windows
