Fading Coder


Windows Installation Guide for VMamba with CUDA Acceleration


Repository Setup

Begin by retrieving the official VMamba source code using Git:

git clone https://github.com/MzeroMiko/VMamba.git
cd VMamba

Environment Configuration

Create a dedicated Conda environment. The configuration below uses Python 3.10 and CUDA 11.8, matching the cu118 builds of torch 2.1.1, torchvision 0.16.1, and torchaudio 2.1.1 installed from the PyTorch package index.

conda create -n vmamba_win python=3.10
conda activate vmamba_win
conda install cudatoolkit==11.8
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install setuptools==68.2.2
conda install nvidia/label/cuda-11.8.0::cuda-nvcc_win-64
conda install packaging
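A quick version check after these installs can catch a mismatched environment before anything is compiled. This is a minimal sketch: the expected values come from the commands above, and the helper names (`version_mismatches`, `collect_actual`) are hypothetical.

```python
"""Sanity-check the vmamba_win environment against the versions used in this guide.

Expected values mirror the commands above (Python 3.10, torch 2.1.1, CUDA 11.8).
"""
import sys

EXPECTED = {"python": "3.10", "torch": "2.1.1", "cuda": "11.8"}


def version_mismatches(actual, expected=EXPECTED):
    """Return human-readable mismatch messages; an empty list means all match.

    A value matches when it equals the expected string or extends it with a
    further component or a local tag (e.g. "2.1.1+cu118" matches "2.1.1").
    """
    problems = []
    for key, want in expected.items():
        got = actual.get(key)
        if got is None:
            problems.append(f"{key}: not found (expected {want})")
        elif not (got == want or got.startswith(want + ".") or got.startswith(want + "+")):
            problems.append(f"{key}: found {got}, expected {want}")
    return problems


def collect_actual():
    """Gather versions from the running interpreter and, if importable, torch."""
    actual = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch
    except ImportError:
        return actual
    actual["torch"] = torch.__version__
    if torch.version.cuda is not None:  # None for CPU-only torch builds
        actual["cuda"] = torch.version.cuda
    return actual


if __name__ == "__main__":
    issues = version_mismatches(collect_actual())
    print("\n".join(issues) if issues else "Environment matches the guide.")
```

Run it with `python check_env.py` (file name is arbitrary) inside the activated `vmamba_win` environment; a CPU-only torch build shows up as a missing `cuda` entry.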

Triton does not officially support Windows, so install a prebuilt wheel for Python 3.10 (tags cp310 and win_amd64) that you have downloaded into the current directory:

pip install triton-2.0.0-cp310-cp310-win_amd64.whl

Triton functionality on Windows is limited compared with Linux. Installing the wheel mainly satisfies the dependency requirements; Triton kernels may still fail at runtime (see Triton Runtime Issues below).
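Because pip rejects a wheel whose tags do not fit the interpreter, it can help to confirm that the downloaded Triton wheel's tags match before installing. The sketch below assumes the standard wheel naming scheme (PEP 425 tags); the helper names are hypothetical.

```python
"""Check that a wheel filename's tags fit the running interpreter before pip install.

Assumes the standard wheel naming scheme:
{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
"""
import sys
import sysconfig


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    return parts[-3], parts[-2], parts[-1]


def wheel_fits(filename, py_version, platform_tag):
    """True if the wheel targets this CPython minor version and platform."""
    py_tag, _abi, plat = wheel_tags(filename)
    want_py = f"cp{py_version[0]}{py_version[1]}"
    # Tags may be compressed sets such as "cp310.cp311"; accept any member.
    return want_py in py_tag.split(".") and platform_tag in plat.split(".")


if __name__ == "__main__":
    # sysconfig.get_platform() returns e.g. "win-amd64"; wheel tags use "win_amd64".
    here = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    name = "triton-2.0.0-cp310-cp310-win_amd64.whl"
    verdict = "fits" if wheel_fits(name, sys.version_info[:2], here) else "does NOT fit"
    print(f"{name} {verdict} this interpreter ({here})")
```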

Dependency Installation

Install the remaining Python dependencies from the requirements file, then change into the selective scan kernel directory. Do not compile the kernel yet:

pip install -r requirements.txt
cd kernels/selective_scan
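Before compiling, it is worth confirming that the build tools are reachable from the current shell. This sketch assumes that on Windows the PyTorch extension build invokes MSVC's `cl.exe` together with `nvcc`, so both should resolve on PATH (for example from an "x64 Native Tools Command Prompt" with the Conda environment activated); `missing_tools` is a hypothetical helper.

```python
"""Pre-build check: are the tools the CUDA extension build needs on PATH?

Assumption: on Windows the extension build calls MSVC's cl.exe and nvcc,
so both must resolve from the shell that will run `pip install .`.
"""
import shutil

REQUIRED_TOOLS = ("nvcc", "cl")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        for tool in REQUIRED_TOOLS:
            print(f"{tool}: {shutil.which(tool)}")
```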

Windows-Specific Code Modifications

Compiling the selective_scan kernel with MSVC and nvcc on Windows requires two source adjustments, because the code relies on constructs that the Linux toolchain accepts but MSVC handles differently.

Macro Adjustment

Modify the BOOL_SWITCH macro in kernels/selective_scan/csrc/selective_scan/static_switch.h so that CONST_NAME is declared static constexpr in both branches:

#define BOOL_SWITCH(COND, CONST_NAME, ...)                                           \
    [&] {                                                                            \
        if (COND) {                                                                  \
            static constexpr bool CONST_NAME = true;                                 \
            return __VA_ARGS__();                                                    \
        } else {                                                                     \
            static constexpr bool CONST_NAME = false;                                \
            return __VA_ARGS__();                                                    \
        }                                                                            \
    }()
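If you prefer to apply this change with a script rather than by hand, the following sketch rewrites the header in place. It assumes the unpatched macro contains `constexpr bool CONST_NAME` without `static` and that the script is run from kernels/selective_scan; lines that already read `static constexpr` are left alone, so running it twice is harmless.

```python
"""Apply the BOOL_SWITCH change to static_switch.h programmatically.

Assumption: the original macro declares `constexpr bool CONST_NAME = ...;`
without `static`; this adds `static` and leaves already-patched lines alone.
"""
import re
from pathlib import Path

HEADER = Path("csrc/selective_scan/static_switch.h")  # relative to kernels/selective_scan

# Match "constexpr bool CONST_NAME" not already preceded by "static ".
_PATTERN = re.compile(r"(?<!static )constexpr bool CONST_NAME")


def patch_bool_switch(source):
    """Return source with every BOOL_SWITCH constant declared static constexpr."""
    return _PATTERN.sub("static constexpr bool CONST_NAME", source)


if __name__ == "__main__":
    if not HEADER.exists():
        print(f"{HEADER} not found; run this from kernels/selective_scan")
    else:
        text = HEADER.read_text()
        patched = patch_bool_switch(text)
        if patched != text:
            HEADER.write_text(patched)
            print(f"Patched {HEADER}")
        else:
            print(f"{HEADER} already patched")
```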

Mathematical Constant Definition

Several CUDA kernel files use M_LOG2E, which MSVC's math headers define only when _USE_MATH_DEFINES is set before they are included. Add an explicit fallback definition to the top of these files:

  • kernels/selective_scan/csrc/selective_scan/cus/selective_scan_bwd_kernel.cuh
  • kernels/selective_scan/csrc/selective_scan/cus/selective_scan_fwd_kernel.cuh
  • kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_bwd_kernel_oflex.cuh
  • kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_fwd_kernel_oflex.cuh

Insert this code block at the beginning of each file:

#ifndef M_LOG2E
#define M_LOG2E 1.4426950408889634074
#endif
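The same guard can be prepended to all four files with a short script. This is a sketch assuming it is run from kernels/selective_scan with the file paths listed above; it skips files whose first lines already contain the guard. The literal 1.4426950408889634074 is log2(e), which the script sanity-checks against Python's math module.

```python
"""Prepend the M_LOG2E guard to the four kernel headers listed above.

Paths are relative to kernels/selective_scan; rerunning is harmless.
"""
import math
from pathlib import Path

GUARD = "#ifndef M_LOG2E\n#define M_LOG2E 1.4426950408889634074\n#endif\n"

FILES = [
    "csrc/selective_scan/cus/selective_scan_bwd_kernel.cuh",
    "csrc/selective_scan/cus/selective_scan_fwd_kernel.cuh",
    "csrc/selective_scan/cusoflex/selective_scan_bwd_kernel_oflex.cuh",
    "csrc/selective_scan/cusoflex/selective_scan_fwd_kernel_oflex.cuh",
]


def add_log2e_guard(source):
    """Return source with GUARD at the top, unless it is already there."""
    return source if source.startswith(GUARD) else GUARD + source


if __name__ == "__main__":
    # Sanity check: the literal in GUARD is log2(e).
    assert math.isclose(1.4426950408889634074, math.log2(math.e))
    for name in FILES:
        path = Path(name)
        if not path.exists():
            print(f"skip (not found): {path}")
            continue
        path.write_text(add_log2e_guard(path.read_text()))
        print(f"ok: {path}")
```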

Compilation Process

Once the modifications are complete, choose which kernel variants to build. The mode list in setup.py defaults to the oflex implementation only; if you also need the core implementation, edit the list before installing:

# Original
# MODES = ["oflex"]

# Modified
MODES = ["core", "oflex"]

Then compile and install the kernel from kernels/selective_scan:

pip install .
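After the build, you can check whether Python can locate the compiled extensions without fully importing them. This sketch assumes the extension modules are named selective_scan_cuda_<mode> for each entry in MODES (for example selective_scan_cuda_oflex); adjust the names if your build log reports different ones. `locate` is a hypothetical helper.

```python
"""Post-build check: can Python locate the compiled selective-scan extensions?

Assumption: module names follow selective_scan_cuda_<mode> for each mode in
setup.py's MODES list; adjust if your build reports different names.
"""
import importlib.util


def locate(modules, find_spec=importlib.util.find_spec):
    """Map each module name to its file location, or None if not found."""
    found = {}
    for name in modules:
        spec = find_spec(name)  # locates the module without executing it
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    modes = ["core", "oflex"]  # keep in sync with MODES in setup.py
    for name, origin in locate([f"selective_scan_cuda_{m}" for m in modes]).items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

Locating a module does not load its DLL dependencies, so a module can be found here and still fail to import; that case is covered under DLL Load Failures below.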

Troubleshooting Common Errors

Missing CUDA Headers

Errors about missing files such as cuda_runtime.h, cusparse.h, or cublas_v2.h indicate that the CUDA toolkit in the environment lacks its development headers. Install the matching development packages via Conda:

conda install nvidia/label/cuda-11.8.0::cuda-cudart-dev
conda install nvidia/label/cuda-11.8.0::libcusparse-dev
conda install nvidia/label/cuda-11.8.0::libcublas-dev
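To confirm that the headers are now present, a short check over the environment's include directories can help. This sketch assumes Conda on Windows places headers under %CONDA_PREFIX%/Library/include (the Linux-style %CONDA_PREFIX%/include is searched as well); `missing_headers` is a hypothetical helper.

```python
"""Check which CUDA headers are visible in the active conda environment.

Assumption: conda on Windows installs headers under %CONDA_PREFIX%/Library/include;
the Linux-style %CONDA_PREFIX%/include directory is searched too.
"""
import os
from pathlib import Path

HEADERS = ("cuda_runtime.h", "cusparse.h", "cublas_v2.h")


def missing_headers(include_dirs, headers=HEADERS):
    """Return headers not present in any of the given include directories."""
    dirs = [Path(d) for d in include_dirs]
    return [h for h in headers if not any((d / h).is_file() for d in dirs)]


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if not prefix:
        print("CONDA_PREFIX is not set; activate vmamba_win first")
    else:
        dirs = [Path(prefix) / "Library" / "include", Path(prefix) / "include"]
        print("Missing:", missing_headers(dirs) or "none")
```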

Visual Studio Compatibility

If the build fails with fatal error C1189 (#error: -- unsupported Microsoft Visual Studio version!), the installed MSVC falls outside the range CUDA 11.8 accepts, typically because it is newer. Locate host_config.h in the CUDA include directory (under include/crt) and relax the _MSC_VER upper bound so that the installed compiler's version passes the check:

// Example adjustment in host_config.h: upper bound raised well above current MSVC releases
#if _MSC_VER < 1910 || _MSC_VER > 2929
#error -- unsupported Microsoft Visual Studio version!
#endif
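To see which bound your compiler trips, you can derive its _MSC_VER from the cl.exe banner: _MSC_VER is the compiler's major version times 100 plus its minor version, so "Version 19.38.33130" corresponds to 1938. The sketch below mirrors the example check (bounds 1910 and 2929); it assumes an English-language banner, and the helper names are hypothetical.

```python
"""Relate cl.exe's reported version to the _MSC_VER bounds in host_config.h.

_MSC_VER = major * 100 + minor of the compiler version, e.g. 19.38 -> 1938.
Assumes an English banner containing "Version <major>.<minor>...".
"""
import re
import shutil
import subprocess

_VERSION = re.compile(r"Version (\d+)\.(\d+)")


def msc_ver_from_banner(banner):
    """Extract _MSC_VER from cl.exe's banner text, or None if absent."""
    match = _VERSION.search(banner)
    if match is None:
        return None
    return int(match.group(1)) * 100 + int(match.group(2))


def passes_host_check(msc_ver, lower=1910, upper=2929):
    """Mirror `#if _MSC_VER < lower || _MSC_VER > upper` (which triggers #error)."""
    return not (msc_ver < lower or msc_ver > upper)


if __name__ == "__main__":
    if shutil.which("cl") is None:
        print("cl.exe not on PATH")
    else:
        # cl.exe prints its banner to stderr when run without arguments.
        result = subprocess.run(["cl"], capture_output=True, text=True)
        ver = msc_ver_from_banner(result.stderr + result.stdout)
        print("_MSC_VER:", ver)
        if ver is not None:
            print("passes check:", passes_host_check(ver))
```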

Triton Runtime Issues

Triton's runtime compilation often fails on Windows, producing errors such as RuntimeError: Failed to find C compiler. To bypass this, disable Triton in the model code: open VMamba/classification/models/csm_triton.py and set the configuration flag to False:

# Change from True to False
TRITON_ENABLED = False
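Instead of hard-coding False, a flag like this can be resolved at import time so the same file works on Linux and Windows. The following is an illustrative sketch, not VMamba's code: the helper `resolve_triton_flag` and the "always off on Windows" policy are assumptions, and the flag name follows the snippet above.

```python
"""Resolve whether Triton kernels should be used, falling back to PyTorch.

Illustrative sketch: the helper name and the "disable on Windows" policy
are assumptions, not VMamba's implementation.
"""
import importlib.util
import sys
import warnings


def resolve_triton_flag(requested=True, platform=sys.platform,
                        find_spec=importlib.util.find_spec):
    """Return True only if Triton is requested, installed, and not on Windows."""
    if not requested:
        return False
    if platform.startswith("win"):
        warnings.warn("Triton disabled on Windows; using the PyTorch fallback.")
        return False
    if find_spec("triton") is None:
        warnings.warn("Triton not installed; using the PyTorch fallback.")
        return False
    return True


TRITON_ENABLED = resolve_triton_flag()
```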

DLL Load Failures

If ImportError: DLL load failed occurs when importing selective scan modules, verify that the Python, Torch, and CUDA versions match the compiled binaries exactly. Mismatched versions frequently cause dynamic library loading failures on Windows. Ensure the CUDA_HOME environment variable points to the correct toolkit path used during compilation:

python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)"

Adjust system environment variables if the detected path does not align with the active Conda environment.
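The comparison between the detected toolkit path and the active environment can be scripted. This sketch assumes that, with the Conda-installed toolkit used in this guide, torch's CUDA_HOME should resolve to a path under %CONDA_PREFIX% (often its Library folder); `is_within` is a hypothetical helper.

```python
"""Check whether torch's detected CUDA_HOME lies inside the active conda env.

Assumption: with a conda-installed CUDA toolkit, CUDA_HOME should resolve to
a path under %CONDA_PREFIX% (often its Library folder).
"""
import os


def is_within(path, root):
    """True if path equals root or is nested below it (case-insensitive on Windows)."""
    if not path or not root:
        return False
    path = os.path.normcase(os.path.abspath(path))
    root = os.path.normcase(os.path.abspath(root))
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


if __name__ == "__main__":
    try:
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        CUDA_HOME = None
    prefix = os.environ.get("CONDA_PREFIX")
    print("CUDA_HOME   :", CUDA_HOME)
    print("CONDA_PREFIX:", prefix)
    print("inside env  :", is_within(CUDA_HOME, prefix))
```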
