Windows Installation Guide for VMamba with CUDA Acceleration
Repository Setup
Begin by retrieving the official VMamba source code using Git:
git clone https://github.com/MzeroMiko/VMamba.git
cd VMamba
Environment Configuration
Create a dedicated Conda environment. The following configuration uses Python 3.10 and CUDA 11.8, matching the cu118 builds of torch 2.1.1 installed below.
conda create -n vmamba_win python=3.10
conda activate vmamba_win
conda install cudatoolkit==11.8
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install setuptools==68.2.2
conda install nvidia/label/cuda-11.8.0::cuda-nvcc_win-64
conda install packaging
For Triton support on Windows, install the prebuilt wheel that matches Python 3.10. The command below expects the wheel file to be present in the current directory:
pip install triton-2.0.0-cp310-cp310-win_amd64.whl
Note that Triton functionality on Windows may be limited compared to Linux environments. The installation primarily satisfies dependency requirements for compilation.
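Once the packages above are installed, a quick check of the pinned versions can catch a mismatched install before compilation starts. The sketch below is only an illustration and is not part of VMamba: the find_mismatches helper and the PINS table are hypothetical names, and the pins simply mirror the versions used in this guide.

```python
from importlib import metadata

# Versions pinned in this guide (assumption: adjust if you install other versions).
PINS = {
    "torch": "2.1.1",
    "torchvision": "0.16.1",
    "torchaudio": "2.1.1",
    "setuptools": "68.2.2",
}


def find_mismatches(pins=None, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    pins = PINS if pins is None else pins
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = (expected, None)
            continue
        # Wheels from the cu118 index carry a local suffix such as "+cu118",
        # so compare only the part before the "+".
        if found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the activated vmamba_win environment, the script prints one line per package that is missing or differs from its pin; no output means every pin is satisfied.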
Dependency Installation
Install the remaining Python dependencies listed in the requirements file, but pause before compiling the selective scan kernel:
pip install -r requirements.txt
cd kernels/selective_scan
Windows-Specific Code Modifications
To successfully compile the selective_scan kernel on Windows, specific source code adjustments are required to handle compiler differences.
Macro Adjustment
Modify the BOOL_SWITCH macro located in kernels/selective_scan/csrc/selective_scan/static_switch.h. Update the definition to enforce static constexpr behavior:
#define BOOL_SWITCH(COND, CONST_NAME, ...) \
    [&] { \
        if (COND) { \
            static constexpr bool CONST_NAME = true; \
            return __VA_ARGS__(); \
        } else { \
            static constexpr bool CONST_NAME = false; \
            return __VA_ARGS__(); \
        } \
    }()
Mathematical Constant Definition
Several CUDA kernel files need an explicit definition of M_LOG2E, because MSVC's math headers expose it only when _USE_MATH_DEFINES is defined. The affected files are:
kernels/selective_scan/csrc/selective_scan/cus/selective_scan_bwd_kernel.cuh
kernels/selective_scan/csrc/selective_scan/cus/selective_scan_fwd_kernel.cuh
kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_bwd_kernel_oflex.cuh
kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_fwd_kernel_oflex.cuh
Insert this guard at the beginning of each file:
#ifndef M_LOG2E
#define M_LOG2E 1.4426950408889634074
#endif
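Editing four files by hand is easy to get wrong, so the guard can also be applied with a small script. The following is a minimal sketch under stated assumptions: add_guard and add_guards are hypothetical helper names, the file list is given relative to the repository root, and files that are not found are silently skipped.

```python
from pathlib import Path

# The guard from this guide, as bytes so that existing line endings are preserved.
GUARD = b"#ifndef M_LOG2E\n#define M_LOG2E 1.4426950408889634074\n#endif\n"

# Paths relative to the repository root, as listed in this guide.
KERNEL_FILES = [
    "kernels/selective_scan/csrc/selective_scan/cus/selective_scan_bwd_kernel.cuh",
    "kernels/selective_scan/csrc/selective_scan/cus/selective_scan_fwd_kernel.cuh",
    "kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_bwd_kernel_oflex.cuh",
    "kernels/selective_scan/csrc/selective_scan/cusoflex/selective_scan_fwd_kernel_oflex.cuh",
]


def add_guard(path):
    """Prepend the M_LOG2E guard to one file; return True if the file was changed."""
    data = path.read_bytes()
    if b"#define M_LOG2E" in data:
        return False  # already patched, keep the operation idempotent
    path.write_bytes(GUARD + data)
    return True


def add_guards(repo_root):
    """Patch every listed kernel file under repo_root; return the paths that changed."""
    changed = []
    for rel in KERNEL_FILES:
        path = Path(repo_root) / rel
        if path.is_file() and add_guard(path):
            changed.append(rel)
    return changed
```

Calling add_guards with the path to the VMamba checkout returns the list of files that were modified. Running it a second time returns an empty list, because files that already define M_LOG2E are left untouched.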
Compilation Process
Once the modifications are complete, proceed with the installation:
pip install .
To build additional kernel variants, adjust the build configuration in setup.py before running pip install. Locate the mode selection variable and add the core implementation:
# Original
# MODES = ["oflex"]
# Modified
MODES = ["core", "oflex"]
Troubleshooting Common Errors
Missing CUDA Headers
Errors about missing files such as cuda_runtime.h, cusparse.h, or cublas_v2.h indicate that the CUDA toolkit in the environment lacks its development packages. Resolve this by installing the corresponding packages via Conda:
conda install nvidia/label/cuda-11.8.0::cuda-cudart-dev
conda install nvidia/label/cuda-11.8.0::libcusparse-dev
conda install nvidia/label/cuda-11.8.0::libcublas-dev
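To see which headers are actually missing before or after installing these packages, the include directories can be scanned directly. This is a rough sketch: missing_headers and candidate_include_dirs are hypothetical helpers, and the Library\include location under CONDA_PREFIX is an assumption about how Conda lays out packages on Windows, so adjust the directories if your installation differs.

```python
import os
from pathlib import Path

# Headers named in the error messages discussed in this guide.
REQUIRED_HEADERS = ("cuda_runtime.h", "cusparse.h", "cublas_v2.h")


def candidate_include_dirs(env=None):
    """Collect include directories to search, based on environment variables."""
    env = os.environ if env is None else env
    dirs = []
    if env.get("CONDA_PREFIX"):
        # Assumed layout of Conda packages on Windows.
        dirs.append(Path(env["CONDA_PREFIX"]) / "Library" / "include")
    if env.get("CUDA_HOME"):
        dirs.append(Path(env["CUDA_HOME"]) / "include")
    return dirs


def missing_headers(include_dirs, headers=REQUIRED_HEADERS):
    """Return the headers that are not present in any of the given directories."""
    dirs = [Path(d) for d in include_dirs]
    return [h for h in headers if not any((d / h).is_file() for d in dirs)]


if __name__ == "__main__":
    missing = missing_headers(candidate_include_dirs())
    print("missing headers:", ", ".join(missing) if missing else "none")
```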
Visual Studio Compatibility
If the build stops with fatal error C1189 reporting an unsupported Microsoft Visual Studio version, locate host_config.h in the CUDA include directory and widen the _MSC_VER range check so that it covers the installed compiler:
// Example adjustment in host_config.h (upper bound raised to accept newer MSVC releases)
#if _MSC_VER < 1910 || _MSC_VER > 2929
#error -- unsupported Microsoft Visual Studio version!
#endif
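Before editing the header, it can help to list the preprocessor conditionals that actually test _MSC_VER, since the exact bounds differ between CUDA releases. The helper below, find_msc_ver_checks, is a hypothetical name used for illustration; it only reports matching lines and does not modify anything.

```python
import re

# Matches "#if" / "#elif" lines that mention _MSC_VER.
_MSC_VER_CHECK = re.compile(r"^\s*#\s*(if|elif)\b.*\b_MSC_VER\b")


def find_msc_ver_checks(header_text):
    """Return (line_number, line) pairs for conditionals that test _MSC_VER."""
    return [
        (number, line.strip())
        for number, line in enumerate(header_text.splitlines(), start=1)
        if _MSC_VER_CHECK.search(line)
    ]
```

Pass it the contents of host_config.h, read as text, and inspect the reported line numbers to find the range check that triggers the error.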
Triton Runtime Issues
Windows environments often struggle with Triton's runtime compilation, which leads to errors such as RuntimeError: Failed to find C compiler. To bypass this, disable Triton usage in the model definition. Open VMamba/classification/models/csm_triton.py and modify the configuration flag:
# Change from True to False
TRITON_ENABLED = False
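To confirm that a missing compiler is really the cause, a short check for a C compiler can be run first. This is a sketch under assumptions: find_c_compiler is a hypothetical helper, and the candidate names (cl, gcc, clang) together with the CC variable are guesses at what a given Triton build looks for, not a description of Triton's own lookup logic.

```python
import os
import shutil


def find_c_compiler(candidates=("cl", "gcc", "clang"), env=None):
    """Return the CC override or the first candidate found on PATH, else None."""
    env = os.environ if env is None else env
    override = env.get("CC")
    if override:
        return override  # an explicit CC setting takes precedence
    for name in candidates:
        found = shutil.which(name, path=env.get("PATH"))
        if found:
            return found
    return None


if __name__ == "__main__":
    print("C compiler:", find_c_compiler() or "not found")
```

If the script reports that no compiler was found, disabling Triton as described above is the simpler route.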
DLL Load Failures
If ImportError: DLL load failed occurs when importing selective scan modules, verify that the Python, Torch, and CUDA versions match the compiled binaries exactly. Mismatched versions frequently cause dynamic library loading failures on Windows. Ensure the CUDA_HOME environment variable points to the correct toolkit path used during compilation:
python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)"
Adjust system environment variables if the detected path does not align with the active Conda environment.
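The comparison between the detected CUDA path and the active Conda environment can be scripted. The sketch below uses a hypothetical helper, cuda_home_in_env, and assumes that a toolkit installed through Conda lives somewhere below CONDA_PREFIX. Paths are compared with Windows semantics, so letter case is ignored.

```python
from pathlib import PureWindowsPath


def cuda_home_in_env(cuda_home, conda_prefix):
    """Return True if cuda_home is the Conda environment directory or lies below it."""
    if not cuda_home or not conda_prefix:
        return False
    home = PureWindowsPath(cuda_home)
    prefix = PureWindowsPath(conda_prefix)
    # PureWindowsPath equality is case-insensitive, matching Windows behavior.
    return home == prefix or prefix in home.parents
```

A typical call passes torch.utils.cpp_extension.CUDA_HOME as the first argument and the CONDA_PREFIX environment variable as the second. A False result suggests that the build picked up a system-wide toolkit instead of the one in the environment.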