Building PyTorch with CUDA Support for Legacy GPUs on Windows
PyTorch binaries after 1.3 dropped support for GPUs with compute capability 3.5 and below, and by 1.7 the prebuilt wheels target compute capability 5.2 or higher. If you have an older GPU (for example, a Kepler device like GT 730M with CC 3.5) and still want GPU acceleration, you can compile PyTorch from source on Windows and explicitly target your GPU’s architecture.
The guide below uses PyTorch 1.7.x with CUDA 10.1 as an example, because that toolchain aligns well with older hardware and drivers.
1. Toolchain and Dependencies
- Visual Studio 2019 (Desktop development with C++)
- NVIDIA CUDA Toolkit 10.1 (e.g., 10.1.105)
- cuDNN for CUDA 10.1 (e.g., 7.6.4)
- Intel MKL package
- MAGMA matching your CUDA version (e.g., 2.5.4 for CUDA 10.1)
- sccache (optional but recommended for faster rebuilds)
- Ninja build system
- Python 3.7–3.8 with pip or conda
1.1 Visual Studio 2019
Install Visual Studio 2019 and include "Desktop development with C++". Older PyTorch releases often work best with a mid-2019 MSVC toolset. If you have issues with the latest VS components, install an earlier minor release from Microsoft’s versioned installers.
1.2 CUDA Toolkit 10.1
- Download from NVIDIA’s CUDA Toolkit archive.
- Install with Visual Studio integration and NVCC.
- After installation, verify the install by running nvcc --version in a Developer Command Prompt.
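If you want to sanity-check the toolkit version programmatically rather than by eye, a small parser over the nvcc --version output works. parse_nvcc_version below is a hypothetical helper (not part of any CUDA tooling), demonstrated against a sample of what nvcc 10.1 typically prints:

```python
import re

def parse_nvcc_version(output: str):
    """Extract the CUDA release number (e.g. '10.1') from `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", output)
    return m.group(1) if m else None

# Abbreviated sample of nvcc 10.1 output:
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 10.1, V10.1.105"
)
print(parse_nvcc_version(sample))  # 10.1
```

In a real check you would feed it the captured output of `subprocess.run(["nvcc", "--version"], ...)` and compare against the version this guide assumes.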
1.3 cuDNN 7.6.x for CUDA 10.1
- Download cuDNN for CUDA 10.1 (e.g., 7.6.4) from NVIDIA.
- Unzip and place bin, include, and lib under a dedicated folder (e.g., C:\toolkits\cudnn-10.1).
- Alternatively, you can copy the contents into the CUDA 10.1 installation directory to simplify include/lib discovery.
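cuDNN 7.x records its version as preprocessor defines (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) in include\cudnn.h, so you can confirm the unpacked copy is really 7.6.x without installing anything. cudnn_version_from_header below is a hypothetical helper, shown against a sample of those defines:

```python
import re

def cudnn_version_from_header(header_text: str):
    """Read the CUDNN_MAJOR/MINOR/PATCHLEVEL defines from cudnn.h text."""
    vals = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{key}\s+(\d+)", header_text)
        if not m:
            return None  # not a cuDNN 7.x-style header
        vals.append(int(m.group(1)))
    return tuple(vals)

# Sample of the defines as they appear in cudnn.h for 7.6.4:
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 4\n"
print(cudnn_version_from_header(sample))  # (7, 6, 4)
```

To check a real install, read C:\toolkits\cudnn-10.1\include\cudnn.h and pass its contents in. (cuDNN 8.x moved these defines to cudnn_version.h, but that is outside this guide's toolchain.)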
1.4 Intel MKL
- Obtain MKL (e.g., mkl_2020.0.166.7z) from the referenced artifacts used by PyTorch CI.
- Extract to a fixed location, such as D:\deps\mkl.
1.5 MAGMA
- Use a MAGMA build that matches your CUDA version and MSVC ABI. For CUDA 10.1, a typical package is magma_2.5.4_cuda101_release.7z.
- Keep separate directories for Release and Debug variants if you plan to build both.
- Extract to, for example, D:\deps\magma.
1.6 sccache (optional)
- Download the sccache.exe and sccache-cl.exe binaries used in PyTorch’s Windows builds.
- Put them in a folder like D:\deps\sccache and add it to PATH.
1.7 Ninja
- Download ninja-win.zip from the Ninja releases page.
- Extract ninja.exe to D:\deps\ninja and add it to PATH.
1.8 Python
- Use either Anaconda/Miniconda or a plain Python installation. Both work. Ensure your Python matches PyTorch 1.7’s supported versions (3.7–3.8 are safe choices).
1.9 Python packages
Install build-time Python dependencies:
pip install --upgrade pip wheel
pip install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
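Before starting a multi-hour build, it is worth confirming the build-time packages actually import. missing_deps below is a hypothetical helper; note that the pyyaml package imports under the module name yaml:

```python
import importlib.util

# Module names (not pip package names): pyyaml -> yaml, mkl has no importable module,
# so we only probe the ones the build imports directly.
BUILD_DEPS = ["numpy", "yaml", "setuptools", "cmake", "cffi", "typing_extensions"]

def missing_deps(names):
    """Return the subset of module names that cannot be found by the import system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print("Missing:", missing_deps(BUILD_DEPS))
```

An empty list means the Python side of the toolchain is ready; anything listed should be pip-installed before running setup.py.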
2. Environment Setup (Batch Script)
Create a batch file (for example, set_env.bat) to standardize environment variables. Adjust paths to match your system.
@echo off
setlocal enableextensions enabledelayedexpansion
rem ===== Build knobs =====
set BUILD_FLAVOR=Release
set USE_CUDA=1
set USE_DISTRIBUTED=0
set CMAKE_GENERATOR=Ninja
rem ===== Paths to toolchains and libs =====
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDNN_DIR=C:\toolkits\cudnn-10.1
set MKL_ROOT=D:\deps\mkl
set MAGMA_HOME=D:\deps\magma
set NINJA_HOME=D:\deps\ninja
set SCCACHE_HOME=D:\deps\sccache
rem ===== Make sure build tools are discoverable =====
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%CUDNN_DIR%\bin;%NINJA_HOME%;%SCCACHE_HOME%;%PATH%
rem ===== Include/lib discovery =====
set CMAKE_INCLUDE_PATH=%MKL_ROOT%\include
set LIB=%MKL_ROOT%\lib;%LIB%
set CUDNN_INCLUDE_DIR=%CUDNN_DIR%\include
set CUDNN_LIB_DIR=%CUDNN_DIR%\lib\x64
rem ===== Target legacy GPU architectures here =====
rem Example for Kepler CC 3.5. Add more as needed, e.g., "3.5;5.2"
set TORCH_CUDA_ARCH_LIST=3.5
rem ===== Optional compile cache =====
set USE_SCCACHE=1
set SCCACHE_IDLE_TIMEOUT=0
rem ===== Build type switch =====
if /I "%BUILD_FLAVOR%"=="Debug" (
set DEBUG=1
) else (
set DEBUG=
)
echo Environment configured for %BUILD_FLAVOR% build.
endlocal & set "BUILD_FLAVOR=%BUILD_FLAVOR%" & set "USE_CUDA=%USE_CUDA%" ^
& set "USE_DISTRIBUTED=%USE_DISTRIBUTED%" & set "CMAKE_GENERATOR=%CMAKE_GENERATOR%" ^
& set "CUDA_PATH=%CUDA_PATH%" & set "CUDNN_DIR=%CUDNN_DIR%" ^
& set "MKL_ROOT=%MKL_ROOT%" & set "MAGMA_HOME=%MAGMA_HOME%" ^
& set "NINJA_HOME=%NINJA_HOME%" & set "SCCACHE_HOME=%SCCACHE_HOME%" ^
& set "PATH=%PATH%" & set "CMAKE_INCLUDE_PATH=%CMAKE_INCLUDE_PATH%" ^
& set "LIB=%LIB%" & set "CUDNN_INCLUDE_DIR=%CUDNN_INCLUDE_DIR%" ^
& set "CUDNN_LIB_DIR=%CUDNN_LIB_DIR%" & set "TORCH_CUDA_ARCH_LIST=%TORCH_CUDA_ARCH_LIST%" ^
& set "USE_SCCACHE=%USE_SCCACHE%" & set "SCCACHE_IDLE_TIMEOUT=%SCCACHE_IDLE_TIMEOUT%" ^
& set "DEBUG=%DEBUG%"
Notes:
- TORCH_CUDA_ARCH_LIST is the critical lever. Set it to your GPU’s compute capability, e.g., 3.5 for many Kepler devices. Multiple values can be separated by semicolons.
- Keep MAGMA’s Release vs Debug binaries aligned with your chosen BUILD_FLAVOR.
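To see what this variable ultimately does, the build translates each compute capability into an nvcc -gencode flag. The sketch below is a simplified, hypothetical rendition of that translation (it ignores suffixes like "+PTX" that the real build also understands):

```python
def gencode_flags(arch_list: str):
    """Translate a TORCH_CUDA_ARCH_LIST-style string (e.g. "3.5;5.2") into the
    -gencode flags nvcc receives. Simplified sketch: real builds also handle
    named architectures and "+PTX" suffixes."""
    flags = []
    for cc in arch_list.split(";"):
        num = cc.strip().replace(".", "")  # "3.5" -> "35"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags

print(gencode_flags("3.5;5.2"))
# ['-gencode=arch=compute_35,code=sm_35', '-gencode=arch=compute_52,code=sm_52']
```

This is why setting TORCH_CUDA_ARCH_LIST=3.5 produces sm_35 kernels that the prebuilt wheels no longer ship.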
3. Get the PyTorch Source
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch
git checkout v1.7.1
git submodule sync --recursive
git submodule update --init --recursive
You can choose a specific 1.7.x tag that matches your needs.
4. Build
Open a "x64 Native Tools Command Prompt for VS 2019", then run your environment script and build:
call D:\path\to\set_env.bat
rem Ensure Python environment is active here if you use venv/conda
python setup.py clean
python setup.py bdist_wheel -v
Alternatively, for an in-place development install:
python setup.py develop -v
If Ninja is on PATH, the build will use it. Otherwise, CMake may fall back to MSBuild.
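A quick way to confirm which path the build will take is to probe PATH for the tools before invoking setup.py. tool_status below is a hypothetical helper using the standard library's shutil.which:

```python
import shutil

def tool_status(names=("ninja", "cmake", "sccache")):
    """Report which build tools are discoverable on PATH."""
    return {name: shutil.which(name) is not None for name in names}

for tool, found in tool_status().items():
    print(f"{tool}: {'found' if found else 'NOT on PATH'}")
```

If ninja reports NOT on PATH, re-run your set_env.bat (or expect CMake to fall back to MSBuild, which is typically slower).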
Common switches and tips
- Disable distributed if you don’t need it: set USE_DISTRIBUTED=0 (already in the script).
- For extra NVCC verbosity while troubleshooting: set NVCC_FLAGS=-Xptxas -v.
- If cuDNN is not detected, double-check CUDNN_INCLUDE_DIR and CUDNN_LIB_DIR.
- Ensure your Visual Studio toolset and Windows SDK are installed; missing MSVC components cause link errors.
5. Matching MAGMA and Build Type
- Use MAGMA libraries that match your CUDA version and MSVC ABI.
- For Release builds, link against MAGMA’s Release libraries; for Debug builds, use MAGMA Debug libraries to avoid CRT mismatches.
- Keep MAGMA_HOME pointing to the directory that contains include and lib for the chosen variant.
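Since a wrong MAGMA_HOME only surfaces late in the build as missing-header or link errors, a cheap up-front check of the directory layout can save a failed run. magma_layout_ok below is a hypothetical helper, demonstrated against a throwaway directory tree:

```python
import os
import tempfile

def magma_layout_ok(magma_home: str) -> bool:
    """Check that MAGMA_HOME contains the include/ and lib/ subdirectories
    the build expects to find."""
    return all(os.path.isdir(os.path.join(magma_home, d)) for d in ("include", "lib"))

# Demonstrate against a temporary tree shaped like an extracted MAGMA package:
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "include"))
    os.makedirs(os.path.join(root, "lib"))
    print(magma_layout_ok(root))  # True
```

Run it against your real MAGMA_HOME (e.g., D:\deps\magma) before building; False means the archive was extracted one level too deep or too shallow.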
6. Verifying GPU Capability and Runtime
After the build and install succeed, confirm that CUDA is usable from Python and that your device capability matches your target.
Save the following as check_cuda.py and run it with python check_cuda.py (the Unix-style heredoc python - <<"PY" does not work in cmd.exe):

import torch

print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('Device count:', torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        cap = torch.cuda.get_device_capability(i)
        print(f'[{i}] {name} CC={cap[0]}.{cap[1]}')
    x = torch.randn(1024, 1024, device='cuda')
    y = torch.mm(x, x)
    print('Compute OK, sum=', float(y.sum()))
If torch.cuda.is_available() is True and the capability reported includes 3.5 (or your target), your custom build is using the legacy GPU.
7. Example: Minimal GPU Test Script
# save as test_cuda.py
import torch

def gpu_test():
    if not torch.cuda.is_available():
        raise SystemExit('CUDA not available')
    idx = torch.cuda.current_device()
    print('Using:', torch.cuda.get_device_name(idx))
    a = torch.rand(2048, 512, device='cuda')
    b = torch.rand(512, 256, device='cuda')
    c = a @ b
    print('Result:', c.shape, 'sum=', c.float().sum().item())

if __name__ == '__main__':
    gpu_test()
Run with python test_cuda.py to quickly validate numerical kernels are working on your GPU.
8. Notes on Compatibility
- Prebuilt wheels for PyTorch 1.7 target newer GPUs; building from source with TORCH_CUDA_ARCH_LIST ensures kernels are compiled for your device’s compute capability.
- CUDA 10.1 and cuDNN 7.6.x are generally suitable for Kepler-era GPUs. Newer CUDA versions dropped support for some older architectures.
- If you need PyTorch Geometric or other extensions, install them after confirming your custom PyTorch build works. Some third-party packages may also need to be built from source against the same CUDA toolchain.
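As a final cross-check, recent PyTorch builds can report the architectures they were compiled for (torch.cuda.get_arch_list() returns entries like 'sm_35'), and you can compare that against the device capability. The comparison itself is just a string match; capability_covered below is a hypothetical helper showing the logic without requiring a GPU:

```python
def capability_covered(cap, arch_list):
    """Check whether a device capability tuple like (3, 5) has a matching
    sm_XX entry in a compiled architecture list (e.g. from
    torch.cuda.get_arch_list(), if your build provides it)."""
    target = f"sm_{cap[0]}{cap[1]}"
    return target in arch_list

# A build made with TORCH_CUDA_ARCH_LIST=3.5 should report sm_35:
print(capability_covered((3, 5), ["sm_35"]))           # True
print(capability_covered((3, 5), ["sm_52", "sm_60"]))  # False
```

If your device's capability is not covered, kernels will fail to launch at runtime; rebuild with the correct TORCH_CUDA_ARCH_LIST value.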