Appendix

NVIDIA GPU architectures and CUDA installation packages supported by PaddlePaddle

| GPU architecture | Compute capability | GPU hardware models | CUDA version of the PaddlePaddle installation package to download |
| --- | --- | --- | --- |
| Pascal | sm_60 | Quadro GP100, Tesla P100, DGX-1 | CUDA 10, CUDA 11 |
| Pascal | sm_61 | GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla P40, Tesla P4 | CUDA 10, CUDA 11 |
| Volta | sm_70 | DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100 | CUDA 10, CUDA 11 |
| Turing | sm_75 | GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4 | CUDA 10, CUDA 11 |
| Ampere | sm_80 | NVIDIA A100, GA100, NVIDIA DGX-A100 | CUDA 11.8, CUDA 12.x (recommended) |
| Ampere | sm_86 | Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, RTX A10, RTX A16, RTX A40, A2 Tensor Core GPU | CUDA 11.8, CUDA 12.x (recommended) |
| Hopper | sm_90 | NVIDIA H100, H800 | CUDA 12.6, CUDA 12.9 (recommended), CUDA 13.0 |
| Blackwell | sm_100 | NVIDIA B100, B200, GB200, NVIDIA DGX-B200 | CUDA 12.9 (recommended), CUDA 13.0 |
| Blackwell | sm_120 | NVIDIA RTX 5090, RTX 5080, RTX 5070 | CUDA 12.9 (recommended), CUDA 13.0 |
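
The table above can be read as a simple lookup from compute capability to package version. The following Python sketch is only an illustration: the names CUDA_BY_SM, RECOMMENDED_BY_SM, cuda_options, and recommended_cuda are hypothetical and merely transcribe the table rows; they are not part of PaddlePaddle.

```python
# Hypothetical lookup transcribed from the table above; not a PaddlePaddle API.
CUDA_BY_SM = {
    "sm_60": ["CUDA 10", "CUDA 11"],
    "sm_61": ["CUDA 10", "CUDA 11"],
    "sm_70": ["CUDA 10", "CUDA 11"],
    "sm_75": ["CUDA 10", "CUDA 11"],
    "sm_80": ["CUDA 11.8", "CUDA 12.x"],
    "sm_86": ["CUDA 11.8", "CUDA 12.x"],
    "sm_90": ["CUDA 12.6", "CUDA 12.9", "CUDA 13.0"],
    "sm_100": ["CUDA 12.9", "CUDA 13.0"],
    "sm_120": ["CUDA 12.9", "CUDA 13.0"],
}

# Only some rows mark a version as recommended; the others are absent here.
RECOMMENDED_BY_SM = {
    "sm_80": "CUDA 12.x",
    "sm_86": "CUDA 12.x",
    "sm_90": "CUDA 12.9",
    "sm_100": "CUDA 12.9",
    "sm_120": "CUDA 12.9",
}


def cuda_options(sm):
    """Return the CUDA package versions listed for a compute capability."""
    try:
        return list(CUDA_BY_SM[sm])
    except KeyError:
        raise ValueError(f"{sm!r} is not listed in the table") from None


def recommended_cuda(sm):
    """Return the version marked as recommended, or None if the row marks none."""
    cuda_options(sm)  # raises ValueError for an unlisted compute capability
    return RECOMMENDED_BY_SM.get(sm)


print(cuda_options("sm_90"), recommended_cuda("sm_90"))
```

For a card not covered by the table, the sketch raises ValueError instead of guessing a version.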



Compile Dependency Table

| Dependency | Version | Description | Installation command |
| --- | --- | --- | --- |
| CMake | 3.18, 3.19 (recommended), 4.0 | | |
| GCC | 8.2 / 12.2 | | |
| Clang (macOS only) | 9.0 and above | Usually the Clang version shipped with macOS 10.11 and above | |
| Python (64-bit) | 3.9, 3.10, 3.11, 3.12, 3.13 | Depends on libpython3.9+.so | Please go to the Python official website |
| SWIG | at least 2.0 | | apt install swig or yum install swig |
| wget | any | | apt install wget or yum install wget |
| openblas | any | optional | |
| pip | at least 20.2.2 | | apt install python3-pip or yum install python3-pip |
| numpy | >=1.21.0 | | pip install numpy |
| httpx | | | pip install httpx |
| Pillow | | | pip install Pillow |
| networkx | | | pip install networkx |
| typing_extensions | | | pip install typing_extensions |
| safetensors | >=0.6.0 | | pip install "safetensors>=0.6.0" |
| opt_einsum | 3.3.0 | | pip install opt_einsum==3.3.0 |
| protobuf | >=3.20.2 | | pip install protobuf |
| patchELF | any | | apt install patchelf, or see the official patchELF documentation on GitHub |
| go | >=1.8 | optional | |
| setuptools | | Required for Python 3.12 and above | |
| unrar | | | brew install unrar (macOS), apt-get install unrar (Ubuntu) |
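
Before compiling, it can help to confirm that the interpreter meets the Python row of the table. The sketch below is a minimal check under the assumption that "64 bit" means a 64-bit pointer size; the helper names is_supported_python and needs_setuptools are hypothetical and not part of any PaddlePaddle tooling.

```python
import struct
import sys

# Python 3.9 through 3.13, as listed in the dependency table.
SUPPORTED_MINORS = range(9, 14)


def is_supported_python(version=None, pointer_bits=None):
    """Check a (major, minor) version and pointer width against the table."""
    if version is None:
        version = sys.version_info[:2]
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter.
        pointer_bits = struct.calcsize("P") * 8
    major, minor = version[0], version[1]
    return major == 3 and minor in SUPPORTED_MINORS and pointer_bits == 64


def needs_setuptools(version=None):
    """The table lists setuptools as required for Python 3.12 and above."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= (3, 12)


print("supported interpreter:", is_supported_python())
print("setuptools required:", needs_setuptools())
```

Calling the helpers without arguments inspects the running interpreter; passing explicit values makes them easy to try against other versions.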



Compile Option Table

BLAS

PaddlePaddle supports two BLAS libraries, MKL and OpenBLAS. MKL is used by default. If you use MKL and the machine supports the AVX2 instruction set, the MKL-DNN math library is downloaded as well; for details, please refer to here.

If you disable MKL (-DWITH_MKL=OFF), OpenBLAS is used as the BLAS library.

CUDA/cuDNN

At compile time and at runtime, PaddlePaddle automatically locates the CUDA and cuDNN libraries installed on the system. Pass -DCUDA_ARCH_NAME=Auto to detect the SM architecture of the current machine automatically; this speeds up compilation because only that architecture is compiled.

PaddlePaddle can be compiled and run with any cuDNN version after v5.1, but try to use the same cuDNN version for compiling and for running. We recommend using the latest version of cuDNN.

Configure Compile Options

PaddlePaddle references the BLAS, CUDA, and cuDNN libraries through paths specified at compile time. When cmake runs, it first searches the system paths (/usr/lib and /usr/local/lib) for these libraries and also reads the relevant path variables. These variables can be set with the -D option, for example:

cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5

Note: The compile options described here take effect only on the first cmake run. To change them later, it is recommended to clean the entire build directory (rm -rf) and then run cmake again with the new options.
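
When several options are involved, the -D arguments can be assembled programmatically. The sketch below is only an illustration: cmake_command is a hypothetical helper, and the option names and the /opt/cudnnv5 path are taken from the example command above.

```python
import shlex


def cmake_command(options, source_dir=".."):
    """Build a cmake argument list from a mapping of option name to value."""
    args = ["cmake", source_dir]
    for name, value in options.items():
        # cmake spells booleans as ON/OFF; other values pass through unchanged.
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args


cmd = cmake_command(
    {"WITH_GPU": True, "WITH_TESTING": False, "CUDNN_ROOT": "/opt/cudnnv5"}
)
print(shlex.join(cmd))
# prints: cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5
```

The list form can be passed directly to subprocess.run from inside a freshly created build directory, which avoids shell quoting issues.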



The available compile options are:

| Option | Description | Default |
| --- | --- | --- |
| WITH_GPU | Whether to support GPU | ON |
| WITH_AVX | Whether to compile PaddlePaddle binaries that use the AVX instruction set | ON |
| WITH_PYTHON | Whether to embed the Python interpreter | ON |
| WITH_TESTING | Whether to enable unit tests | OFF |
| WITH_MKL | Whether to use the MKL math library; if not, OpenBLAS is used | ON |
| WITH_SYSTEM_BLAS | Whether to use the system's BLAS | OFF |
| WITH_DISTRIBUTE | Whether to compile the distributed version | OFF |
| WITH_BRPC_RDMA | Whether to use BRPC RDMA as the RPC protocol | OFF |
| ON_INFER | Whether to turn on prediction optimization | OFF |
| CUDA_ARCH_NAME | Whether to compile only for the current CUDA architecture | All: compile for all supported CUDA architectures. Optional: Auto, which detects the architecture of the current environment and compiles for it |
| TENSORRT_ROOT | TensorRT path | '/' on Windows, '/usr/' on Linux |



Installation Package List

| Version number | Release description |
| --- | --- |
| paddlepaddle==[version code], such as paddlepaddle==3.3.0 | CPU-only build of the specified PaddlePaddle version; please refer to PyPI for the available versions. |
| paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==3.3.0 | For specific installation methods and versions, please refer to here. |
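
To see which of the two distributions is present in an environment, the standard library's package metadata can be queried. This is a minimal sketch that assumes the distribution names paddlepaddle and paddlepaddle-gpu listed above; installed_paddle_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_paddle_versions():
    """Map each PaddlePaddle distribution name to its installed version or None."""
    found = {}
    for name in ("paddlepaddle", "paddlepaddle-gpu"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


for name, version in installed_paddle_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

The check reads installed package metadata only and does not import paddle, so it also works when the GPU libraries are missing.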

You can find and download the corresponding PaddlePaddle-gpu release for your CUDA environment at the following official PaddlePaddle path:



Multi-version whl package list - Release

| Release | cp39-cp39 | cp310-cp310 | cp311-cp311 | cp312-cp312 | cp313-cp313 |
| --- | --- | --- | --- | --- | --- |
| cpu-mkl-avx | paddlepaddle-3.3.0-cp39-cp39-linux_x86_64.whl | paddlepaddle-3.3.0-cp310-cp310-linux_x86_64.whl | paddlepaddle-3.3.0-cp311-cp311-linux_x86_64.whl | paddlepaddle-3.3.0-cp312-cp312-linux_x86_64.whl | paddlepaddle-3.3.0-cp313-cp313-linux_x86_64.whl |
| cuda11.8-cudnn8.6-mkl-gcc8.2-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl |
| cuda12.6-cudnn9.0-mkl-gcc12.2-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl |
| cuda12.9-cudnn9.9-mkl-gcc12.2-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl |
| cuda13.0-cudnn9.13-mkl-gcc13.1-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl |
| macos-cpu-arm | paddlepaddle-3.3.0-cp39-cp39-macosx_11_0_arm64.whl | paddlepaddle-3.3.0-cp310-cp310-macosx_11_0_arm64.whl | paddlepaddle-3.3.0-cp311-cp311-macosx_11_0_arm64.whl | paddlepaddle-3.3.0-cp312-cp312-macosx_11_0_arm64.whl | paddlepaddle-3.3.0-cp313-cp313-macosx_11_0_arm64.whl |
| win-cpu-mkl-avx | paddlepaddle-3.3.0-cp39-cp39-win_amd64.whl | paddlepaddle-3.3.0-cp310-cp310-win_amd64.whl | paddlepaddle-3.3.0-cp311-cp311-win_amd64.whl | paddlepaddle-3.3.0-cp312-cp312-win_amd64.whl | paddlepaddle-3.3.0-cp313-cp313-win_amd64.whl |
| win-cuda11.8-cudnn8.6-mkl-vs2019-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-win_amd64.whl |
| win-cuda12.6-cudnn9.0-mkl-vs2019-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-win_amd64.whl |
| win-cuda12.9-cudnn9.9-mkl-vs2019-avx | paddlepaddle_gpu-3.3.0-cp39-cp39-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp310-cp310-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp311-cp311-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp312-cp312-win_amd64.whl | paddlepaddle_gpu-3.3.0-cp313-cp313-win_amd64.whl |

Table description

  • Vertical axis

cpu-mkl-avx: supports CPU training and prediction and uses the Intel MKL math library

cuda11.8-cudnn8.6-mkl-gcc8.2-avx: supports GPU training and prediction with CUDA 11.8 and cuDNN 8.6, and uses the Intel MKL math library

  • Horizontal axis

Entries generally look like "cp310-cp310", in which:

310: the Python tag, here Python 3.10. The table also uses "39", "311", "312", and "313"

mu: an ABI flag suffix seen in older tags such as "cp27mu", indicating a wide-Unicode (UCS-4) Python build; "m" alone indicates a narrow-Unicode build. The cp39 to cp313 tags used here carry no such suffix

  • Installation package naming rules

Each installation package has a unique name that follows the official Python wheel naming rules. The form is as follows:

{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl

The build tag is optional; all other parts are required.

distribution: wheel name

version: the version number, for example 3.3.0 (must be in numeric format)

python tag: similar to 'cp39', 'cp310', 'cp311', 'cp312', 'cp313', as in the table above; indicates the corresponding Python version

abi tag: similar to 'cp33m', 'abi3', 'none'

platform tag: similar to 'linux_x86_64', 'any'
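
The naming rule above can be checked mechanically. The sketch below splits a wheel file name into its parts with a regular expression and derives the CPython tag for a given interpreter version. WHEEL_RE, parse_wheel_name, and current_python_tag are hypothetical names, and the sketch assumes that a build tag, when present, starts with a digit.

```python
import re
import sys

# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_RE = re.compile(
    r"^(?P<distribution>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"  # optional build tag, assumed to start with a digit
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def parse_wheel_name(filename):
    """Split a wheel file name into its named components."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid wheel file name: {filename!r}")
    return match.groupdict()


def current_python_tag(version=None):
    """Return the CPython tag (for example 'cp310') for a (major, minor) pair."""
    if version is None:
        version = sys.version_info[:2]
    return f"cp{version[0]}{version[1]}"


parts = parse_wheel_name("paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl")
print(parts["distribution"], parts["version"], parts["python"], parts["platform"])
# prints: paddlepaddle_gpu 3.3.0 cp310 linux_x86_64
```

Comparing parts["python"] with current_python_tag() is one way to pick the matching column of the wheel table for the running interpreter.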



Execute the PaddlePaddle training program in Docker

Suppose you have written a PaddlePaddle program, train.py, in the current directory (such as /home/work); refer to PaddlePaddleBook for how to write one. You can start the training with the following commands:

cd /home/work
docker run -it -v $PWD:/work ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9 /work/train.py

In the commands above, the -it option runs the container interactively; -v $PWD:/work mounts the current directory (on Linux, the PWD variable expands to the absolute path of the current directory) at /work inside the container; ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle specifies the image to use, here with the tag 3.3.0-gpu-cuda12.9-cudnn9.9; and /work/train.py is the command executed inside the container, i.e. the training program.
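
The same command can be assembled from a script, which makes each part of the command line explicit. This is a sketch only: docker_run_args is a hypothetical helper, and the image name is copied from the command above.

```python
import os
import shlex

IMAGE = (
    "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:"
    "3.3.0-gpu-cuda12.9-cudnn9.9"
)


def docker_run_args(image, command, host_dir=None, container_dir="/work"):
    """Build the docker run argument list with one bind mount."""
    if host_dir is None:
        # Equivalent of $PWD: the absolute path of the current directory.
        host_dir = os.getcwd()
    return [
        "docker", "run", "-it",
        "-v", f"{host_dir}:{container_dir}",  # host path : path in container
        image,
        *command,  # the command executed inside the container
    ]


args = docker_run_args(IMAGE, ["/work/train.py"], host_dir="/home/work")
print(shlex.join(args))
```

Passing the list to subprocess.run(args) would start the container; replacing the command with ["/bin/bash"] gives an interactive shell instead.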

You can also start a shell in the Docker container and run or debug your code interactively:

docker run -it -v $PWD:/work ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9 /bin/bash
cd /work
python train.py

Note: To keep the image small, vim is not installed in the PaddlePaddle Docker image by default. To edit code inside the container, first run apt-get install -y vim in the container.

Perform GPU training using Docker

To ensure that the GPU driver works properly inside the image, we recommend running the image with nvidia-docker. Remember to install the latest GPU driver on the physical machine in advance. For specific driver version requirements, please refer to here.

nvidia-docker run -it -v $PWD:/work ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9 /bin/bash

Note: If you don't have nvidia-docker installed, you can try the following to mount the CUDA library and Linux devices into the Docker container:

export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') \
$(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} \
${DEVICES} -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-gpu
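
The shell commands above expand file globs into repeated -v and --device flags. The following Python sketch mirrors that logic under the same path assumptions (/usr/lib64/libcuda*, /usr/lib64/libnvidia*, /dev/nvidia*); mount_flags, device_flags, and gpu_docker_args are hypothetical helper names.

```python
import glob


def mount_flags(paths):
    """Return '-v path:path' pairs so each library appears at the same path."""
    flags = []
    for path in sorted(paths):
        flags += ["-v", f"{path}:{path}"]
    return flags


def device_flags(paths):
    """Return '--device path:path' pairs exposing each device node."""
    flags = []
    for path in sorted(paths):
        flags += ["--device", f"{path}:{path}"]
    return flags


def gpu_docker_args(image):
    """Assemble docker run arguments from the files found on this host."""
    libs = glob.glob("/usr/lib64/libcuda*") + glob.glob("/usr/lib64/libnvidia*")
    devices = glob.glob("/dev/nvidia*")
    return ["docker", "run", *mount_flags(libs), *device_flags(devices), "-it", image]


print(gpu_docker_args(
    "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-gpu"
))
```

On a machine without NVIDIA libraries or devices the globs match nothing, so the result reduces to a plain docker run -it command.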