Appendix

NVIDIA GPU architectures and installation modes supported by PaddlePaddle

GPU architecture	Compute capability	Corresponding GPU hardware models	CUDA version of the PaddlePaddle installation package to download
Pascal	sm_60	Quadro GP100, Tesla P100, DGX-1	CUDA10, CUDA11
Pascal	sm_61	GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla P40, Tesla P4	CUDA10, CUDA11
Volta	sm_70	DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100	CUDA10, CUDA11
Turing	sm_75	GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4	CUDA10, CUDA11
Ampere	sm_80	NVIDIA A100, GA100, NVIDIA DGX-A100	CUDA11.8, CUDA12.x (Recommended)
Ampere	sm_86	Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, RTX A10, RTX A16, RTX A40, A2 Tensor Core GPU	CUDA11.8, CUDA12.x (Recommended)
Hopper	sm_90	NVIDIA H100, H800	CUDA12.6, CUDA12.9 (Recommended), CUDA13.0
Blackwell	sm_100	NVIDIA B100, B200, GB200, NVIDIA DGX-B200	CUDA12.9 (Recommended), CUDA13.0
Blackwell	sm_120	NVIDIA RTX 5090, RTX 5080, RTX 5070	CUDA12.9 (Recommended), CUDA13.0
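
The table above can be read as a lookup from compute capability to the CUDA builds to choose from. The following is a minimal sketch of that lookup, not a PaddlePaddle API: the helper name cuda_builds is hypothetical, and it assumes that each "Recommended" mark applies to the CUDA version it directly follows.

```python
# Rows transcribed from the table above:
# compute capability -> (CUDA builds listed, build marked as recommended or None).
# Assumption: a "Recommended" mark applies to the build it directly follows.
CUDA_BUILDS = {
    "sm_60": (["CUDA10", "CUDA11"], None),
    "sm_61": (["CUDA10", "CUDA11"], None),
    "sm_70": (["CUDA10", "CUDA11"], None),
    "sm_75": (["CUDA10", "CUDA11"], None),
    "sm_80": (["CUDA11.8", "CUDA12.x"], "CUDA12.x"),
    "sm_86": (["CUDA11.8", "CUDA12.x"], "CUDA12.x"),
    "sm_90": (["CUDA12.6", "CUDA12.9", "CUDA13.0"], "CUDA12.9"),
    "sm_100": (["CUDA12.9", "CUDA13.0"], "CUDA12.9"),
    "sm_120": (["CUDA12.9", "CUDA13.0"], "CUDA12.9"),
}


def cuda_builds(compute_capability):
    """Return (listed builds, recommended build or None) for one table row."""
    if compute_capability not in CUDA_BUILDS:
        raise ValueError(f"{compute_capability} is not listed in the table")
    return CUDA_BUILDS[compute_capability]


builds, recommended = cuda_builds("sm_90")
print(builds, recommended)
```

Rows without a recommended build return None in the second position, so the caller has to pick one of the listed builds explicitly.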

Compile Dependency Table

Dependency package name	Version	Description	Installation command
CMake	3.18, 3.19 (Recommended), 4.0
GCC	8.2 / 12.2
Clang (macOS only)	9.0 and above	Usually the Clang version shipped with macOS 10.11 and above
Python (64 bit)	3.9, 3.10, 3.11, 3.12, 3.13	Depends on libpython3.9+.so	Please go to the official Python website
SWIG	at least 2.0		`apt install swig` or `yum install swig`
wget	any		`apt install wget` or `yum install wget`
openblas	any	optional
pip	at least 20.2.2		`apt install python-pip` or `yum install python-pip`
numpy	>=1.21.0		`pip install numpy`
httpx			`pip install httpx`
Pillow			`pip install Pillow`
networkx			`pip install networkx`
typing_extensions			`pip install typing_extensions`
safetensors	>=0.6.0		`pip install "safetensors>=0.6.0"`
opt_einsum			`pip install opt_einsum==3.3.0`
protobuf	>=3.20.2		`pip install protobuf`
patchELF	any		`apt install patchelf`, or see the official patchELF documentation on GitHub
go	>=1.8	optional
setuptools		Required in Python 3.12 and above
unrar			`brew install unrar` (for macOS), `apt-get install unrar` (for Ubuntu)
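
Before compiling, it can help to confirm that the interpreter matches the Python row of the table. A minimal sketch of such a check, using only the standard library; the function names is_supported_python and needs_setuptools are hypothetical and not part of PaddlePaddle:

```python
import sys

# Python minor versions listed in the dependency table above.
SUPPORTED_MINORS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported_python(version=None, is_64bit=None):
    """Check an interpreter against the table: 64 bit, Python 3.9 to 3.13."""
    if version is None:
        version = sys.version_info
    if is_64bit is None:
        # On a 64-bit build, sys.maxsize is larger than 2**32.
        is_64bit = sys.maxsize > 2**32
    return is_64bit and tuple(version[:2]) in SUPPORTED_MINORS


def needs_setuptools(version=None):
    """The table lists setuptools as required for Python 3.12 and above."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= (3, 12)


print(is_supported_python(), needs_setuptools())
```

Both functions default to the running interpreter and accept an explicit version tuple, which makes them easy to try against other versions.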

Compile Option Table

BLAS

PaddlePaddle supports two BLAS libraries, MKL and OpenBLAS. MKL is used by default. If you use MKL and the machine supports the AVX2 instruction set, the build also downloads the MKL-DNN math library; for details, please refer to the MKL-DNN project.

If you turn MKL off (WITH_MKL=OFF), OpenBLAS is used as the BLAS library.

CUDA/cuDNN

PaddlePaddle automatically finds the CUDA and cuDNN libraries installed on the system, both at compile time and at runtime. Pass -DCUDA_ARCH_NAME=Auto to detect the SM architecture of the current machine automatically; compiling for that architecture only speeds up compilation.

PaddlePaddle can be compiled and run with any cuDNN version after v5.1, but try to use the same cuDNN version for compiling and for running. We recommend using the latest version of cuDNN.

Configure Compile Options

PaddlePaddle locates the BLAS, CUDA, and cuDNN libraries through paths specified at compile time. When cmake runs, it first searches the system paths (/usr/lib and /usr/local/lib) for these libraries and also reads the relevant path variables. These can be set with -D options, for example:

cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5

Note: The compile options introduced here take effect only the first time cmake is run. If you want to change them later, it is recommended to clean up the entire build directory (rm -rf) and then run cmake again with the new options.
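
When the same options are reused across builds, the cmake command line can be assembled from a dictionary. The following is only a sketch: cmake_command is a hypothetical helper, and the option names are the ones from the example above.

```python
def cmake_command(options, source_dir=".."):
    """Build a cmake argument list, turning each option into -DNAME=VALUE."""
    args = ["cmake", source_dir]
    for name, value in options.items():
        # Booleans are written the way cmake expects switches: ON or OFF.
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args


command = cmake_command(
    {"WITH_GPU": True, "WITH_TESTING": False, "CUDNN_ROOT": "/opt/cudnnv5"}
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run from a freshly cleaned build directory, in line with the note above.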

Installation Package List

Option	Description	Default
WITH_GPU	Whether to support GPU	ON
WITH_AVX	Whether to compile PaddlePaddle binaries that use the AVX instruction set	ON
WITH_PYTHON	Whether the Python interpreter is embedded	ON
WITH_TESTING	Whether to turn on unit tests	OFF
WITH_MKL	Whether to use the MKL math library; if not, OpenBLAS is used	ON
WITH_SYSTEM_BLAS	Whether to use the system's BLAS	OFF
WITH_DISTRIBUTE	Whether to compile the distributed version	OFF
WITH_BRPC_RDMA	Whether to use BRPC RDMA as the RPC protocol	OFF
ON_INFER	Whether to turn on prediction optimization	OFF
CUDA_ARCH_NAME	Whether to compile only for the current CUDA architecture	All: compile for all supported CUDA architectures. Optional: Auto detects the architecture of the current environment and compiles only for it
TENSORRT_ROOT	Specify the TensorRT path	The default value under Windows is '/'; the default value under Linux is '/usr/'

Version Number	Release Description
paddlepaddle==[version code], such as paddlepaddle==3.3.0	Supports only the CPU version of PaddlePaddle; please refer to PyPI for the specific versions.
paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==3.3.0	For specific installation methods and versions, please refer to here.

You can find and download the corresponding PaddlePaddle-gpu release for your CUDA environment at the following official PaddlePaddle path:

Multi-version whl package list - Release

Release Instruction	cp39-cp39	cp310-cp310	cp311-cp311	cp312-cp312	cp313-cp313
cpu-mkl-avx	paddlepaddle-3.3.0-cp39-cp39-linux_x86_64.whl	paddlepaddle-3.3.0-cp310-cp310-linux_x86_64.whl	paddlepaddle-3.3.0-cp311-cp311-linux_x86_64.whl	paddlepaddle-3.3.0-cp312-cp312-linux_x86_64.whl	paddlepaddle-3.3.0-cp313-cp313-linux_x86_64.whl
cuda11.8-cudnn8.6-mkl-gcc8.2-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl
cuda12.6-cudnn9.0-mkl-gcc12.2-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl
cuda12.9-cudnn9.9-mkl-gcc12.2-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl
cuda13.0-cudnn9.13-mkl-gcc13.1-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-linux_x86_64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-linux_x86_64.whl
macos-cpu-arm	paddlepaddle-3.3.0-cp39-cp39-macosx_11_0_arm64.whl	paddlepaddle-3.3.0-cp310-cp310-macosx_11_0_arm64.whl	paddlepaddle-3.3.0-cp311-cp311-macosx_11_0_arm64.whl	paddlepaddle-3.3.0-cp312-cp312-macosx_11_0_arm64.whl	paddlepaddle-3.3.0-cp313-cp313-macosx_11_0_arm64.whl
win-cpu-mkl-avx	paddlepaddle-3.3.0-cp39-cp39-win_amd64.whl	paddlepaddle-3.3.0-cp310-cp310-win_amd64.whl	paddlepaddle-3.3.0-cp311-cp311-win_amd64.whl	paddlepaddle-3.3.0-cp312-cp312-win_amd64.whl	paddlepaddle-3.3.0-cp313-cp313-win_amd64.whl
win-cuda11.8-cudnn8.6-mkl-vs2019-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-win_amd64.whl
win-cuda12.6-cudnn9.0-mkl-vs2019-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-win_amd64.whl
win-cuda12.9-cudnn9.9-mkl-vs2019-avx	paddlepaddle_gpu-3.3.0-cp39-cp39-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp310-cp310-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp311-cp311-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp312-cp312-win_amd64.whl	paddlepaddle_gpu-3.3.0-cp313-cp313-win_amd64.whl

Table description

Vertical axis

cpu-mkl: Supports CPU training and prediction, using the Intel MKL math library

cuda10_cudnn7-mkl: Supports GPU training and prediction, using the Intel MKL math library

Horizontal axis

The column headers generally look like "cp310-cp310", in which:

310: Python tag, refers to Python 3.10. Similarly, there are "39", "310", "311", "312", "313", etc.

mu: refers to the Unicode build of Python; m refers to the non-Unicode build of Python

Installation package naming rules

Each installation package has a unique name. They are named according to the official rules of Python. The form is as follows:

{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl

The build tag is optional; all other parts are required.

distribution: wheel name

version: Version, for example 0.14.0 (must be in numeric format)

python tag: similar to 'py39', 'py310', 'py311', 'py312', 'py313', used to indicate the corresponding Python version

abi tag: similar to 'cp33m', 'abi3', 'none'

platform tag: similar to 'linux_x86_64', 'any'
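
The naming rule above can be illustrated by splitting a file name on hyphens. A minimal sketch, assuming that no component itself contains a hyphen (which is how wheel file names are escaped); parse_wheel_name is a hypothetical helper, not a PaddlePaddle or pip function:

```python
def parse_wheel_name(filename):
    """Split a wheel file name into the components of the naming rule."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        # No build tag present.
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    return {
        "distribution": distribution,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


print(parse_wheel_name("paddlepaddle_gpu-3.3.0-cp310-cp310-linux_x86_64.whl"))
```

For the file names in the package list, the build tag is absent, so the result has build_tag set to None.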

Execute the PaddlePaddle training program in Docker

Suppose you have written a PaddlePaddle program, train.py, in the current directory (such as /home/work); refer to PaddlePaddleBook for how to write one. You can start the training with the following commands:

          cd /home/work
          docker run -it -v $PWD:/work ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9 /work/train.py

In the above command, the -it parameter runs the container interactively; -v $PWD:/work mounts the current path (in Linux, the PWD variable expands to the absolute path of the current directory) to the /work directory inside the container; ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle specifies the image to use; finally, /work/train.py is the command executed inside the container, i.e. the training program.

Of course, you can also enter the Docker container and execute or debug your code interactively:

          docker run -it -v $PWD:/work ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9 /bin/bash
          cd /work
          python train.py

Note: To reduce the image size, vim is not installed in the PaddlePaddle Docker image by default. To edit code inside the container, first run apt-get install -y vim in the container.

Perform GPU training using Docker

To ensure that the GPU driver works properly in the image, we recommend using nvidia-docker to run the image. Don't forget to install the latest GPU drivers on your physical machine in advance. For specific driver version requirements, please refer to here.

          nvidia-docker run -it -v $PWD:/work ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9 /bin/bash

Note: If you don't have nvidia-docker installed, you can try the following to mount the CUDA library and Linux devices into the Docker container:

          export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') \
          $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
          export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
          docker run ${CUDA_SO} \
          ${DEVICES} -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-gpu
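
The shell commands above expand every matching library and device into a path:path mapping. The same expansion can be sketched in Python to make the resulting docker arguments easier to inspect. The helper names passthrough_flags and docker_gpu_args are hypothetical, and the default directories are the ones used in the commands above.

```python
import glob


def passthrough_flags(flag, pattern):
    """Return [flag, 'path:path', ...] for every path matching the pattern."""
    flags = []
    for path in sorted(glob.glob(pattern)):
        # Map each host path to the same path inside the container.
        flags += [flag, f"{path}:{path}"]
    return flags


def docker_gpu_args(image, lib_dir="/usr/lib64", dev_dir="/dev"):
    """Assemble a docker run argument list with CUDA libraries and devices."""
    args = ["docker", "run"]
    args += passthrough_flags("-v", f"{lib_dir}/libcuda*")
    args += passthrough_flags("-v", f"{lib_dir}/libnvidia*")
    args += passthrough_flags("--device", f"{dev_dir}/nvidia*")
    return args + ["-it", image]


print(docker_gpu_args(
    "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-gpu"
))
```

On a machine without NVIDIA libraries or devices, the globs match nothing and the list contains only docker run -it and the image name, which is a quick way to see that the mounts would be missing.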