High-Performance Inference Operator Installation
Clone the repository locally and add it to your PYTHONPATH:
git clone https://github.com/PaddlePaddle/PaddleNLP.git
export PYTHONPATH=/path/to/PaddleNLP:$PYTHONPATH
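Setting PYTHONPATH makes the cloned checkout importable without a pip install. A minimal sketch of what that export does inside the interpreter (the path is the same placeholder as above; substitute your actual clone location):

```python
import sys

# Hypothetical clone location; replace with the real path to your checkout.
repo_path = "/path/to/PaddleNLP"

# Mimic `export PYTHONPATH=/path/to/PaddleNLP:$PYTHONPATH`: prepend the repo
# to sys.path so `import paddlenlp` resolves to the checkout first.
if repo_path not in sys.path:
    sys.path.insert(0, repo_path)

print(sys.path[0])
```

Prepending (rather than appending) ensures the checkout shadows any previously pip-installed PaddleNLP.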
PaddleNLP provides high-performance custom operators for Transformer-series models that accelerate inference and decoding. Install the custom operator library first, running only the command that matches your hardware:
# Install custom operators for GPU
cd PaddleNLP/csrc && python setup_cuda.py install
# Install custom operators for XPU
cd PaddleNLP/csrc/xpu/src && sh cmake_build.sh
# Install custom operators for DCU
cd PaddleNLP/csrc && python setup_hip.py install
# Install custom operators for SDAA
cd PaddleNLP/csrc/sdaa && python setup_sdaa.py install
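Each device type has its own build entry point. A small illustrative helper (the function and mapping are not part of PaddleNLP; the commands are copied from the steps above) that selects the right build command:

```python
# Map each supported device to its custom-operator build command.
# The commands come from the installation steps above; the helper itself
# is illustrative, not a PaddleNLP API.
BUILD_COMMANDS = {
    "gpu": "cd PaddleNLP/csrc && python setup_cuda.py install",
    "xpu": "cd PaddleNLP/csrc/xpu/src && sh cmake_build.sh",
    "dcu": "cd PaddleNLP/csrc && python setup_hip.py install",
    "sdaa": "cd PaddleNLP/csrc/sdaa && python setup_sdaa.py install",
}

def build_command(device: str) -> str:
    """Return the build command for a device, raising on unsupported ones."""
    try:
        return BUILD_COMMANDS[device.lower()]
    except KeyError:
        raise ValueError(f"Unsupported device: {device!r}") from None

print(build_command("gpu"))
```

For example, `build_command("XPU")` returns the cmake-based XPU build command, while an unknown device name raises a `ValueError` instead of silently running the wrong build.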
Then install the Triton dependencies:
pip install triton # Recommended version 3.2.0
python -m pip install git+https://github.com/zhoutianzi666/UseTritonInPaddle.git
# The following command only needs to be run once per environment; there is no need to repeat it in future sessions
python -c "import use_triton_in_paddle; use_triton_in_paddle.make_triton_compatible_with_paddle()"
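Since version 3.2.0 is recommended, it can help to verify what is actually installed. A hedged sketch using only the standard library (`triton_matches_recommendation` is an illustrative helper, not part of PaddleNLP or Triton):

```python
# Check the installed Triton version against the recommended 3.2.0 without
# importing triton itself; importlib.metadata is in the standard library.
from importlib import metadata

RECOMMENDED = (3, 2, 0)

def parse_version(text: str) -> tuple:
    """Parse a 'X.Y.Z' version string into an integer tuple for comparison."""
    return tuple(int(part) for part in text.split(".")[:3])

def triton_matches_recommendation() -> bool:
    """Return True only if triton is installed at exactly the recommended version."""
    try:
        installed = parse_version(metadata.version("triton"))
    except metadata.PackageNotFoundError:
        return False
    return installed == RECOMMENDED

print(parse_version("3.2.0"))
```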
Navigate to the run directory to get started:
cd PaddleNLP/llm
Large Model Inference Tutorials:
For Optimal Inference Performance: