Large Model Heterogeneous Device Inference
- Running llama2-7b Model on XPU with PaddleNLP
- PaddleNLP is a natural language processing and large language model (LLM) development library built on PaddlePaddle. It provides Paddle-framework implementations of many large models, llama2-7b among them. To make full use of PaddleNLP, first clone the entire repository.
- Clone PaddleNLP
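For example, using the official repository on GitHub:

```bash
# Clone the PaddleNLP repository and enter it
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
```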
- Switch to the specified commit that matches the required dependencies
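The concrete commit hash is given in the PaddleNLP XPU documentation and is not reproduced here; the placeholder below stands in for it:

```bash
# Replace <commit-id> with the commit specified for XPU support
git checkout <commit-id>
```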
- Install dependencies
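A sketch of a typical pip-based setup, assuming the repository's standard requirements file; the project's own installation instructions take precedence:

```bash
# Install Python dependencies, then PaddleNLP itself in editable mode
pip install -r requirements.txt
pip install -e .
```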
- Download XPU custom operators
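The XPU custom-operator sources are assumed to ship inside the repository; the directory below is an assumption and may differ between PaddleNLP versions:

```bash
# Assumed location of the XPU custom-operator sources
cd csrc/xpu/src
```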
- After setting the required paths, download XDNN, XRE, and XTDK in one step
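The packages are distributed from links listed in the PaddleNLP XPU guide (some versions provide a single helper script that fetches all three); the URLs below are placeholders, not real addresses:

```bash
# Placeholder URLs: substitute the links from the PaddleNLP XPU guide
wget <xdnn-package-url>
wget <xre-package-url>
wget <xtdk-package-url>
```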
- Extract to current directory
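Assuming the downloads are gzipped tarballs; the archive names are placeholders matching the packages fetched above:

```bash
# Unpack each toolkit into the current directory
tar -xzf <xdnn-package>.tar.gz -C .
tar -xzf <xre-package>.tar.gz -C .
tar -xzf <xtdk-package>.tar.gz -C .
```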
- Set environment variables
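The variable names below are illustrative only, pointing the build at the extracted XDNN, XRE, and XTDK directories; use whatever names the build scripts in your PaddleNLP version expect:

```bash
# Illustrative variable names: point the build to the extracted toolkits
export XDNN_PATH=$(pwd)/xdnn
export XRE_PATH=$(pwd)/xre
export XTDK_PATH=$(pwd)/xtdk
```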
- Install custom operators for XPU devices
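The build entry point varies between versions; a setup.py-based install is sketched here as an assumption, run from the custom-operator source directory:

```bash
# Assumption: build and install the XPU custom operators from their source directory
python setup.py install
```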
- Running llama2-13b Model on NPU with PaddleNLP
- Fine-tuning: For testing convenience, we provide an advertising generation dataset ready to use:
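One way to fetch and unpack it; the URL is a placeholder for the dataset link given in the PaddleNLP fine-tuning documentation:

```bash
# Placeholder URL: substitute the advertising-generation dataset link from the docs
wget <advertising-dataset-url> -O data.tar.gz
tar -xzf data.tar.gz
```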
- You can also prepare your own fine-tuning data in the same format, as sketched below.
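PaddleNLP's LLM fine-tuning examples generally expect one JSON object per line with a source/target pair; the key names (src, tgt) and the data/ path below follow that convention but should be checked against your version:

```bash
# Each line is one sample: "src" is the prompt, "tgt" is the expected response
mkdir -p ./data
cat > ./data/train.json <<'EOF'
{"src": "an instruction or prompt", "tgt": "the expected response"}
{"src": "another instruction", "tgt": "another expected response"}
EOF
```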
- Run fine-tuning with the SFT (supervised fine-tuning) strategy
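A sketch of launching supervised fine-tuning with Paddle's distributed launcher; the entry script and config path are assumptions based on PaddleNLP's llm examples and should be adjusted to your checkout:

```bash
# Run from the llm/ directory of the PaddleNLP checkout (assumed layout)
python -u -m paddle.distributed.launch --devices "0,1,2,3" \
    run_finetune.py ./config/llama/sft_argument.json
```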
- Execute inference code
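A sketch of running inference with the fine-tuned checkpoint; the predictor script, checkpoint path, and flags are assumptions based on PaddleNLP's llm examples:

```bash
# Run from the llm/ directory; adjust script name, checkpoint path, and flags to your version
python ./predict/predictor.py \
    --model_name_or_path ./checkpoints/llama_sft_ckpts \
    --dtype float16
```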
- Haiguang K100
- Suiyuan GCU
- Taichu SDAA
- X86 CPU