Large Model Heterogeneous Device Inference
- Running llama2-7b Model on XPU with PaddleNLP
- PaddleNLP is a natural language processing and large language model (LLM) development library built on PaddlePaddle. It provides Paddle-framework implementations of many large models, llama2-7b among them. To make full use of PaddleNLP, first clone the entire repository.
- Clone PaddleNLP
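For example, using the official repository on GitHub:

```bash
# Clone the PaddleNLP repository and enter it
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
```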
- Switch to the specified commit that matches the required dependencies
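The concrete commit hash is given in the PaddleNLP XPU documentation and is not reproduced here; the placeholder below stands in for it:

```bash
# Replace <commit-id> with the commit specified for XPU support
git checkout <commit-id>
```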
- Install dependencies
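A sketch of a typical pip-based setup, assuming the repository's standard requirements file; the project's own installation instructions take precedence:

```bash
# Install Python dependencies, then PaddleNLP itself in editable mode
pip install -r requirements.txt
pip install -e .
```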
- Download XPU custom operators
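The XPU custom-operator sources are assumed to ship inside the repository; the directory below is an assumption and may differ between PaddleNLP versions:

```bash
# Assumed location of the XPU custom-operator sources
cd csrc/xpu/src
```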
- After setting the required paths, download XDNN, XRE, and XTDK in one step
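The packages are distributed from links listed in the PaddleNLP XPU guide (some versions provide a single helper script that fetches all three); the URLs below are placeholders, not real addresses:

```bash
# Placeholder URLs: substitute the links from the PaddleNLP XPU guide
wget <xdnn-package-url>
wget <xre-package-url>
wget <xtdk-package-url>
```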
- Extract to current directory
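Assuming the downloads are gzipped tarballs; the archive names are placeholders matching the packages fetched above:

```bash
# Unpack each toolkit into the current directory
tar -xzf <xdnn-package>.tar.gz -C .
tar -xzf <xre-package>.tar.gz -C .
tar -xzf <xtdk-package>.tar.gz -C .
```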
- Set environment variables
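The variable names below are illustrative only, pointing the build at the extracted XDNN, XRE, and XTDK directories; use whatever names the build scripts in your PaddleNLP version expect:

```bash
# Illustrative variable names: point the build to the extracted toolkits
export XDNN_PATH=$(pwd)/xdnn
export XRE_PATH=$(pwd)/xre
export XTDK_PATH=$(pwd)/xtdk
```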
- Install custom operators for XPU devices
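The build entry point varies between versions; a setup.py-based install is sketched here as an assumption, run from the custom-operator source directory:

```bash
# Assumption: build and install the XPU custom operators from their source directory
python setup.py install
```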
- Running llama2-13b Model on NPU with PaddleNLP
- Fine-tuning: For testing convenience, we provide an advertising generation dataset ready to use:
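One way to fetch and unpack it; the URL is a placeholder for the dataset link given in the PaddleNLP fine-tuning documentation:

```bash
# Placeholder URL: substitute the advertising-generation dataset link from the docs
wget <advertising-dataset-url> -O data.tar.gz
tar -xzf data.tar.gz
```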
- You can also prepare your own fine-tuning data in the same format, as sketched below.
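PaddleNLP's LLM fine-tuning examples generally expect one JSON object per line with a source/target pair; the key names (src, tgt) and the data/ path below follow that convention but should be checked against your version:

```bash
# Each line is one sample: "src" is the prompt, "tgt" is the expected response
mkdir -p ./data
cat > ./data/train.json <<'EOF'
{"src": "an instruction or prompt", "tgt": "the expected response"}
{"src": "another instruction", "tgt": "another expected response"}
EOF
```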
- Run fine-tuning with the SFT (supervised fine-tuning) strategy
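A sketch of launching supervised fine-tuning with Paddle's distributed launcher; the entry script and config path are assumptions based on PaddleNLP's llm examples and should be adjusted to your checkout:

```bash
# Run from the llm/ directory of the PaddleNLP checkout (assumed layout)
python -u -m paddle.distributed.launch --devices "0,1,2,3" \
    run_finetune.py ./config/llama/sft_argument.json
```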
- Execute inference code
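A sketch of running inference with the fine-tuned checkpoint; the predictor script, checkpoint path, and flags are assumptions based on PaddleNLP's llm examples:

```bash
# Run from the llm/ directory; adjust script name, checkpoint path, and flags to your version
python ./predict/predictor.py \
    --model_name_or_path ./checkpoints/llama_sft_ckpts \
    --dtype float16
```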
- Haiguang K100
- Suiyuan GCU
- Taichu SDAA
- X86 CPU