generation_utils
- class GenerationMixin[source]
Bases: object
This class implements the interface for generation tasks. It is used as the base class of paddlenlp.transformers.PretrainedModel.
- generate(input_ids=None, attention_mask=None, position_ids=None, max_length=20, min_length=0, decode_strategy='greedy_search', temperature=1.0, top_k=0, top_p=1.0, repetition_penalty=1.0, num_beams=1, num_beam_groups=1, length_penalty=0.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, forced_bos_token_id=None, forced_eos_token_id=None, no_repeat_ngram_size=None, num_return_sequences=1, diversity_rate=0.0, use_cache=True, use_fast=False, use_fp16_decoding=False, **model_kwargs)[source]
The interface for generation tasks. This method generates sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".
- Parameters:
input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Default to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value of bos_token_id.
max_length (int, optional) -- The maximum length of the sequence to be generated. Default to 20.
min_length (int, optional) -- The minimum length of the sequence to be generated. Default to 0.
decode_strategy (str, optional) -- The decoding strategy in generation. Currently, there are three decoding strategies supported: "greedy_search", "sampling" and "beam_search". Default to "greedy_search".
temperature (float, optional) -- The value used to modulate the next token probabilities in the "sampling" strategy. Default to 1.0, which means no effect.
top_k (int, optional) -- The number of highest probability tokens to keep for top-k-filtering in the "sampling" strategy. Default to 0, which means no effect.
top_p (float, optional) -- The cumulative probability for top-p-filtering in the "sampling" strategy. The value should satisfy \(0 \le top\_p < 1\). Default to 1.0, which means no effect. A minimal sketch of the top_k/top_p filtering rule follows this parameter list.
repetition_penalty (float, optional) -- The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details. Default to 1.0. A sketch of the penalty rule also follows this parameter list.
num_beams (int, optional) -- The number of beams in the "beam_search" strategy. Default to 1.
num_beam_groups (int, optional) -- Number of groups to divide num_beams into in order to use DIVERSE BEAM SEARCH. See this paper for more details. Default to 1.
length_penalty (float, optional) -- The exponential penalty to the sequence length in the "beam_search" strategy. The larger this param is, the more the model tends to generate shorter sequences. Default to 0.0, which means no penalty.
early_stopping (bool, optional) -- Whether to stop searching in the "beam_search" strategy when at least num_beams sentences are finished per batch or not. Default to False.
bos_token_id (int, optional) -- The id of the bos_token. Default to None.
eos_token_id (int, optional) -- The id of the eos_token. Default to None.
pad_token_id (int, optional) -- The id of the pad_token. Default to None.
decoder_start_token_id (int, optional) -- The start token id for encoder-decoder models. Default to None.
forced_bos_token_id (int, optional) -- The id of the token to force as the first generated token. Usually used for multilingual models. Default to None.
forced_eos_token_id (int, optional) -- The id of the token to force as the last generated token. Default to None.
no_repeat_ngram_size (int, optional) -- If set to an int > 0, all ngrams of that size can only occur once. Default to None, which means no effect.
num_return_sequences (int, optional) -- The number of returned sequences for each sequence in the batch. Default to 1.
diversity_rate (float, optional) -- If num_beam_groups is 1, this is the diversity_rate for Diverse Siblings Search. See this paper (https://arxiv.org/abs/1611.08562) for more details. Otherwise, this is the diversity_rate for DIVERSE BEAM SEARCH.
use_cache (bool, optional) -- Whether to use the model cache to speed up decoding. Default to True.
use_fast (bool, optional) -- Whether to use the fast entry of the model for FastGeneration. Default to False.
use_fp16_decoding (bool, optional) -- Whether to use fp16 for decoding. Only works when the fast entry is available. Default to False.
model_kwargs (dict) -- It can be used to specify additional kwargs passed to the model.
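The following is a minimal sketch, in plain numpy, of the filtering rule that top_k and top_p describe for the "sampling" strategy. It only illustrates the semantics of the two parameters; the helper name and the toy distribution are invented for this example, and this is not PaddleNLP's internal implementation.

import numpy as np

def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    # Hypothetical helper: restrict a next-token distribution the way
    # top_k and top_p do in the "sampling" strategy.
    probs = probs.copy()
    if top_k > 0:
        # Keep only the top_k most probable tokens.
        kth_largest = np.sort(probs)[-top_k]
        probs[probs < kth_largest] = 0.0
    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative probability
        # exceeds top_p; the first token crossing the threshold stays.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        beyond = order[cumulative > top_p]
        probs[beyond[1:]] = 0.0
    return probs / probs.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_k_top_p_filter(probs, top_k=3))    # mass renormalized over the 3 best tokens
print(top_k_top_p_filter(probs, top_p=0.8))  # tokens kept until cumulative prob > 0.8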
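In the same spirit, here is a hedged sketch of the repetition_penalty rule: positive logits of already-generated tokens are divided by the penalty and negative ones are multiplied by it, so repeated tokens become less likely. The helper is hypothetical and only illustrates the parameter's semantics, not the library's exact kernel.

import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Hypothetical helper: discourage tokens that were already generated.
    logits = logits.copy()
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty   # positive logits shrink
        else:
            logits[token_id] *= penalty   # negative logits grow more negative
    return logits

logits = np.array([2.0, 1.0, -1.0, 0.5])
print(apply_repetition_penalty(logits, generated_ids=[0, 2]))
# approximately [1.6667, 1.0, -1.2, 0.5]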
- Returns:
A tuple containing two elements: ids and scores. Each element is a Tensor.
With the fields:
- ids (Tensor):
The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.
- scores (Tensor):
The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.
- Return type:
tuple[Tensor]
Examples
import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history,
    task_type='chitchat',
    add_start_token_as_response=True,
    return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5) ids, scores = model.generate( input_ids=inputs['input_ids'], token_type_ids=inputs['token_type_ids'], position_ids=inputs['position_ids'], attention_mask=inputs['attention_mask'], decode_strategy="sampling", top_k=5, num_return_sequences=2) print(ids.shape, scores.shape) # [2, 7] [2, 1] response = [] for sequence_ids in ids.numpy().tolist(): sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)] text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False) response.append(text) print(response) # ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5) ids, scores = model.generate( input_ids=inputs['input_ids'], token_type_ids=inputs['token_type_ids'], position_ids=inputs['position_ids'], attention_mask=inputs['attention_mask'], decode_strategy="beam_search", num_beams=5, num_return_sequences=2) print(ids.shape, scores.shape) # [2, 3] [2, 1] response = [] for sequence_ids in ids.numpy().tolist(): sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)] text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False) response.append(text) print(response) # ['是的', '嗯嗯']