# generation_utils

class GenerationMixin[source]

Bases: object

This class implements the interface for generation tasks.

It’s used as the base class of paddlenlp.transformers.PretrainedModel.

generate(input_ids=None, max_length=20, min_length=0, decode_strategy='greedy_search', temperature=1.0, top_k=0, top_p=1.0, repetition_penalty=1.0, num_beams=1, num_beam_groups=1, length_penalty=0.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, forced_bos_token_id=None, forced_eos_token_id=None, num_return_sequences=1, diversity_rate=0.0, use_cache=True, use_faster=False, use_fp16_decoding=False, **model_kwargs)[source]

The interface for generation tasks. This method generates sequences with a configurable decoding strategy. Three decoding strategies are currently supported: "greedy_search", "sampling" and "beam_search".
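As a rough illustration of the default strategy, a minimal, framework-free sketch of a greedy-search loop might look like the following. The toy scorer stands in for a real model and is purely hypothetical; the real method operates on batched Tensors.

```python
def toy_next_token_logits(prefix):
    # Deterministic toy scores over a 4-token vocabulary; token 3 plays <eos>.
    return [len(prefix) % 4 == i for i in range(4)]

def greedy_search(bos_id=0, eos_id=3, max_length=20):
    """At each step, append the argmax token; stop at eos or max_length."""
    sequence = [bos_id]
    while len(sequence) < max_length:
        logits = toy_next_token_logits(sequence)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        sequence.append(next_id)
        if next_id == eos_id:
            break
    return sequence
```

Sampling replaces the argmax with a draw from the (filtered) distribution, and beam search keeps the `num_beams` best partial sequences per step instead of a single one.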

Parameters
• input_ids (Tensor, optional) – The input sequence ids for generation, a Tensor of shape [batch_size, sequence_length] with data type int32 or int64. Defaults to None, in which case it is initialized as a [1, 1] Tensor filled with bos_token_id.

• max_length (int, optional) – The maximum length of the sequence to be generated. Defaults to 20.

• min_length (int, optional) – The minimum length of the sequence to be generated. Defaults to 0.

• decode_strategy (str, optional) – The decoding strategy to use: "greedy_search", "sampling" or "beam_search". Defaults to "greedy_search".

• temperature (float, optional) – The value used to modulate the next-token probabilities in the "sampling" strategy. Defaults to 1.0, which means no effect.

• top_k (int, optional) – The number of highest-probability tokens to keep for top-k filtering in the "sampling" strategy. Defaults to 0, which means no effect.

• top_p (float, optional) – The cumulative probability threshold for top-p filtering in the "sampling" strategy. The value should satisfy 0 < top_p <= 1. Defaults to 1.0, which means no effect.

• repetition_penalty (float, optional) – The parameter for repetition penalty; 1.0 means no penalty. See the CTRL paper (https://arxiv.org/abs/1909.05858) for more details. Defaults to 1.0.

• num_beams (int, optional) – The number of beams in the "beam_search" strategy. Defaults to 1.

• num_beam_groups (int, optional) – The number of groups to divide num_beams into in order to use diverse beam search. See the diverse beam search paper (https://arxiv.org/abs/1610.02424) for more details. Defaults to 1.

• length_penalty (float, optional) – The exponential penalty applied to the sequence length in the "beam_search" strategy. The larger this parameter is, the shorter the sequences the model tends to generate. Defaults to 0.0, which means no penalty.

• early_stopping (bool, optional) – Whether to stop the "beam_search" strategy once at least num_beams sentences are finished per batch. Defaults to False.

• bos_token_id (int, optional) – The id of the bos_token. Defaults to None.

• eos_token_id (int, optional) – The id of the eos_token. Defaults to None.

• pad_token_id (int, optional) – The id of the pad_token. Defaults to None.

• decoder_start_token_id (int, optional) – The start token id for encoder-decoder models. Defaults to None.

• forced_bos_token_id (int, optional) – The id of the token to force as the first generated token. Usually used for multilingual models. Defaults to None.

• forced_eos_token_id (int, optional) – The id of the token to force as the last generated token. Defaults to None.

• num_return_sequences (int, optional) – The number of returned sequences for each sequence in the batch. Defaults to 1.

• diversity_rate (float, optional) – If num_beam_groups is 1, this is the diversity rate for diverse siblings search; see the paper https://arxiv.org/abs/1611.08562 for more details. Otherwise, this is the diversity rate for diverse beam search. Defaults to 0.0.

• use_cache (bool, optional) – Whether to use the model cache to speed up decoding. Defaults to True.

• use_faster (bool, optional) – Whether to use the faster entry of the model for FasterGeneration. Defaults to False.

• use_fp16_decoding (bool, optional) – Whether to use fp16 for decoding. Only takes effect when the faster entry is available. Defaults to False.

• model_kwargs (dict) – It can be used to specify additional kwargs passed to the model.
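The sampling-related knobs above (temperature, top_k, top_p) can be illustrated with a small, self-contained sketch of the standard filtering recipe. This is not PaddleNLP's internal implementation, just the usual technique those parameters name:

```python
import math

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Turn raw logits into a filtered, renormalized distribution."""
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set(order)
    # top_k: keep only the k highest-probability tokens (0 disables).
    if top_k > 0:
        keep &= set(order[:top_k])
    # top_p: keep the smallest prefix whose cumulative probability >= top_p.
    if top_p < 1.0:
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= set(kept)
    # Renormalize over the surviving tokens; sampling then draws from this.
    z = sum(probs[i] for i in keep)
    return [probs[i] / z if i in keep else 0.0 for i in range(len(probs))]
```

For example, `filter_logits([2.0, 1.0, 0.5, 0.1], top_k=2)` zeroes out all but the two most likely tokens and renormalizes the rest.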

Returns

A tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

• ids (Tensor):

The ids of the generated sequences, a Tensor of shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as that of the input input_ids.

• scores (Tensor):

The scores of the generated sequences, a Tensor of shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, matching the model parameters.

Return type

tuple[Tensor]
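Because ids and scores share the same leading dimension, picking the highest-scoring candidate when num_return_sequences > 1 is a one-line argmax. A plain-Python sketch (the helper name is hypothetical; the lists mirror the Tensors after `.numpy().tolist()`):

```python
def pick_best(ids_rows, score_rows):
    """Return the row of ids whose paired score (shape [..., 1]) is highest."""
    best = max(range(len(score_rows)), key=lambda i: score_rows[i][0])
    return ids_rows[best]
```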

Example

```python
import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)

# Initialize the model and tokenizer.
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好，今天空气质量不错。"  # "Good morning, the air quality is nice today."
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    is_split_into_words=False, return_tensors=True)

# Generate the sequence by using the "greedy_search" strategy.
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的  ("Yes.")

# Generate 2 sequences by using the "sampling" strategy (top_k=5).
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2)
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']  ("Nice weather, good mood too", "You too")

# Generate 2 sequences by using the "beam_search" strategy (num_beams=5).
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2)
```