generation_utils

class GenerationMixin [source]

Bases: object

This class implements the interface for generation tasks.

It is used as the base class of paddlenlp.transformers.PretrainedModel.

generate(input_ids=None, max_length=20, min_length=0, decode_strategy='greedy_search', temperature=1.0, top_k=0, top_p=1.0, repetition_penalty=1.0, num_beams=1, length_penalty=0.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, num_return_sequences=1, diversity_rate=0.0, use_cache=True, **model_kwargs)[source]

The interface for generation tasks. This method generates sequences using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Default to None, in which case it is initialized as a Tensor with shape [1, 1] filled with the value bos_token_id.

  • max_length (int, optional) -- The maximum length of the sequence to be generated. Default to 20.

  • min_length (int, optional) -- The minimum length of the sequence to be generated. Default to 0.

  • decode_strategy (str, optional) -- The decoding strategy in generation. Currently, there are three decoding strategies supported: "greedy_search", "sampling" and "beam_search". Default to "greedy_search".

  • temperature (float, optional) -- The value used to modulate the next token probabilities in the "sampling" strategy. Default to 1.0, which means no effect.

  • top_k (int, optional) -- The number of highest probability tokens to keep for top-k-filtering in the "sampling" strategy. Default to 0, which means no effect.

  • top_p (float, optional) -- The cumulative probability for top-p-filtering in the "sampling" strategy. The value should satisfy \(0 < top\_p \le 1\). Default to 1.0, which means no effect.

  • repetition_penalty (float, optional) -- The parameter for repetition penalty. 1.0 means no penalty. See the CTRL paper (https://arxiv.org/abs/1909.05858) for more details. Default to 1.0.

  • num_beams (int, optional) -- The number of beams in the "beam_search" strategy. Default to 1.

  • length_penalty (float, optional) -- The exponential penalty applied to the sequence length in the "beam_search" strategy. The larger this parameter is, the shorter the sequences the model tends to generate. Default to 0.0, which means no penalty.

  • early_stopping (bool, optional) -- Whether to stop the search in the "beam_search" strategy once at least num_beams sentences are finished per batch. Default to False.

  • bos_token_id (int, optional) -- The id of the bos_token. Default to None.

  • eos_token_id (int, optional) -- The id of the eos_token. Default to None.

  • pad_token_id (int, optional) -- The id of the pad_token. Default to None.

  • num_return_sequences (int, optional) -- The number of returned sequences for each sequence in the batch. Default to 1.

  • diversity_rate (float, optional) -- The diversity rate for diverse siblings search. See https://arxiv.org/abs/1611.08562 for more details. Default to 0.0.

  • use_cache (bool, optional) -- Whether or not to use the model cache to speed up decoding. Default to True.

  • model_kwargs (dict) -- It can be used to specify additional kwargs passed to the model.
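
The sampling controls above (temperature, top_k, top_p) are typically applied to the logits in sequence before drawing a token. The standalone NumPy sketch below illustrates one common way this composition works on toy logits; it is a conceptual illustration, not PaddleNLP's internal implementation, and the function name filter_logits is hypothetical:

```python
import numpy as np

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Toy sketch: temperature scaling, then top-k filtering,
    then top-p (nucleus) filtering. Returns a probability
    distribution over the vocabulary."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        # Keep only the top_k highest logits; mask the rest.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative
        # probability covers top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        cutoff = cum <= top_p
        cutoff[0] = True  # always keep the most likely token
        keep = np.zeros_like(probs, dtype=bool)
        keep[order[cutoff]] = True
        probs = np.where(keep, probs, 0.0)
        probs /= probs.sum()
    return probs

probs = filter_logits([2.0, 1.0, 0.5, -1.0], top_k=2)
# With top_k=2, only the two highest-scoring tokens keep
# nonzero probability.
```

This ordering (temperature, then top-k, then top-p) matches the common convention in open-source decoders, where each filter narrows the candidate set before the final multinomial draw.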

Returns

A tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as that of the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, the same as the model parameters.

Return type

tuple[Tensor]
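
Conceptually, the "greedy_search" strategy picks the argmax token at each step until eos_token_id is produced or max_length is reached. The model-free sketch below shows this loop; toy_next_logits is a hypothetical stand-in for a model forward pass, not part of the PaddleNLP API:

```python
import numpy as np

def toy_next_logits(ids):
    # Stand-in for a model forward pass: deterministic toy
    # logits over a 5-token vocabulary, seeded by the last id.
    rng = np.random.default_rng(ids[-1])
    return rng.standard_normal(5)

def greedy_search(bos_token_id, eos_token_id, max_length=20):
    ids = [bos_token_id]
    for _ in range(max_length):
        # Greedy: always take the highest-scoring next token.
        next_id = int(np.argmax(toy_next_logits(ids)))
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids

seq = greedy_search(bos_token_id=0, eos_token_id=3)
```

The real generate() additionally batches this loop, tracks per-sequence scores, and applies constraints such as min_length and repetition_penalty at each step.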

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2)
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
ids, scores = model.generate(
    input_ids=inputs['input_ids'],
    token_type_ids=inputs['token_type_ids'],
    position_ids=inputs['position_ids'],
    attention_mask=inputs['attention_mask'],
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2)
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']