fast_transformer#

class FasterTransformer(src_vocab_size, trg_vocab_size, max_length, num_encoder_layers, num_decoder_layers, n_head, d_model, d_inner_hid, dropout, weight_sharing, attn_dropout=None, act_dropout=None, bos_id=0, eos_id=1, pad_id=None, decoding_strategy='beam_search', beam_size=4, topk=1, topp=0.0, max_out_len=256, diversity_rate=0.0, decoding_lib=None, use_fp16_decoding=False, enable_fast_encoder=False, use_fp16_encoder=False, rel_len=False, alpha=0.6)[source]#

Bases: TransformerModel

FasterTransformer is a faster version of the Transformer model for generation. It uses a custom op based on and enhancing NVIDIA FasterTransformer to perform fast generation.

Parameters:
  • src_vocab_size (int) -- The size of source vocabulary.

  • trg_vocab_size (int) -- The size of target vocabulary.

  • max_length (int) -- The maximum length of input sequences.

  • num_encoder_layers (int) -- The number of sub-layers to be stacked in the encoder.

  • num_decoder_layers (int) -- The number of sub-layers to be stacked in the decoder.

  • n_head (int) -- The number of heads used in multi-head attention.

  • d_model (int) -- The dimension for word embeddings, which is also the last dimension of the input and output of multi-head attention, position-wise feed-forward networks, encoder and decoder.

  • d_inner_hid (int) -- Size of the hidden layer in position-wise feed-forward networks.

  • dropout (float) -- Dropout rates. Used in pre-processing, activations and attention.

  • weight_sharing (bool) -- Whether to use weight sharing.

  • attn_dropout (float) -- The dropout probability used in MHA to drop some attention targets. If None, use the value of dropout. Defaults to None.

  • act_dropout (float) -- The dropout probability used after FFN activation. If None, use the value of dropout. Defaults to None.

  • bos_id (int, optional) -- The start token id, which is also used as the padding id. Defaults to 0.

  • eos_id (int, optional) -- The end token id. Defaults to 1.

  • pad_id (int, optional) -- The pad token id. Defaults to None. If it's None, the bos_id will be used as pad_id.

  • decoding_strategy (str, optional) -- Indicating the strategy of decoding. It can be 'beam_search', 'beam_search_v2', 'topk_sampling' or 'topp_sampling'. For the beam search strategies, 'v2' selects the top beam_size * 2 beams and processes the top beam_size alive and finished beams among them separately, while 'v1' only selects the top beam_size beams and mixes up the alive and finished beams. 'v2' always searches more and gets better results, since the number of alive beams is always beam_size, while in 'v1' it might decrease when meeting the end token. However, 'v2' always generates longer results and thus might do more computation and be slower. Defaults to 'beam_search'.

  • beam_size (int, optional) -- The beam width for beam search. Defaults to 4.

  • topk (int, optional) -- The number of highest probability tokens to keep for top-k sampling. Defaults to 1.

  • topp (float, optional) -- The most probable tokens whose cumulative probability is not less than topp are kept for top-p sampling. Defaults to 0.0.

  • max_out_len (int, optional) -- The maximum output length. Defaults to 256.

  • diversity_rate (float, optional) -- Refer to A Simple, Fast Diverse Decoding Algorithm for Neural Generation (https://arxiv.org/abs/1611.08562) for details. A larger diversity_rate leads to more diversity. diversity_rate == 0 is equivalent to naive beam search. Defaults to 0.0.

  • use_fp16_decoding (bool, optional) -- Whether to use fp16 for decoding. Defaults to False.

  • enable_fast_encoder (bool, optional) -- Whether to use the fast version of the encoder. This is an experimental option for now. Defaults to False.

  • use_fp16_encoder (bool, optional) -- Whether to use fp16 for encoder. Only works when enable_fast_encoder is True. Defaults to False.

  • rel_len (bool, optional) -- Indicating whether max_out_len is the length relative to that of the source text. Only works in 'v2' temporarily. It is suggested to set a small max_out_len and use rel_len=True. Defaults to False.

  • alpha (float, optional) -- The power number in length penalty calculation. Only works in 'v2' temporarily. Refer to GNMT. Defaults to 0.6.

forward(src_word, trg_word=None)[source]#

The Transformer forward method. The inputs are source/target sequences, and it returns logits.

Parameters:
  • src_word (Tensor) -- The ids of source sequence words. It is a tensor with shape [batch_size, source_sequence_length] and its data type can be int or int64.

  • trg_word (Tensor) -- The ids of target sequence words. It is a tensor with shape [batch_size, target_sequence_length] and its data type can be int or int64.

Returns:

Output tensor of the final layer of the model whose data type can be float32 or float64 with shape [batch_size, sequence_length, vocab_size].

Return type:

Tensor

Examples

import paddle
from paddlenlp.transformers import TransformerModel

transformer = TransformerModel(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=257,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1)

batch_size = 5
seq_len = 10
predict = transformer(
    src_word=paddle.randint(low=3, high=30000, shape=[batch_size, seq_len]),
    trg_word=paddle.randint(low=3, high=30000, shape=[batch_size, seq_len]))
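
The example above constructs the base TransformerModel. A construction sketch for FasterTransformer itself, showing the decoding options documented above (all hyperparameter values here are illustrative, and a CUDA device with the JIT-built decoding op is assumed):

import paddle
from paddlenlp.ops import FasterTransformer

fast_transformer = FasterTransformer(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=256,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    decoding_strategy='beam_search_v2',  # keeps beam_size alive beams throughout
    beam_size=4,
    rel_len=True,       # interpret max_out_len relative to the source length ('v2' only)
    max_out_len=50,
    alpha=0.6)          # GNMT-style length penalty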
export_params(init_from_params, place)[source]#

This method is used to load a static graph from a dygraph checkpoint, or to export an inference model using the static graph. It does NOT support the fast encoder.

Parameters:
  • init_from_params (string) -- The path to the dygraph checkpoint.

  • place (paddle.Place) -- The place to execute static graph.

Examples
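
A minimal sketch; transformer is a FasterTransformer constructed as above, and the checkpoint path is hypothetical:

import paddle

place = paddle.CUDAPlace(0)
transformer.export_params(
    init_from_params='./trained_models/step_final/',
    place=place)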

class TransformerGenerator(src_vocab_size, trg_vocab_size, max_length, num_encoder_layers, num_decoder_layers, n_head, d_model, d_inner_hid, dropout, weight_sharing, bos_id=0, eos_id=1, pad_id=None, beam_size=4, max_out_len=256, activation='relu', normalize_before=True, **kwargs)[source]#

Bases: Layer

The Transformer model for auto-regressive generation with beam search. It wraps FasterTransformer and InferTransformerModel, and automatically chooses FasterTransformer (with JIT building) or the slower version InferTransformerModel.

Parameters:
  • src_vocab_size (int) -- The size of source vocabulary.

  • trg_vocab_size (int) -- The size of target vocabulary.

  • max_length (int) -- The maximum length of input sequences.

  • num_encoder_layers (int) -- The number of sub-layers to be stacked in the encoder.

  • num_decoder_layers (int) -- The number of sub-layers to be stacked in the decoder.

  • n_head (int) -- The number of heads used in multi-head attention.

  • d_model (int) -- The dimension for word embeddings, which is also the last dimension of the input and output of multi-head attention, position-wise feed-forward networks, encoder and decoder.

  • d_inner_hid (int) -- Size of the hidden layer in position-wise feed-forward networks.

  • dropout (float) -- Dropout rates. Used in pre-processing, activations and attention.

  • weight_sharing (bool) -- Whether to use weight sharing.

  • bos_id (int, optional) -- The start token id, which is also used as the padding id. Defaults to 0.

  • eos_id (int, optional) -- The end token id. Defaults to 1.

  • beam_size (int, optional) -- The beam width for beam search. Defaults to 4.

  • max_out_len (int, optional) -- The maximum output length. Defaults to 256.

  • activation (str, optional) -- The activation used in FFN. Defaults to "relu".

  • normalize_before (bool, optional) -- Whether to apply pre-normalization. Defaults to True.

  • kwargs --

    The keyword arguments can be output_time_major, use_ft, use_fp16_decoding, beam_search_version, rel_len, alpha and diversity_rate:

    • output_time_major(bool, optional): Indicates the data layout of the predicted Tensor. If False, the data layout would be batch major with shape [batch_size, seq_len, beam_size]. If True, the data layout would be time major with shape [seq_len, batch_size, beam_size]. Defaults to False.

    • use_ft(bool, optional): Whether to use FastGeneration for decoding. Defaults to True.

    • use_fp16_decoding(bool, optional): Whether to use fp16 for decoding. Only works when using FastGeneration.

    • beam_search_version(str, optional): Indicating the strategy of beam search. It can be 'v1' or 'v2'. 'v2' selects the top beam_size * 2 beams and processes the top beam_size alive and finished beams among them separately, while 'v1' only selects the top beam_size beams and mixes up the alive and finished beams. 'v2' always searches more and gets better results, since the number of alive beams is always beam_size, while in 'v1' it might decrease when meeting the end token. However, 'v2' always generates longer results and thus might do more computation and be slower.

    • rel_len(bool, optional): Indicating whether max_out_len is the length relative to that of the source text. Only works in 'v2' temporarily. It is suggested to set a small max_out_len and use rel_len=True. Defaults to False.

    • alpha(float, optional): The power number in length penalty calculation. Refer to GNMT. Only works in 'v2' temporarily. Defaults to 0.6.

    • diversity_rate(float, optional): Refer to A Simple, Fast Diverse Decoding Algorithm for Neural Generation (https://arxiv.org/abs/1611.08562) for details. A larger diversity_rate leads to more diversity. diversity_rate == 0 is equivalent to naive beam search. Defaults to 0.0. NOTE: Only works when using FastGeneration temporarily.

forward(src_word, trg_word=None)[source]#

Performs decoding for the Transformer model.

Parameters:
  • src_word (Tensor) -- The ids of source sequence words. It is a tensor with shape [batch_size, source_sequence_length] and its data type can be int or int64.

  • trg_word (Tensor) -- The ids of target sequence words. Normally, it should NOT be given. If it is given, force decoding with the given target words as previous output tokens will be triggered. Defaults to None.

Returns:

An int64 tensor indicating the predicted ids. Its shape is [batch_size, seq_len, beam_size] or [seq_len, batch_size, beam_size] according to output_time_major. When using FastGeneration and beam search v2, the beam dimension would be doubled to include both the top beam_size alive and finished beams, thus the tensor shape is [batch_size, seq_len, beam_size * 2] or [seq_len, batch_size, beam_size * 2].

Return type:

Tensor

Examples

import paddle
from paddlenlp.ops import TransformerGenerator

transformer = TransformerGenerator(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=256,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1,
    beam_size=4,
    max_out_len=256)

batch_size = 5
seq_len = 10
transformer(
    src_word=paddle.randint(low=3, high=30000, shape=[batch_size, seq_len]))
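
A second construction sketch showing the keyword arguments described above; the values are illustrative, and whether FastGeneration is actually used depends on the environment:

import paddle
from paddlenlp.ops import TransformerGenerator

transformer = TransformerGenerator(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=256,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1,
    beam_size=4,
    max_out_len=50,
    # keyword arguments forwarded to the underlying implementation
    use_ft=True,                  # try FastGeneration first
    beam_search_version='v2',
    rel_len=True,                 # max_out_len is relative to the source length
    alpha=0.6)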
class FasterOPT(model, decoding_lib=None, use_fp16_decoding=False)[source]#

Bases: OPTPretrainedModel

forward(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, decode_strategy='sample', num_return_sequences=1, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
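
A minimal usage sketch for the forward signature above (the checkpoint name 'facebook/opt-125m' and its pairing with GPTTokenizer are assumptions, not guaranteed by this page):

import paddle
from paddlenlp.transformers import GPTTokenizer, OPTForCausalLM
from paddlenlp.ops import FasterOPT

model = OPTForCausalLM.from_pretrained('facebook/opt-125m')  # assumed checkpoint
tokenizer = GPTTokenizer.from_pretrained('facebook/opt-125m')
fast_model = FasterOPT(model, use_fp16_decoding=False)  # wrap with the fused decoding op
fast_model.eval()

input_ids = paddle.to_tensor([tokenizer('Hello, I am')['input_ids']], dtype='int64')
out = fast_model(input_ids, top_k=4, max_length=32, decode_strategy='sample')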

generate(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, decode_strategy='sample', num_return_sequences=1, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterGPT(model, decoding_lib=None, use_fp16_decoding=False)[source]#

Bases: GPTPretrainedModel

forward(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, decode_strategy='sample', num_return_sequences=1, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
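
A minimal usage sketch for the forward signature above (the pretrained name 'gpt2-en' is an assumption):

import paddle
from paddlenlp.transformers import GPTLMHeadModel, GPTTokenizer
from paddlenlp.ops import FasterGPT

model = GPTLMHeadModel.from_pretrained('gpt2-en')  # assumed checkpoint
tokenizer = GPTTokenizer.from_pretrained('gpt2-en')
fast_model = FasterGPT(model, use_fp16_decoding=False)  # wrap with the fused decoding op
fast_model.eval()

input_ids = paddle.to_tensor([tokenizer('Hello, I am')['input_ids']], dtype='int64')
out = fast_model(input_ids, top_k=4, max_length=32, decode_strategy='sample')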

generate(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, decode_strategy='sample', num_return_sequences=1, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterUnifiedTransformer(model, decoding_lib=None, use_fp16_decoding=False)[source]#

Bases: UnifiedTransformerPretrainedModel

forward(input_ids, token_type_ids, attention_mask, seq_len=None, role_ids=None, position_ids=None, max_length=128, min_length=0, top_k=4, top_p=0.0, decode_strategy='sampling', bos_token_id=None, eos_token_id=None, pad_token_id=None, num_beams=4, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
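
A minimal usage sketch for the forward signature above, reusing the 'unified_transformer-12L-cn-luge' checkpoint from the example below:

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
from paddlenlp.ops import FasterUnifiedTransformer

model = UnifiedTransformerLMHeadModel.from_pretrained('unified_transformer-12L-cn-luge')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('unified_transformer-12L-cn-luge')
fast_model = FasterUnifiedTransformer(model, use_fp16_decoding=False)
fast_model.eval()

inputs = tokenizer.dialogue_encode("早上好。", task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
out = fast_model(**inputs, max_length=64, decode_strategy='sampling', top_k=4)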

generate(input_ids, token_type_ids, attention_mask, seq_len=None, role_ids=None, position_ids=None, max_length=128, min_length=0, top_k=4, top_p=0.0, decode_strategy='sampling', bos_token_id=None, eos_token_id=None, pad_token_id=None, num_beams=4, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterUNIMOText(model, decoding_lib=None, use_fp16_decoding=False, **kwargs)[source]#

Bases: UNIMOPretrainedModel

forward(input_ids, token_type_ids, attention_mask, seq_len=None, max_length=128, min_length=0, top_k=4, top_p=0.0, num_beams=4, decode_strategy='sampling', bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, position_ids=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
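
A minimal usage sketch for the forward signature above (the pretrained name 'unimo-text-1.0' and the gen_encode inputs are assumptions):

import paddle
from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
from paddlenlp.ops import FasterUNIMOText

model = UNIMOLMHeadModel.from_pretrained('unimo-text-1.0')  # assumed checkpoint
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')
fast_model = FasterUNIMOText(model, use_fp16_decoding=False)
fast_model.eval()

inputs = tokenizer.gen_encode("深度学习", add_start_token_for_decoding=True,
    return_tensors=True)
out = fast_model(**inputs, max_length=64, decode_strategy='sampling', top_k=4)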

generate(input_ids, token_type_ids, attention_mask, seq_len=None, max_length=128, min_length=0, top_k=4, top_p=0.0, num_beams=4, decode_strategy='sampling', bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, position_ids=None, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterMIRO(model, decoding_lib=None, use_fp16_decoding=False, **kwargs)[source]#

Bases: UNIMOPretrainedModel

forward(input_ids, token_type_ids, attention_mask, seq_len=None, max_length=128, min_length=0, top_k=4, top_p=0.0, num_beams=4, decode_strategy='sampling', bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, position_ids=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
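
FasterMIRO shares the UNIMO interface above; a sketch under the same assumptions as the FasterUNIMOText one (the checkpoint name and inputs are assumptions):

import paddle
from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
from paddlenlp.ops import FasterMIRO

model = UNIMOLMHeadModel.from_pretrained('unimo-text-1.0')  # assumed checkpoint
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')
fast_model = FasterMIRO(model, use_fp16_decoding=False)
fast_model.eval()

inputs = tokenizer.gen_encode("深度学习", add_start_token_for_decoding=True,
    return_tensors=True)
out = fast_model(**inputs, max_length=64, decode_strategy='sampling', top_k=4)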

generate(input_ids, token_type_ids, attention_mask, seq_len=None, max_length=128, min_length=0, top_k=4, top_p=0.0, num_beams=4, decode_strategy='sampling', bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, position_ids=None, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterBART(model, decoding_lib=None, use_fp16_decoding=False, enable_fast_encoder=True)[source]#

Bases: BartPretrainedModel

enable_faster_encoder_func(use_fp16=False, encoder_lib=None)#

Compiles the fused encoder operator integrated with FastGeneration using JIT (Just-In-Time) compilation, and replaces the forward functions of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects held by self, to support inference with FastGeneration.

Examples

from paddlenlp.ops import enable_fast_encoder, disable_fast_encoder

model.eval()
# Swap in the fused FastGeneration encoder implementation.
model = enable_fast_encoder(model)
enc_out = model(src, src_mask)  # src and src_mask are placeholder inputs
# Restore the original encoder forward.
model = disable_fast_encoder(model)
forward(input_ids=None, encoder_output=None, seq_len=None, num_beams=4, top_k=1, top_p=0.0, temperature=1.0, decode_strategy='beam_search', bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, min_length=0, max_length=20, diversity_rate=0.0, length_penalty=0.6, num_return_sequences=1, early_stopping=False, forced_eos_token_id=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
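
A minimal usage sketch for the forward signature above (the pretrained name 'bart-base' is an assumption):

import paddle
from paddlenlp.transformers import BartForConditionalGeneration, BartTokenizer
from paddlenlp.ops import FasterBART

model = BartForConditionalGeneration.from_pretrained('bart-base')  # assumed checkpoint
tokenizer = BartTokenizer.from_pretrained('bart-base')
fast_model = FasterBART(model, use_fp16_decoding=False)
fast_model.eval()

input_ids = paddle.to_tensor([tokenizer('He is a <mask> player.')['input_ids']], dtype='int64')
out = fast_model(input_ids=input_ids, num_beams=4, max_length=20,
    decode_strategy='beam_search')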

generate(input_ids=None, encoder_output=None, seq_len=None, num_beams=4, top_k=1, top_p=0.0, temperature=1.0, decode_strategy='beam_search', bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, min_length=0, max_length=20, diversity_rate=0.0, length_penalty=0.6, num_return_sequences=1, early_stopping=False, forced_eos_token_id=None, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterMBART(model, decoding_lib=None, use_fp16_decoding=False, enable_fast_encoder=False)[source]#

Bases: MBartPretrainedModel

enable_faster_encoder_func(use_fp16=False, encoder_lib=None)#

Compiles the fused encoder operator integrated with FastGeneration using JIT (Just-In-Time) compilation, and replaces the forward functions of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects held by self, to support inference with FastGeneration.

Examples

from paddlenlp.ops import enable_fast_encoder, disable_fast_encoder

model.eval()
# Swap in the fused FastGeneration encoder implementation.
model = enable_fast_encoder(model)
enc_out = model(src, src_mask)  # src and src_mask are placeholder inputs
# Restore the original encoder forward.
model = disable_fast_encoder(model)
forward(input_ids=None, encoder_output=None, seq_len=None, forced_bos_token_id=None, num_beams=4, top_k=1, top_p=0.0, decode_strategy='beam_search_v3', bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, max_length=256, diversity_rate=0.0, length_penalty=0.6, temperature=1.0, num_return_sequences=1, early_stopping=False, forced_eos_token_id=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
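
A minimal usage sketch for the forward signature above (the pretrained name 'mbart-large-cc25' is an assumption):

import paddle
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer
from paddlenlp.ops import FasterMBART

model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')  # assumed checkpoint
tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
fast_model = FasterMBART(model, use_fp16_decoding=False)
fast_model.eval()

input_ids = paddle.to_tensor([tokenizer('Hello!')['input_ids']], dtype='int64')
out = fast_model(input_ids=input_ids, num_beams=4, max_length=20)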

generate(input_ids=None, encoder_output=None, seq_len=None, forced_bos_token_id=None, num_beams=4, top_k=1, top_p=0.0, decode_strategy='beam_search_v3', bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, max_length=256, diversity_rate=0.0, length_penalty=0.6, temperature=1.0, num_return_sequences=1, early_stopping=False, forced_eos_token_id=None, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterGPTJ(model, decoding_lib=None, use_fp16_decoding=False)[source]#

Bases: GPTJPretrainedModel

forward(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, min_length=0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, repetition_penalty=1.0, decode_strategy='sampling', num_return_sequences=1, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
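
A minimal usage sketch for the forward signature above (the pretrained name 'EleutherAI/gpt-j-6B' is an assumption; fp16 decoding is typical for a model of this size):

import paddle
from paddlenlp.transformers import GPTJForCausalLM, GPTJTokenizer
from paddlenlp.ops import FasterGPTJ

model = GPTJForCausalLM.from_pretrained('EleutherAI/gpt-j-6B')  # assumed checkpoint
tokenizer = GPTJTokenizer.from_pretrained('EleutherAI/gpt-j-6B')
fast_model = FasterGPTJ(model, use_fp16_decoding=True)
fast_model.eval()

input_ids = paddle.to_tensor([tokenizer('def hello():')['input_ids']], dtype='int64')
out = fast_model(input_ids, top_k=4, max_length=64, decode_strategy='sampling')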

generate(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, min_length=0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, repetition_penalty=1.0, decode_strategy='sampling', num_return_sequences=1, **model_kwargs)#

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration to be used as the base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criterion is passed that duplicates one already created from the arguments or a generation config, an error is thrown. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to continue running the while loop until max_length. Unless overridden, this flag will be set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others. Otherwise it will be set to False.

  • kwargs (dict) -- It can be used to specify additional kwargs passed to the model.

Returns:

It is a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)
# GenerationConfig is used in the sampling/beam-search examples below.
from paddlenlp.generation import GenerationConfig

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
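
As the generation_config parameter notes above, **kwargs passed to generate override matching attributes of generation_config. Reusing the model, inputs and generation_config objects from the example, a minimal sketch of a per-call override:

# top_k passed as a kwarg overrides generation_config.top_k for this call only
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    top_k=10)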
class FasterCodeGen(model, decoding_lib=None, use_fp16_decoding=False)[source]#

Bases: CodeGenPreTrainedModel

forward(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, min_length=0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, repetition_penalty=1.0, decode_strategy='sampling', num_return_sequences=1, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
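
A minimal usage sketch for FasterCodeGen, assuming it is importable from paddlenlp.ops alongside the other Faster* classes and that the Salesforce/codegen-350M-mono weights are available; the exact layout of the returned tensors depends on the decoding op:

import paddle
from paddlenlp.ops import FasterCodeGen
from paddlenlp.transformers import CodeGenForCausalLM, CodeGenTokenizer

model_name = 'Salesforce/codegen-350M-mono'
tokenizer = CodeGenTokenizer.from_pretrained(model_name)
model = CodeGenForCausalLM.from_pretrained(model_name)
model.eval()

# Wrap the trained model; the custom decoding op is loaded (or JIT-built) on first use.
fast_model = FasterCodeGen(model, use_fp16_decoding=False)

inputs = tokenizer(["def hello_world():"], return_tensors='pd')
ids, scores = fast_model(
    input_ids=inputs['input_ids'],
    top_k=4,
    max_length=64,
    decode_strategy='sampling')
print(ids.shape, scores.shape)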

generate(input_ids, seq_len=None, attention_mask=None, top_k=4, top_p=0.0, min_length=0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, forced_eos_token_id=None, temperature=0, repetition_penalty=1.0, decode_strategy='sampling', num_return_sequences=1, **model_kwargs)#

The interface for the generation task. This method generates sequences with the chosen decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1] filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration used as the base parametrization for the generation call. **kwargs passed to generate that match attributes of generation_config will override them. If generation_config is not provided, a default is used with the following loading priority: 1) the generation_config.json model file, if it exists; 2) the model configuration. Note that unspecified parameters inherit GenerationConfig's default values; check its documentation to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from the arguments and the generation config. An error is thrown if a stopping criterion is passed that is already created from the arguments or the generation config. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids), and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to keep running the generation loop until max_length. Unless overridden, this flag is set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others; otherwise it is set to False.

  • kwargs (dict) -- Additional keyword arguments passed to the model.

Returns:

A tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, matching the data type of the model parameters.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.generation import GenerationConfig  # needed for the sampling/beam examples below
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterPegasus(model, decoding_lib=None, use_fp16_decoding=False, enable_fast_encoder=False, **kwargs)[source]#

Bases: PegasusPretrainedModel

enable_faster_encoder_func(use_fp16=False, encoder_lib=None)#

JIT-compiles the fused encoder operator integrated with FastGeneration and replaces the forward functions of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects contained in self, so that inference runs through FastGeneration.

Examples

from paddlenlp.ops import enable_fast_encoder, disable_fast_encoder

# `model`, `src` and `src_mask` are assumed to be defined beforehand.
model.eval()
model = enable_fast_encoder(model)
enc_out = model(src, src_mask)
model = disable_fast_encoder(model)
forward(input_ids=None, encoder_output=None, seq_len=None, min_length=0, max_length=256, num_beams=4, decode_strategy='beam_search_v3', decoder_start_token_id=None, bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, length_penalty=0.6, top_k=1, top_p=0.0, temperature=1.0, num_return_sequences=1, early_stopping=False, forced_bos_token_id=None, forced_eos_token_id=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
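
A minimal usage sketch for FasterPegasus, assuming it is importable from paddlenlp.ops and that a Chinese Pegasus summarization checkpoint such as IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese is available (the checkpoint name is illustrative):

import paddle
from paddlenlp.ops import FasterPegasus
from paddlenlp.transformers import (
    PegasusChineseTokenizer,
    PegasusForConditionalGeneration
)

model_name = 'IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese'  # illustrative checkpoint
tokenizer = PegasusChineseTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Wrap the trained model with the fast decoding op.
fast_model = FasterPegasus(model, use_fp16_decoding=False)

inputs = tokenizer(["今天天气很好,适合出门散步。"], return_tensors='pd')
ids, scores = fast_model(
    input_ids=inputs['input_ids'],
    num_beams=4,
    max_length=64,
    decode_strategy='beam_search_v3')
print(ids.shape, scores.shape)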

generate(input_ids=None, encoder_output=None, seq_len=None, min_length=0, max_length=256, num_beams=4, decode_strategy='beam_search_v3', decoder_start_token_id=None, bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, length_penalty=0.6, top_k=1, top_p=0.0, temperature=1.0, num_return_sequences=1, early_stopping=False, forced_bos_token_id=None, forced_eos_token_id=None, **model_kwargs)#

The interface for the generation task. This method generates sequences with the chosen decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1] filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration used as the base parametrization for the generation call. **kwargs passed to generate that match attributes of generation_config will override them. If generation_config is not provided, a default is used with the following loading priority: 1) the generation_config.json model file, if it exists; 2) the model configuration. Note that unspecified parameters inherit GenerationConfig's default values; check its documentation to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from the arguments and the generation config. An error is thrown if a stopping criterion is passed that is already created from the arguments or the generation config. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids), and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to keep running the generation loop until max_length. Unless overridden, this flag is set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others; otherwise it is set to False.

  • kwargs (dict) -- Additional keyword arguments passed to the model.

Returns:

A tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, matching the data type of the model parameters.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.generation import GenerationConfig  # needed for the sampling/beam examples below
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']
class FasterT5(model, decoding_lib=None, use_fp16_decoding=False)[source]#

Bases: T5PretrainedModel

forward(input_ids=None, encoder_output=None, seq_len=None, max_length=128, min_length=0, top_k=4, top_p=0.0, num_beams=4, decode_strategy='sampling', decoder_start_token_id=None, bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, **model_kwargs)[source]#

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
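
A minimal usage sketch for FasterT5, assuming it is importable from paddlenlp.ops alongside the other Faster* classes and that the t5-small weights are available:

import paddle
from paddlenlp.ops import FasterT5
from paddlenlp.transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 't5-small'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Wrap the trained model with the fast decoding op.
fast_model = FasterT5(model, use_fp16_decoding=False)

inputs = tokenizer(
    ["translate English to German: The house is wonderful."],
    return_tensors='pd')
ids, scores = fast_model(
    input_ids=inputs['input_ids'],
    num_beams=4,
    max_length=64,
    decode_strategy='beam_search')
print(ids.shape, scores.shape)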

generate(input_ids=None, encoder_output=None, seq_len=None, max_length=128, min_length=0, top_k=4, top_p=0.0, num_beams=4, decode_strategy='sampling', decoder_start_token_id=None, bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6, early_stopping=False, forced_eos_token_id=None, **model_kwargs)#

The interface for the generation task. This method generates sequences with the chosen decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters:
  • input_ids (Tensor, optional) -- The input sequence ids for generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1] filled with the value bos_token_id.

  • generation_config (GenerationConfig, optional) -- The generation configuration used as the base parametrization for the generation call. **kwargs passed to generate that match attributes of generation_config will override them. If generation_config is not provided, a default is used with the following loading priority: 1) the generation_config.json model file, if it exists; 2) the model configuration. Note that unspecified parameters inherit GenerationConfig's default values; check its documentation to parameterize generation.

  • stopping_criteria (StoppingCriteriaList, optional) -- Custom stopping criteria that complement the default stopping criteria built from the arguments and the generation config. An error is thrown if a stopping criterion is passed that is already created from the arguments or the generation config. This feature is intended for advanced users.

  • streamer (BaseStreamer, optional) -- Streamer object used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids), and the streamer is responsible for any further processing.

  • synced_gpus (bool, optional) -- Whether to keep running the generation loop until max_length. Unless overridden, this flag is set to True in a DeepSpeed ZeRO Stage 3 multi-GPU environment to avoid hanging if one GPU finishes generating before the others; otherwise it is set to False.

  • kwargs (dict) -- Additional keyword arguments passed to the model.

Returns:

A tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, matching the data type of the model parameters.

Return type:

tuple[Tensor]

Examples

import paddle
from paddlenlp.generation import GenerationConfig  # needed for the sampling/beam examples below
from paddlenlp.transformers import (
    UnifiedTransformerLMHeadModel,
    UnifiedTransformerTokenizer
)

paddle.seed(2)

# Initialize the model and tokenizer
model_name_or_path = 'unified_transformer-12L-cn-luge'
model = UnifiedTransformerLMHeadModel.from_pretrained(model_name_or_path)
tokenizer = UnifiedTransformerTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
history = "早上好,今天空气质量不错。"
inputs = tokenizer.dialogue_encode(history, task_type='chitchat',
    add_start_token_as_response=True, return_tensors=True)
# Generate the sequence by using "greedy_search" strategy
ids, scores = model.generate(
    **inputs,
    decode_strategy="greedy_search")
print(ids.shape, scores.shape)
# [1, 3] [1, 1]
sequence_ids = ids.cpu().numpy().tolist()[0]
sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
response = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
print(response)
# 是的
# Generate 2 sequences by using "sampling" strategy (top_k=5)
generation_config = GenerationConfig(
    decode_strategy="sampling",
    top_k=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 7] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['天气好,心情也好', '你也是']
# Generate 2 sequences by using "beam_search" strategy (num_beams=5)
generation_config = GenerationConfig(
    decode_strategy="beam_search",
    num_beams=5,
    num_return_sequences=2
)
ids, scores = model.generate(
    **inputs,
    generation_config=generation_config,
    )
print(ids.shape, scores.shape)
# [2, 3] [2, 1]
response = []
for sequence_ids in ids.cpu().numpy().tolist():
    sequence_ids = sequence_ids[:sequence_ids.index(tokenizer.sep_token_id)]
    text = tokenizer.convert_ids_to_string(sequence_ids, keep_space=False)
    response.append(text)
print(response)
# ['是的', '嗯嗯']