faster_transformer

class FasterTransformer(src_vocab_size, trg_vocab_size, max_length, num_encoder_layers, num_decoder_layers, n_head, d_model, d_inner_hid, dropout, weight_sharing, attn_dropout=None, act_dropout=None, bos_id=0, eos_id=1, decoding_strategy='beam_search', beam_size=4, topk=1, topp=0.0, max_out_len=256, diversity_rate=0.0, decoding_lib=None, use_fp16_decoding=False, rel_len=False, alpha=0.6)[source]

Bases: paddlenlp.transformers.transformer.modeling.TransformerModel

FasterTransformer is a faster version of the Transformer model for generation. It uses a custom op, based on and enhancing NVIDIA FasterTransformer, to do fast generation.

Parameters
  • src_vocab_size (int) -- The size of source vocabulary.

  • trg_vocab_size (int) -- The size of target vocabulary.

  • max_length (int) -- The maximum length of input sequences.

  • num_encoder_layers (int) -- The number of sub-layers to be stacked in the encoder.

  • num_decoder_layers (int) -- The number of sub-layers to be stacked in the decoder.

  • n_head (int) -- The number of heads used in multi-head attention.

  • d_model (int) -- The dimension for word embeddings, which is also the last dimension of the input and output of multi-head attention, position-wise feed-forward networks, encoder and decoder.

  • d_inner_hid (int) -- Size of the hidden layer in position-wise feed-forward networks.

  • dropout (float) -- Dropout rates used in pre-process, activation and inside attention.

  • weight_sharing (bool) -- Whether to use weight sharing.

  • attn_dropout (float) -- The dropout probability used in MHA to drop some attention target. If None, use the value of dropout. Defaults to None.

  • act_dropout (float) -- The dropout probability used after FFN activation. If None, use the value of dropout. Defaults to None.

  • bos_id (int, optional) -- The start token id, which is also used as the padding id. Defaults to 0.

  • eos_id (int, optional) -- The end token id. Defaults to 1.

  • decoding_strategy (str, optional) -- Indicating the strategy of decoding. It can be 'beam_search', 'beam_search_v2', 'topk_sampling' or 'topp_sampling'. For the beam search strategies, 'v2' selects the top beam_size * 2 beams and processes the top beam_size alive and finished beams in them separately, while 'v1' only selects the top beam_size beams and mixes up the alive and finished beams. 'v2' always searches more and gets better results, since the number of alive beams is always beam_size, while the number of alive beams in 'v1' might decrease when meeting the end token. However, 'v2' always generates longer results and thus might do more calculation and be slower. Defaults to 'beam_search'.

  • beam_size (int, optional) -- The beam width for beam search. Defaults to 4.

  • topk (int, optional) -- The number of highest probability tokens to keep for top-k sampling. Defaults to 1.

  • topp (float, optional) -- The most probable tokens whose cumulative probability is not less than topp are kept for top-p sampling. Defaults to 0.0.

  • max_out_len (int, optional) -- The maximum output length. Defaults to 256.

  • diversity_rate (float, optional) -- Refer to A Simple, Fast Diverse Decoding Algorithm for Neural Generation for details. A bigger diversity_rate leads to more diversity; diversity_rate == 0 is equivalent to naive beam search. Defaults to 0.0 if not set.

  • use_fp16_decoding (bool, optional) -- Whether to use fp16 for decoding. Defaults to False.

  • rel_len (bool, optional) -- Indicating whether max_out_len is the length relative to that of the source text. Only works in 'v2' temporarily. It is suggested to set a small max_out_len and use rel_len=True. Defaults to False if not set.

  • alpha (float, optional) -- The power number in the length penalty calculation. Only works in 'v2' temporarily. Refer to GNMT. Defaults to 0.6 if not set.
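
For illustration, the following is a minimal construction sketch. It assumes the FasterTransformer custom decoding ops can be JIT-built on a GPU device; the hyper-parameter values are placeholders only.

from paddlenlp.ops import FasterTransformer

# Instantiating FasterTransformer may trigger JIT compilation of the custom
# decoding ops on first use (unless decoding_lib points to a pre-built library).
transformer = FasterTransformer(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=257,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1,
    decoding_strategy='beam_search',
    beam_size=4,
    max_out_len=256)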

forward(src_word, trg_word=None)[source]

The Transformer forward method. The inputs are source/target sequences, and it returns logits.

Parameters
  • src_word (Tensor) -- The ids of source sequence words. It is a tensor with shape [batch_size, source_sequence_length] and its data type can be int or int64.

  • trg_word (Tensor) -- The ids of target sequence words. It is a tensor with shape [batch_size, target_sequence_length] and its data type can be int or int64.

Returns

Output tensor of the final layer of the model whose data type can be float32 or float64 with shape [batch_size, sequence_length, vocab_size].

Return type

Tensor

Example

import paddle
from paddlenlp.transformers import TransformerModel

transformer = TransformerModel(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=257,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1)

batch_size = 5
seq_len = 10
predict = transformer(
    src_word=paddle.randint(low=3, high=30000, shape=[batch_size, seq_len]),
    trg_word=paddle.randint(low=3, high=30000, shape=[batch_size, seq_len]))

export_params(init_from_params, place)[source]

This method is used to load the static graph from a dygraph checkpoint or to export an inference model using the static graph.

Parameters
  • init_from_params (string) -- The path to dygraph checkpoint.

  • place (paddle.Place) -- The place to execute static graph.

Example
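
A minimal sketch of how export_params might be used, assuming the custom decoding ops are available and that a dygraph checkpoint exists at the hypothetical path below. The network is first built under static graph mode, and export_params then loads the dygraph parameters into it.

import paddle
from paddlenlp.ops import FasterTransformer

paddle.enable_static()
place = paddle.CUDAPlace(0)

# Build the decoding network under static graph mode.
transformer = FasterTransformer(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=257,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1,
    beam_size=4,
    max_out_len=256)

src_word = paddle.static.data(
    name='src_word', shape=[None, None], dtype='int64')
finished_seq = transformer(src_word=src_word)

# 'path/to/dygraph_checkpoint/' is a hypothetical checkpoint directory.
transformer.export_params(
    init_from_params='path/to/dygraph_checkpoint/', place=place)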

class TransformerGenerator(src_vocab_size, trg_vocab_size, max_length, num_encoder_layers, num_decoder_layers, n_head, d_model, d_inner_hid, dropout, weight_sharing, bos_id=0, eos_id=1, beam_size=4, max_out_len=256, **kwargs)[source]

Bases: paddle.fluid.dygraph.layers.Layer

The Transformer model for auto-regressive generation with beam search. It wraps FasterTransformer and InferTransformerModel, and automatically chooses to use FasterTransformer (with JIT building) or the slower version InferTransformerModel.

Parameters
  • src_vocab_size (int) -- The size of source vocabulary.

  • trg_vocab_size (int) -- The size of target vocabulary.

  • max_length (int) -- The maximum length of input sequences.

  • num_encoder_layers (int) -- The number of sub-layers to be stacked in the encoder.

  • num_decoder_layers (int) -- The number of sub-layers to be stacked in the decoder.

  • n_head (int) -- The number of heads used in multi-head attention.

  • d_model (int) -- The dimension for word embeddings, which is also the last dimension of the input and output of multi-head attention, position-wise feed-forward networks, encoder and decoder.

  • d_inner_hid (int) -- Size of the hidden layer in position-wise feed-forward networks.

  • dropout (float) -- Dropout rates used in pre-process, activation and inside attention.

  • weight_sharing (bool) -- Whether to use weight sharing.

  • bos_id (int, optional) -- The start token id, which is also used as the padding id. Defaults to 0.

  • eos_id (int, optional) -- The end token id. Defaults to 1.

  • beam_size (int, optional) -- The beam width for beam search. Defaults to 4.

  • max_out_len (int, optional) -- The maximum output length. Defaults to 256.

  • kwargs --

    The keyword arguments can be output_time_major, use_ft, use_fp16_decoding, beam_search_version, rel_len, alpha and diversity_rate (see the sketch after this list):

    • output_time_major(bool, optional): Indicates the data layout of the predicted Tensor. If False, the data layout is batch major with shape [batch_size, seq_len, beam_size]. If True, the data layout is time major with shape [seq_len, batch_size, beam_size]. Defaults to False.

    • use_ft(bool, optional): Whether to use FasterTransformer for decoding. Defaults to True if not set.

    • use_fp16_decoding(bool, optional): Whether to use fp16 for decoding. Only works when using FasterTransformer.

    • beam_search_version(str, optional): Indicating the strategy of beam search. It can be 'v1' or 'v2'. 'v2' selects the top beam_size * 2 beams and processes the top beam_size alive and finished beams in them separately, while 'v1' only selects the top beam_size beams and mixes up the alive and finished beams. 'v2' always searches more and gets better results, since the number of alive beams is always beam_size, while the number of alive beams in 'v1' might decrease when meeting the end token. However, 'v2' always generates longer results and thus might do more calculation and be slower.

    • rel_len(bool, optional): Indicating whether max_out_len is the length relative to that of the source text. Only works in 'v2' temporarily. It is suggested to set a small max_out_len and use rel_len=True. Defaults to False if not set.

    • alpha(float, optional): The power number in the length penalty calculation. Refer to GNMT. Only works in 'v2' temporarily. Defaults to 0.6 if not set.

    • diversity_rate(float, optional): Refer to `A Simple, Fast Diverse Decoding Algorithm for Neural Generation <https://arxiv.org/abs/1611.08562>`_ for details. A bigger diversity_rate leads to more diversity; diversity_rate == 0 is equivalent to naive beam search. Defaults to 0 if not set. NOTE: Only works when using FasterTransformer temporarily.
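
For illustration, a hedged sketch of passing some of these keyword arguments; whether FasterTransformer is actually used still depends on use_ft and on whether the custom ops build successfully.

from paddlenlp.ops import TransformerGenerator

transformer = TransformerGenerator(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=256,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1,
    beam_size=4,
    max_out_len=64,
    use_ft=True,              # try FasterTransformer, fall back otherwise
    use_fp16_decoding=False,  # only takes effect with FasterTransformer
    beam_search_version='v2',
    rel_len=True,             # max_out_len is relative to the source length
    alpha=0.6)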

forward(src_word, trg_word=None)[source]

Performs decoding for the Transformer model.

Parameters
  • src_word (Tensor) -- The ids of source sequence words. It is a tensor with shape [batch_size, source_sequence_length] and its data type can be int or int64.

  • trg_word (Tensor) -- The ids of target sequence words. Normally, it should NOT be given. If it is given, force decoding with the previous output tokens will be triggered. Defaults to None.

Returns

An int64 tensor containing the predicted ids. Its shape is [batch_size, seq_len, beam_size] or [seq_len, batch_size, beam_size] according to output_time_major. Note that when using FasterTransformer and beam search v2, the beam dimension is doubled to include both the top beam_size alive and finished beams, thus the tensor shape is [batch_size, seq_len, beam_size * 2] or [seq_len, batch_size, beam_size * 2].

Return type

Tensor

Example

import paddle
from paddlenlp.ops import TransformerGenerator

transformer = TransformerGenerator(
    src_vocab_size=30000,
    trg_vocab_size=30000,
    max_length=256,
    num_encoder_layers=6,
    num_decoder_layers=6,
    n_head=8,
    d_model=512,
    d_inner_hid=2048,
    dropout=0.1,
    weight_sharing=True,
    bos_id=0,
    eos_id=1,
    beam_size=4,
    max_out_len=256)

batch_size = 5
seq_len = 10
transformer(
    src_word=paddle.randint(low=3, high=30000, shape=[batch_size, seq_len]))

class FasterGPT(model, decoding_lib=None, use_fp16_decoding=False)[source]

Bases: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

forward(input_ids, mem_seq_len=None, attention_mask=None, top_k=4, top_p=0.0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, temperature=0, decode_strategy='sample', num_return_sequences=1, **model_kwargs)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments

generate(input_ids, mem_seq_len=None, attention_mask=None, top_k=4, top_p=0.0, max_length=256, bos_token_id=None, eos_token_id=None, pad_token_id=None, temperature=0, decode_strategy='sample', num_return_sequences=1, **model_kwargs)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
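
A hedged usage sketch for FasterGPT; the 'gpt2-en' weights name and the prompt are assumptions, and running it requires a GPU with the custom decoding ops built.

import paddle
from paddlenlp.transformers import GPTLMHeadModel, GPTTokenizer
from paddlenlp.ops import FasterGPT

# 'gpt2-en' is an assumed pretrained weights name.
model = GPTLMHeadModel.from_pretrained('gpt2-en')
tokenizer = GPTTokenizer.from_pretrained('gpt2-en')
model = FasterGPT(model, use_fp16_decoding=False)
model.eval()

input_ids = paddle.to_tensor(
    [tokenizer('Nice to meet')['input_ids']], dtype='int64')
output_ids = model.generate(
    input_ids,
    top_k=4,
    max_length=32,
    eos_token_id=tokenizer.eos_token_id,
    decode_strategy='sample')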

class FasterUnifiedTransformer(model, decode_strategy='sampling', decoding_lib=None, use_fp16_decoding=False)[source]

Bases: paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerPretrainedModel

forward(input_ids, token_type_ids, position_ids, attention_mask, seq_len=None, max_length=128, top_k=4, top_p=0.0, bos_token_id=None, eos_token_id=None, pad_token_id=None, num_beams=4, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments

generate(input_ids, token_type_ids, position_ids, attention_mask, seq_len=None, max_length=128, top_k=4, top_p=0.0, bos_token_id=None, eos_token_id=None, pad_token_id=None, num_beams=4, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
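
A hedged usage sketch for FasterUnifiedTransformer; the 'plato-mini' weights name, the prompt and the dialogue_encode call are assumptions, and running it requires a GPU with the custom decoding ops built.

from paddlenlp.transformers import UnifiedTransformerLMHeadModel, UnifiedTransformerTokenizer
from paddlenlp.ops import FasterUnifiedTransformer

# 'plato-mini' is an assumed pretrained weights name.
model = UnifiedTransformerLMHeadModel.from_pretrained('plato-mini')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')
model = FasterUnifiedTransformer(model, decode_strategy='sampling')
model.eval()

# dialogue_encode is assumed to return batched input_ids, token_type_ids,
# position_ids and attention_mask tensors.
inputs = tokenizer.dialogue_encode(
    '你好，今天过得怎么样？',
    add_start_token_as_response=True,
    return_tensors=True,
    is_split_into_words=False)
output_ids = model.generate(
    inputs['input_ids'],
    inputs['token_type_ids'],
    inputs['position_ids'],
    inputs['attention_mask'],
    max_length=64,
    top_k=5)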

class FasterUNIMOText(model, decode_strategy='sampling', decoding_lib=None, use_fp16_decoding=False)[source]

Bases: paddlenlp.transformers.unimo.modeling.UNIMOPretrainedModel

forward(input_ids, token_type_ids, position_ids, attention_mask, seq_len=None, max_length=128, top_k=4, top_p=0.0, num_beams=4, bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments

generate(input_ids, token_type_ids, position_ids, attention_mask, seq_len=None, max_length=128, top_k=4, top_p=0.0, num_beams=4, bos_token_id=None, eos_token_id=None, pad_token_id=None, diversity_rate=0.0, temperature=1.0, num_return_sequences=1, length_penalty=0.6)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
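
A hedged usage sketch for FasterUNIMOText; the 'unimo-text-1.0' weights name, the prompt and the gen_encode call are assumptions, and running it requires a GPU with the custom decoding ops built.

from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer
from paddlenlp.ops import FasterUNIMOText

# 'unimo-text-1.0' is an assumed pretrained weights name.
model = UNIMOLMHeadModel.from_pretrained('unimo-text-1.0')
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')
model = FasterUNIMOText(model, decode_strategy='sampling')
model.eval()

# gen_encode is assumed to return batched input_ids, token_type_ids,
# position_ids and attention_mask tensors.
inputs = tokenizer.gen_encode(
    '深度学习框架支持动态图和静态图两种开发方式。',
    add_start_token_for_decoding=True,
    return_tensors=True,
    is_split_into_words=False)
output_ids = model.generate(
    inputs['input_ids'],
    inputs['token_type_ids'],
    inputs['position_ids'],
    inputs['attention_mask'],
    max_length=64,
    top_k=5)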

class FasterBART(model, decode_strategy='beam_search_v2', decoding_lib=None, use_fp16_decoding=False)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

forward(input_ids=None, encoder_output=None, seq_len=None, num_beams=4, top_k=1, top_p=0.0, bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, max_length=256, diversity_rate=0.0, length_penalty=0.6, num_return_sequences=1, **model_kwargs)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments

generate(input_ids=None, encoder_output=None, seq_len=None, num_beams=4, top_k=1, top_p=0.0, bos_token_id=None, eos_token_id=None, pad_token_id=None, decoder_start_token_id=None, max_length=256, diversity_rate=0.0, length_penalty=0.6, num_return_sequences=1, **model_kwargs)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) -- unpacked tuple arguments

  • **kwargs (dict) -- unpacked dict arguments
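
A hedged usage sketch for FasterBART; the 'bart-base' weights name and the prompt are assumptions, and running it requires a GPU with the custom decoding ops built.

import paddle
from paddlenlp.transformers import BartForConditionalGeneration, BartTokenizer
from paddlenlp.ops import FasterBART

# 'bart-base' is an assumed pretrained weights name.
model = BartForConditionalGeneration.from_pretrained('bart-base')
tokenizer = BartTokenizer.from_pretrained('bart-base')
model = FasterBART(model)
model.eval()

input_ids = paddle.to_tensor(
    [tokenizer('PaddleNLP is an easy-to-use NLP library.')['input_ids']],
    dtype='int64')
output_ids = model.generate(
    input_ids=input_ids,
    num_beams=4,
    max_length=50)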