modeling

class T5Model(tie_word_embeddings=True, pad_token_id=0, bos_token_id=0, eos_token_id=1, initializer_factor=1.0, vocab_size=32128, d_model=768, d_kv=64, d_ff=3072, num_layers=12, num_decoder_layers=12, num_heads=12, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, feed_forward_proj='relu')[source]

Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel

The bare T5 Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • tie_word_embeddings (bool, optional) -- Whether to tie input and output embeddings. Defaults to True.

  • pad_token_id (int, optional) -- The id of the padding token. Defaults to 0.

  • bos_token_id (int, optional) -- The id of the bos token. Defaults to 0.

  • eos_token_id (int, optional) -- The id of the eos token. Defaults to 1.

  • initializer_factor (float, optional) -- A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing). Defaults to 1.0.

  • vocab_size (int, optional) -- Vocabulary size of input_ids in T5Model. Also the vocab size of the token embedding matrix. Defines the number of different tokens that can be represented by the input_ids passed when calling T5Model. Defaults to 32128.

  • d_model (int, optional) -- Dimensionality of the embedding layer and the encoder/decoder layers. Defaults to 768.

  • d_kv (int, optional) -- Size of the key, query, value projections per attention head. Defaults to 64.

  • d_ff (int, optional) -- Dimensionality of the feed_forward layer in the residual attention block. Defaults to 3072.

  • num_layers (int, optional) -- Number of hidden layers in the Transformer encoder. Defaults to 12.

  • num_decoder_layers (int, optional) -- Number of hidden layers in the Transformer decoder. Defaults to 12.

  • num_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer encoder and decoder. Defaults to 12.

  • relative_attention_num_buckets (int, optional) -- The number of buckets to use for each attention layer. Defaults to 32.

  • dropout_rate (float, optional) -- The dropout ratio for all layers. Defaults to 0.1.

  • layer_norm_epsilon (float, optional) -- The epsilon used by the layer normalization layers. Defaults to 1e-6.

  • feed_forward_proj (str, optional) -- The non-linear activation function (function or string) in the feed forward layer in the residual attention block. If string, "relu", "gated-gelu" are supported. Defaults to "relu".
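
The configuration arguments above map directly to keyword arguments of the constructor. As a minimal sketch (smaller-than-default hyperparameters chosen only for illustration), a randomly initialized model can be built like this:

from paddlenlp.transformers import T5Model

# A small, randomly initialized T5Model; each keyword below overrides
# one of the documented defaults.
model = T5Model(
    vocab_size=32128,
    d_model=256,           # embedding/encoder width (default 768)
    d_kv=32,               # per-head key/query/value projection size (default 64)
    d_ff=1024,             # feed-forward width (default 3072)
    num_layers=4,          # encoder depth (default 12)
    num_decoder_layers=4,  # decoder depth (default 12)
    num_heads=4,           # attention heads per layer (default 12)
    feed_forward_proj="relu",
)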

forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, use_cache=True, output_attentions=False, output_hidden_states=False)[source]

The T5Model forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int or float. When the data type is int, masked tokens have 0 values and unmasked tokens have 1 values; when the data type is float, masked tokens have 0.0 values and unmasked tokens have 1.0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked (see the mask sketch after this parameter list).

  • decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and its shape is [batch_size, sequence_length]. Defaults to None, in which case the model creates this tensor by shifting the input_ids to the right.

  • decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as attention_mask. Defaults to None.

  • encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1, and each element has data type float32 and shape [batch_size, sequence_length, hidden_size]. attentions contains the attention weights of all layers in the Transformer encoder; its length is num_hidden_layers, and each element has data type float32 and shape [batch_size, num_attention_heads, sequence_length, sequence_length].

  • cache (Tuple[Tuple[Tensor]], optional) -- Contains pre-computed hidden states (keys and values in the attention blocks) as computed by the model. Can be used to speed up sequential decoding. Token ids whose past states are given in cache should not be passed as input ids again, as they have already been computed. Defaults to None.

  • use_cache (bool, optional) -- Whether or not to use cache. If set to True, cache states are returned and can be used to speed up decoding. Defaults to True.

  • output_attentions (bool, optional) -- Whether or not to return the attentions tensors of all attention layers. Defaults to False.

  • output_hidden_states (bool, optional) -- Whether or not to return the output of all hidden layers. Defaults to False.
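
The int/float mask convention above can be made concrete with a short sketch (hypothetical sequence length, and assuming a 2-D mask of shape [batch_size, sequence_length] is broadcast internally to [batch_size, num_attention_heads, sequence_length, sequence_length]):

import paddle

# int mask: 1 = the position may be attended to, 0 = masked
# (here the last two positions are padding).
attention_mask = paddle.to_tensor([[1, 1, 1, 1, 0, 0]], dtype="int64")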

Returns

Returns tuple (last_hidden_state, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions)

With the fields:

  • last_hidden_state (Tensor):

    Sequence of hidden-states at the last layer of the decoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].

  • cache (List[tuple(Tensor, Tensor)], optional):

    Returned when use_cache=True is passed. A list of tuple(Tensor, Tensor) of length config["num_layers"], containing the pre-computed key and value hidden states of the attention blocks, which can be reused to speed up sequential decoding.

  • decoder_hidden_states (tuple(Tensor), optional):

    Returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings plus one for the output of each decoder layer). Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size].

  • decoder_attentions (tuple(Tensor), optional):

    Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length].

  • cross_attentions (tuple(Tensor), optional):

    Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length].

  • encoder_last_hidden_state (Tensor):

    Sequence of hidden-states at the last layer of the encoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].

  • encoder_hidden_states (tuple(Tensor), optional):

    Returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings plus one for the output of each encoder layer). Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size].

  • encoder_attentions (tuple(Tensor), optional):

    Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length].

Return type

tuple

Example

import paddle
from paddlenlp.transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5Model.from_pretrained('t5-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]], dtype="int64")
decoder_inputs = tokenizer("It means you can")
decoder_input_ids = paddle.to_tensor([decoder_inputs["input_ids"]], dtype="int64")

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
last_hidden_state = outputs[0]
print(last_hidden_state.shape)
# [1, 5, 768]
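
Building on the example above, here is a hedged sketch of incremental decoding with cache and use_cache, assuming the return layout (last_hidden_state, cache, ...) documented above:

# First pass: run the whole decoder prefix and keep the cache.
outputs = model(input_ids=input_ids,
                decoder_input_ids=decoder_input_ids,
                use_cache=True)
cache = outputs[1]

# Later passes: feed only the newest decoder token together with the cache;
# earlier decoder positions are read from the cached key/value states.
last_token = decoder_input_ids[:, -1:]
step_outputs = model(input_ids=input_ids,
                     decoder_input_ids=last_token,
                     cache=cache,
                     use_cache=True)
step_hidden_state = step_outputs[0]  # hidden state(s) for the newly fed token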
class T5PretrainedModel(name_scope=None, dtype='float32')[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained T5 models. It provides the T5-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

init_weights()[source]

Initializes weights and ties them if needed.

base_model_class

alias of paddlenlp.transformers.t5.modeling.T5Model

class T5ForConditionalGeneration(t5)[source]

Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel

The T5 Model transformer with a language modeling head on top.

Parameters

t5 (T5Model) -- An instance of T5Model.
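
Because the constructor takes an existing backbone, the generation head can wrap a separately created T5Model:

from paddlenlp.transformers import T5Model, T5ForConditionalGeneration

t5 = T5Model.from_pretrained('t5-base')
model = T5ForConditionalGeneration(t5)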

forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, labels=None, use_cache=True, output_attentions=False, output_hidden_states=False)[source]

Parameters
  • input_ids (Tensor, optional) -- See T5Model.

  • attention_mask (Tensor, optional) -- See T5Model.

  • decoder_input_ids (Tensor, optional) -- See T5Model.

  • decoder_attention_mask (Tensor, optional) -- See T5Model.

  • encoder_output (tuple(Tensor), optional) -- See T5Model.

  • cache (List[tuple(Tensor, Tensor)], optional) -- See T5Model.

  • labels (Tensor, optional) -- Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size]. Its shape is [batch_size, sequence_length] and its dtype is int64.

  • use_cache (bool, optional) -- See T5Model.

  • output_attentions (bool, optional) -- See T5Model.

  • output_hidden_states (bool, optional) -- See T5Model.

Returns

Returns tuple (loss, logits, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions)

With the fields:

  • loss (Tensor):

    Returned when labels is provided. Language modeling loss. Its data type should be float32 and its shape is [1,].

  • logits (Tensor):

    Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and its shape is [batch_size, sequence_length, vocab_size].

  • cache (List[tuple(Tensor, Tensor)], optional):

    See T5Model.

  • decoder_hidden_states (tuple(Tensor), optional):

    See T5Model.

  • decoder_attentions (tuple(Tensor), optional):

    See T5Model.

  • cross_attentions (tuple(Tensor), optional):

    See T5Model.

  • encoder_last_hidden_state (Tensor):

    See T5Model.

  • encoder_hidden_states (tuple(Tensor), optional):

    See T5Model.

  • encoder_attentions (tuple(Tensor), optional):

    See T5Model.

Return type

tuple

Example

import paddle
from paddlenlp.transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs, labels=inputs["input_ids"])

loss = output[0]
logits = output[1]
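
As noted in the labels description above, positions set to -100 are ignored by the loss. A hedged sketch of masking padding tokens out of the loss (pure illustration; the single sentence above contains no padding):

# Replace pad positions in the labels with -100 so they do not
# contribute to the language modeling loss.
labels = paddle.where(
    inputs["input_ids"] == tokenizer.pad_token_id,
    paddle.full_like(inputs["input_ids"], -100),
    inputs["input_ids"],
)
masked_output = model(**inputs, labels=labels)
masked_loss = masked_output[0]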