modeling

class T5Model(tie_word_embeddings=True, pad_token_id=0, bos_token_id=0, eos_token_id=1, initializer_factor=1.0, vocab_size=32128, d_model=768, d_kv=64, d_ff=3072, num_layers=12, num_decoder_layers=12, num_heads=12, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, feed_forward_proj='relu', enable_recompute=False)[source]

Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel

The bare T5 Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • tie_word_embeddings (bool, optional) – Whether to tie input and output embeddings. Defaults to True.

  • pad_token_id (int, optional) – The id of the padding token. Defaults to 0.

  • bos_token_id (int, optional) – The id of the bos token. Defaults to 0.

  • eos_token_id (int, optional) – The id of the eos token. Defaults to 1.

  • initializer_factor (float, optional) – A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing). Defaults to 1.0.

  • vocab_size (int, optional) – Vocabulary size of input_ids in T5Model, which is also the size of the token embedding matrix. Defines the number of different tokens that can be represented by the input_ids passed when calling T5Model. Defaults to 32128.

  • d_model (int, optional) – Dimensionality of the embedding layer and the encoder layers. Defaults to 768.

  • d_kv (int, optional) – Size of the key, query, value projections per attention head. Defaults to 64.

  • d_ff (int, optional) – Dimensionality of the feed_forward layer in the residual attention block. Defaults to 3072.

  • num_layers (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.

  • num_decoder_layers (int, optional) – Number of hidden layers in the Transformer decoder. Defaults to 12.

  • num_heads (int, optional) – Number of attention heads for each attention layer in the Transformer encoder and decoder. Defaults to 12.

  • relative_attention_num_buckets (int, optional) – The number of buckets to use for each attention layer. Defaults to 32.

  • dropout_rate (float, optional) – The dropout ratio for all layers. Defaults to 0.1.

  • layer_norm_epsilon (float, optional) – The epsilon used by the layer normalization layers. Defaults to 1e-6.

  • feed_forward_proj (str, optional) – The non-linear activation function (function or string) in the feed forward layer in the residual attention block. If string, "relu", "gated-gelu" are supported. Defaults to "relu".

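As a rough illustration of how the size parameters above relate, the sketch below mirrors the defaults from the signature in a plain dataclass. `T5Dims` and `attention_inner_dim` are hypothetical names used only for this example and are not part of PaddleNLP. The relation between `num_heads` and `d_kv` follows the standard T5 architecture and is assumed to hold for this implementation.

```python
from dataclasses import dataclass


@dataclass
class T5Dims:
    """Hypothetical container mirroring the default sizes in the T5Model signature."""

    d_model: int = 768
    d_kv: int = 64
    d_ff: int = 3072
    num_heads: int = 12

    @property
    def attention_inner_dim(self) -> int:
        # Each head projects queries, keys and values to d_kv features,
        # so the concatenated heads span num_heads * d_kv features.
        return self.num_heads * self.d_kv


dims = T5Dims()
print(dims.attention_inner_dim)                  # 768
print(dims.attention_inner_dim == dims.d_model)  # True for the defaults
```

With the default values the concatenated heads happen to match d_model, but the two sizes are set independently and need not be equal for other configurations.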

get_input_embeddings()[source]

Gets the input embedding of the model.

Returns

The input embedding of the model.

Return type

nn.Embedding

set_input_embeddings(new_embeddings)[source]

Sets a new input embedding for the model.

Parameters

new_embeddings (Embedding) – The new input embedding of the model.

Raises

NotImplementedError – The model has not implemented the set_input_embeddings method.

forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, inputs_embeds=None, decoder_inputs_embeds=None, use_cache=True, output_attentions=False, output_hidden_states=False, return_dict=False)[source]

The T5Model forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention on unwanted positions, usually the paddings or the subsequent positions. Its data type can be int or float. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have 0.0 values and the others have 1.0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.

  • decoder_input_ids (Tensor, optional) – Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided and the model will create the tensor by shifting the input_ids to the right.

  • decoder_attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention to some unwanted positions in decoder_input_ids. Its data type and shape is the same as attention_mask. Defaults to None.

  • encoder_output (tuple, optional) – The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional), and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder. The length of hidden_states is num_hidden_layers + 1. For each element in the tuple, its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size]. attentions contains the attentions of all layers in the Transformer encoder. The length of attentions is num_hidden_layers. For each element in the tuple, its data type should be float32 and its shape is [batch_size, num_attention_heads, sequence_length, sequence_length].

  • cache (Tuple[Tuple[Tensor]], optional) – Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model. Can be used to speed up sequential decoding. The input_ids which have their past given to this model should not be passed as input ids as they have already been computed. Defaults to None.

  • inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.

  • decoder_inputs_embeds (Tensor, optional) –

    Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.

    If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.

  • use_cache (bool, optional) – Whether or not to use cache. If set to True, the cache is returned and can be used to speed up decoding. Defaults to True.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. Defaults to False.

  • output_hidden_states (bool, optional) – Whether or not to return the output of all hidden layers. Defaults to False.

  • return_dict (bool, optional) – Whether or not to return a Seq2SeqModelOutput. If False, the output will be a tuple of tensors. Defaults to False.
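The attention_mask and decoder_input_ids entries above describe two simple transformations of token id sequences. A minimal sketch in plain Python follows, using nested lists in place of tensors. The helper names `padding_mask` and `shift_right` and the token ids are made up for illustration, and starting the decoder sequence from id 0 is an assumption based on the default pad_token_id and bos_token_id, not a statement about the library's internals.

```python
from typing import List

PAD_TOKEN_ID = 0            # default pad_token_id from the T5Model signature
DECODER_START_TOKEN_ID = 0  # assumption: decoding starts from the pad/bos id


def padding_mask(batch: List[List[int]], pad_id: int = PAD_TOKEN_ID) -> List[List[int]]:
    """Return 1 for real tokens and 0 for padding, one row per sequence."""
    return [[0 if tok == pad_id else 1 for tok in seq] for seq in batch]


def shift_right(batch: List[List[int]], start_id: int = DECODER_START_TOKEN_ID) -> List[List[int]]:
    """Prepend start_id and drop the last token of every sequence."""
    return [[start_id] + seq[:-1] for seq in batch]


input_ids = [[71, 1712, 1], [363, 1, 0]]  # made-up ids; the second row is padded
print(padding_mask(input_ids))  # [[1, 1, 1], [1, 1, 0]]
print(shift_right(input_ids))   # [[0, 71, 1712], [0, 363, 1]]
```

In practice a mask like this would be converted to a tensor and broadcast to the documented shape before it reaches the attention layers.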

Returns

An instance of Seq2SeqModelOutput if return_dict=True. Otherwise, it returns a tuple of tensors containing, in order, the fields of Seq2SeqModelOutput that are not None (which depends on the input arguments).

tuple: Returns tuple (last_hidden_state, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions)

With the fields:

  • last_hidden_state (Tensor):

    Sequence of hidden-states at the last layer of the decoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].

  • cache (List[tuple(Tensor, Tensor)], optional):

    returned when use_cache=True is passed. List of tuple(Tensor, Tensor) of length config["num_layers"], with the first element being the previous buckets of shape [batch_size, num_heads, num_hashes, sequence_length] and the second being the previous hidden_states of shape [batch_size, sequence_length, hidden_size].

  • decoder_hidden_states (tuple(Tensor), optional):

    returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings + one for the output of each decoder layer) of shape (batch_size, sequence_length, hidden_size).

  • decoder_attentions (tuple(Tensor), optional):

    returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].

  • cross_attentions (tuple(Tensor), optional):

    returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].

  • encoder_last_hidden_state (Tensor):

    Sequence of hidden-states at the last layer of the encoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].

  • encoder_hidden_states (tuple(Tensor), optional):

    returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings + one for the output of each encoder layer). Each Tensor has a data type of float32 and its shape is [batch_size, sequence_length, hidden_size].

  • encoder_attentions (tuple(Tensor), optional):

    returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
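Because the tuple form only contains the fields that are not None, the position of a given field depends on which optional outputs were requested. The sketch below illustrates that behavior with placeholder strings instead of tensors. `FIELD_ORDER` copies the order listed above, and `to_output_tuple` is a hypothetical helper written for this example, not a PaddleNLP function.

```python
from typing import Any, Dict, Optional, Tuple

# Field order as listed in the Returns section above.
FIELD_ORDER = (
    "last_hidden_state",
    "cache",
    "decoder_hidden_states",
    "decoder_attentions",
    "cross_attentions",
    "encoder_last_hidden_state",
    "encoder_hidden_states",
    "encoder_attentions",
)


def to_output_tuple(fields: Dict[str, Optional[Any]]) -> Tuple[Any, ...]:
    """Keep the documented field order and drop every field that is None."""
    return tuple(fields[name] for name in FIELD_ORDER if fields.get(name) is not None)


# Placeholder strings stand in for tensors.
fields = {
    "last_hidden_state": "dec_last",
    "cache": "kv_cache",
    "decoder_attentions": None,  # not requested, so it is skipped
    "encoder_last_hidden_state": "enc_last",
}
print(to_output_tuple(fields))  # ('dec_last', 'kv_cache', 'enc_last')
```

Under this reading, passing return_dict=True and accessing fields by name avoids having to track positions that shift with the input arguments.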

Example

import paddle
from paddlenlp.transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5Model.from_pretrained('t5-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]], dtype="int64")
decoder_inputs = tokenizer("It means you can")
decoder_input_ids = paddle.to_tensor([decoder_inputs["input_ids"]], dtype="int64")

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
last_hidden_state = outputs[0]
print(last_hidden_state.shape)
# [1, 5, 768]

class T5PretrainedModel(*args, **kwargs)[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained T5 models. It provides T5 related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

init_weights()[source]

Initializes and ties weights if needed.
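To make the idea of tied weights concrete, here is a toy sketch in plain Python. It assumes that tying means the output projection reuses the same matrix object as the input embedding. `TinyEmbedding` and `lm_scores` are hypothetical stand-ins and do not reflect how init_weights is implemented.

```python
from typing import List


class TinyEmbedding:
    """Hypothetical stand-in for an embedding layer holding a [vocab, d_model] matrix."""

    def __init__(self, weight: List[List[int]]) -> None:
        self.weight = weight


def lm_scores(hidden: List[int], output_weight: List[List[int]]) -> List[int]:
    """Score every vocabulary entry as the dot product with its embedding row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in output_weight]


shared = TinyEmbedding([[1, 0], [0, 1], [1, 1]])  # vocab size 3, d_model 2
output_weight = shared.weight                     # tied: same object, not a copy

hidden = [2, 3]
print(lm_scores(hidden, output_weight))  # [2, 3, 5]

shared.weight[0][0] = 10                 # an update to the input embedding ...
print(lm_scores(hidden, output_weight))  # [20, 3, 5]  ... is seen by the output side
```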

base_model_class

alias of paddlenlp.transformers.t5.modeling.T5Model

class T5ForConditionalGeneration(t5)[source]

Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel

The T5 Model transformer with a language modeling head on top.

Parameters

t5 (T5Model) – An instance of T5Model.

get_input_embeddings()[source]

Gets the input embedding of the model.

Returns

The input embedding of the model.

Return type

nn.Embedding

set_input_embeddings(new_embeddings)[source]

Sets a new input embedding for the model.

Parameters

new_embeddings (Embedding) – The new input embedding of the model.

Raises

NotImplementedError – The model has not implemented the set_input_embeddings method.

get_output_embeddings()[source]

To be overridden by models with output embeddings.

Returns

The output embedding of the model.

Return type

Optional[Embedding]

forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, labels=None, inputs_embeds=None, decoder_inputs_embeds=None, use_cache=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]

Parameters
  • input_ids (Tensor, optional) – See T5Model.

  • attention_mask (Tensor, optional) – See T5Model.

  • decoder_input_ids (Tensor, optional) – See T5Model.

  • decoder_attention_mask (Tensor, optional) – See T5Model.

  • encoder_output (tuple(Tensor), optional) – See T5Model.

  • cache (List[tuple(Tensor, Tensor)], optional) – See T5Model.

  • labels (Tensor, optional) – Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size]. Shape is [batch_size, sequence_length] and dtype is int64.

  • inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.

  • decoder_inputs_embeds (Tensor, optional) –

    Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.

    If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.

  • use_cache (bool, optional) – See T5Model.

  • output_attentions (bool, optional) – See T5Model.

  • output_hidden_states (bool, optional) – See T5Model.

  • return_dict (bool, optional) – Whether or not to return a Seq2SeqLMOutput. If False, the output will be a tuple of tensors. Defaults to False.
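The labels entry above states that positions labeled -100 are ignored when the loss is computed. A minimal sketch of that masking rule in plain Python follows. `masked_cross_entropy` is a hypothetical helper, the logits are made up, and averaging over the remaining positions is an assumption rather than a documented detail of this model.

```python
import math
from typing import List

IGNORE_INDEX = -100


def masked_cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX."""
    losses = []
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked position: contributes nothing to the loss
        log_norm = math.log(sum(math.exp(s) for s in scores))
        losses.append(log_norm - scores[label])
    return sum(losses) / len(losses) if losses else 0.0


# Three positions over a three-token vocabulary; the last one is masked.
logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2], [9.0, 9.0, 9.0]]
labels = [0, 1, IGNORE_INDEX]
print(masked_cross_entropy(logits, labels))
```

The masked third position does not change the result: the value equals the loss computed over the first two positions alone.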

Returns

An instance of Seq2SeqLMOutput if return_dict=True. Otherwise, it returns a tuple of tensors containing, in order, the fields of Seq2SeqLMOutput that are not None (which depends on the input arguments).

tuple: Returns tuple (loss, logits, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions)

With the fields:

  • loss (Tensor):

    returned when labels is provided. Language modeling loss. Its data type should be float32 and its shape is [1,].

  • logits (Tensor):

    Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and its shape is [batch_size, sequence_length, vocab_size].

  • cache (List[tuple(Tensor, Tensor)], optional):

    See T5Model.

  • decoder_hidden_states (tuple(Tensor), optional):

    See T5Model.

  • decoder_attentions (tuple(Tensor), optional):

    See T5Model.

  • cross_attentions (tuple(Tensor), optional):

    See T5Model.

  • encoder_last_hidden_state (Tensor):

    See T5Model.

  • encoder_hidden_states (tuple(Tensor), optional):

    See T5Model.

  • encoder_attentions (tuple(Tensor), optional):

    See T5Model.
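Since logits holds the scores before SoftMax, turning one row into probabilities or picking the most likely token is a small post-processing step. The sketch below shows both on a made-up row for a three-token vocabulary. `softmax` and `greedy_token` are illustrative helpers, not part of the model's API.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of vocabulary scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(scores: List[float]) -> int:
    """Index of the highest-scoring vocabulary entry."""
    return max(range(len(scores)), key=scores.__getitem__)


row = [1.0, 3.0, 0.5]  # made-up logits for a three-token vocabulary
probs = softmax(row)
print(greedy_token(row))  # 1
```

Applying softmax does not change which entry is largest, so the greedy choice can be read directly from the raw logits.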

Example

import paddle
from paddlenlp.transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs, labels=inputs["input_ids"])

loss = output[0]
logits = output[1]

class T5EncoderModel(vocab_size=32128, d_model=768, d_kv=64, d_ff=3072, num_layers=12, num_heads=12, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, feed_forward_proj='relu', is_decoder: bool = False, **kwargs)[source]

Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel

base_model_class

alias of paddlenlp.transformers.t5.modeling.T5EncoderModel

get_input_embeddings() → paddle.nn.layer.common.Embedding[source]

Gets the input embedding of the model.

Returns

The input embedding of the model.

Return type

nn.Embedding

set_input_embeddings(new_embeddings: paddle.nn.layer.common.Embedding) → None[source]

Sets a new input embedding for the model.

Parameters

new_embeddings (Embedding) – The new input embedding of the model.

Raises

NotImplementedError – The model has not implemented the set_input_embeddings method.

forward(input_ids: Optional[paddle.Tensor] = None, attention_mask: Optional[paddle.Tensor] = None, encoder_hidden_states: Optional[Tuple[paddle.Tensor]] = None, encoder_attention_mask: Optional[paddle.Tensor] = None, cache=None, inputs_embeds: Optional[paddle.Tensor] = None, use_cache: Optional[bool] = False, output_attentions: Optional[bool] = False, output_hidden_states: Optional[bool] = False, return_dict: Optional[bool] = False)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments