modeling
- class T5Model(config: T5Config)
Bases: T5PretrainedModel
The bare T5 Model transformer outputting raw hidden-states without any specific head on top.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
config (T5Config) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Use the from_pretrained method to load the model weights.
- get_input_embeddings()
Get the input embedding of the model.
- Returns:
The input embedding of the model.
- Return type:
nn.Embedding
- set_input_embeddings(new_embeddings)
Set a new input embedding for the model.
- Parameters:
new_embeddings (Embedding) – The new embedding of the model.
- Raises:
NotImplementedError – The model has not implemented the set_input_embeddings method.
- forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, inputs_embeds=None, decoder_inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The T5Model forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of the tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int or float. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have 0.0 values and the others have 1.0 values. It is a tensor with shape broadcasted to [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.
decoder_input_ids (Tensor, optional) – Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided and the model will create the tensor by shifting the input_ids to the right.
decoder_attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as those of attention_mask. Defaults to None.
encoder_output (tuple, optional) – The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder. The length of hidden_states is num_hidden_layers + 1. For each element in the tuple, the data type should be float32 and the shape is [batch_size, sequence_length, hidden_size]. attentions contains the attentions of all layers in the Transformer encoder. The length of attentions is num_hidden_layers. For each element in the tuple, the data type should be float32 and the shape is [batch_size, num_attention_heads, sequence_length, sequence_length].
cache (Tuple[Tuple[Tensor]], optional) – Contains pre-computed hidden-states (keys and values in the attention blocks) as computed by the model. Can be used to speed up sequential decoding. The input_ids whose past is given to this model should not be passed as input ids, as they have already been computed. Defaults to None.
inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.
decoder_inputs_embeds (Tensor, optional) – Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.
If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
use_cache (bool, optional) – Whether or not to use cache. If set to True, the cache is returned and can be used to speed up decoding. Defaults to False.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. Defaults to False.
output_hidden_states (bool, optional) – Whether or not to return the output of all hidden layers. Defaults to False.
return_dict (bool, optional) – Whether or not to return a Seq2SeqModelOutput. If False, the output will be a tuple of tensors. Defaults to False.
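The right shift mentioned for decoder_input_ids can be pictured with a small framework-free sketch. It assumes the decoder start token id is 0 and uses made-up token ids; both are illustrative assumptions, and the helper name shift_right is hypothetical rather than part of the PaddleNLP API.

```python
from typing import List


def shift_right(ids: List[int], start_token_id: int = 0) -> List[int]:
    """Prepend the start token and drop the last id, so the length is unchanged.

    start_token_id=0 is an assumption made for this sketch.
    """
    if not ids:
        return []
    return [start_token_id] + ids[:-1]


# Made-up token ids, for illustration only.
input_ids = [8774, 12, 169, 1]
decoder_input_ids = shift_right(input_ids)
print(decoder_input_ids)  # [0, 8774, 12, 169]
```

Each decoder position then sees only the tokens that precede the one it has to predict.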
- Returns:
An instance of Seq2SeqModelOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of Seq2SeqModelOutput (which fields are present depends on the input arguments).
tuple: Returns tuple (last_hidden_state, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions).
With the fields:
last_hidden_state (Tensor): Sequence of hidden-states at the last layer of the decoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].
cache (List[tuple(Tensor, Tensor)], optional): Returned when use_cache=True is passed. List of tuple(Tensor, Tensor) of length config["num_layers"], with the first element being the previous buckets of shape [batch_size, num_heads, num_hashes, sequence_length] and the second being the previous hidden_states of shape [batch_size, sequence_length, hidden_size].
decoder_hidden_states (tuple(Tensor), optional): Returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings + one for the output of each decoder layer) of shape (batch_size, sequence_length, hidden_size).
decoder_attentions (tuple(Tensor), optional): Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
cross_attentions (tuple(Tensor), optional): Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
encoder_last_hidden_state (Tensor): Sequence of hidden-states at the last layer of the encoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].
encoder_hidden_states (tuple(Tensor), optional): Returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings + one for the output of each encoder layer). Each Tensor has a data type of float32 and its shape is [batch_size, sequence_length, hidden_size].
encoder_attentions (tuple(Tensor), optional): Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
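Because the tuple form keeps only the fields that are not None, the index of a given field can change between calls. The sketch below illustrates that ordering rule in plain Python. Placeholder strings stand in for tensors, the helper names are hypothetical, and which fields are populated in each call is an assumption made for illustration.

```python
from typing import Any, Dict, Optional, Tuple

# Field order as listed in the return description above.
FIELD_ORDER = (
    "last_hidden_state",
    "cache",
    "decoder_hidden_states",
    "decoder_attentions",
    "cross_attentions",
    "encoder_last_hidden_state",
    "encoder_hidden_states",
    "encoder_attentions",
)


def to_output_tuple(fields: Dict[str, Any]) -> Tuple[Any, ...]:
    """Keep the non-None fields, in the documented order."""
    return tuple(fields[name] for name in FIELD_ORDER if fields.get(name) is not None)


def position_of(name: str, fields: Dict[str, Any]) -> Optional[int]:
    """Index of a field in the tuple output, or None if it is absent."""
    present = [n for n in FIELD_ORDER if fields.get(n) is not None]
    return present.index(name) if name in present else None


# Hypothetical call where only the two last-hidden-state fields are set.
default_call = {
    "last_hidden_state": "decoder_states",
    "encoder_last_hidden_state": "encoder_states",
}
print(to_output_tuple(default_call))  # ('decoder_states', 'encoder_states')
print(position_of("encoder_last_hidden_state", default_call))  # 1

# Hypothetical call that also returns a cache: later fields shift by one.
cached_call = dict(default_call, cache="kv_cache")
print(position_of("encoder_last_hidden_state", cached_call))  # 2
```

If positional indexing feels fragile, passing return_dict=True and reading fields by name avoids the shifting positions.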
Example
import paddle
from paddlenlp.transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5Model.from_pretrained('t5-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]], dtype="int64")
decoder_inputs = tokenizer("It means you can")
decoder_input_ids = paddle.to_tensor([decoder_inputs["input_ids"]], dtype="int64")

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
last_hidden_state = outputs[0]
print(last_hidden_state.shape)
# [1, 5, 768]
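The example above feeds a single unpadded sequence, so no attention_mask is needed. For a batch of sequences with different lengths, a mask following the int convention described for attention_mask (1 for real tokens, 0 for padding) could be built as in this plain-Python sketch. The pad token id of 0 and the helper name pad_batch are assumptions, and the sketch produces a [batch_size, sequence_length] mask on the assumption that the model broadcasts it to the attention shape.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_token_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build a 1/0 mask.

    pad_token_id=0 is an assumption made for this sketch.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token ids, for illustration only.
ids, mask = pad_batch([[5, 6, 7], [8, 9, 10, 11]])
print(ids)   # [[5, 6, 7, 0], [8, 9, 10, 11]]
print(mask)  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

Both lists can then be wrapped with paddle.to_tensor, as the example above does for input_ids.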
- class T5PretrainedModel(*args, **kwargs)
Bases: PretrainedModel
An abstract class for pretrained T5 models. It provides the T5-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
- config_class
alias of T5Config
- class T5ForConditionalGeneration(config: T5Config)
Bases: T5PretrainedModel
The T5 Model transformer with a language modeling head on top.
- Parameters:
config (T5Config) – An instance of T5Config used to construct T5ForConditionalGeneration.
- get_input_embeddings()
Get the input embedding of the model.
- Returns:
The input embedding of the model.
- Return type:
nn.Embedding
- set_input_embeddings(new_embeddings)
Set a new input embedding for the model.
- Parameters:
new_embeddings (Embedding) – The new embedding of the model.
- Raises:
NotImplementedError – The model has not implemented the set_input_embeddings method.
- get_output_embeddings()
To be overridden for models with output embeddings.
- Returns:
The output embedding of the model.
- Return type:
Optional[Embedding]
- forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, labels=None, inputs_embeds=None, decoder_inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)
- Parameters:
input_ids (Tensor, optional) – See T5Model.
attention_mask (Tensor, optional) – See T5Model.
decoder_input_ids (Tensor, optional) – See T5Model.
decoder_attention_mask (Tensor, optional) – See T5Model.
encoder_output (tuple(Tensor), optional) – See T5Model.
cache (List[tuple(Tensor, Tensor)], optional) – See T5Model.
labels (Tensor, optional) – Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size]. Shape is [batch_size, sequence_length] and dtype is int64.
inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.
decoder_inputs_embeds (Tensor, optional) – Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.
If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
use_cache (bool, optional) – See T5Model.
output_attentions (bool, optional) – See T5Model.
output_hidden_states (bool, optional) – See T5Model.
return_dict (bool, optional) – Whether or not to return a Seq2SeqLMOutput. If False, the output will be a tuple of tensors. Defaults to False.
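The rule that labels set to -100 are ignored can be made concrete with a small plain-Python cross-entropy. This is a sketch of the masking idea only, not the library's loss code: averaging over the kept positions and returning 0.0 when every label is ignored are assumptions of this sketch, and masked_cross_entropy is a hypothetical helper name.

```python
import math
from typing import List

IGNORE_INDEX = -100  # label value that is skipped, as described for labels


def masked_cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over the positions whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked position: contributes nothing to the loss
        # Numerically stable log-sum-exp over the vocabulary scores.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[label]
        count += 1
    # Returning 0.0 when nothing is kept is a choice made for this sketch.
    return total / count if count else 0.0


# Three positions over a toy vocabulary of 4; the middle one is ignored.
logits = [[0.0, 0.0, 0.0, 0.0], [5.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
labels = [2, IGNORE_INDEX, 1]
loss = masked_cross_entropy(logits, labels)
print(abs(loss - math.log(4)) < 1e-9)  # True
```

The two kept positions have uniform scores, so each contributes log(4); the ignored position has no effect on the result.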
- Returns:
An instance of Seq2SeqLMOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of Seq2SeqLMOutput (which fields are present depends on the input arguments).
tuple: Returns tuple (loss, logits, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions).
With the fields:
loss (Tensor): Returned when labels is provided. Language modeling loss. Its data type should be float32 and its shape is [1,].
logits (Tensor): Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and its shape is [batch_size, sequence_length, vocab_size].
cache (List[tuple(Tensor, Tensor)], optional): See T5Model.
decoder_hidden_states (tuple(Tensor), optional): See T5Model.
decoder_attentions (tuple(Tensor), optional): See T5Model.
cross_attentions (tuple(Tensor), optional): See T5Model.
encoder_last_hidden_state (Tensor): See T5Model.
encoder_hidden_states (tuple(Tensor), optional): See T5Model.
encoder_attentions (tuple(Tensor), optional): See T5Model.
Example
import paddle
from paddlenlp.transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs, labels=inputs["input_ids"])

loss = output[0]
logits = output[1]
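The example above passes the unpadded input_ids directly as labels. When label sequences are padded in a batch, one common preparation step is to replace the padding positions with -100 so that they are ignored by the loss, as described for the labels parameter. The sketch below assumes a pad token id of 0 and uses a hypothetical helper name, mask_pad_labels.

```python
from typing import List


def mask_pad_labels(
    label_ids: List[List[int]], pad_token_id: int = 0, ignore_index: int = -100
) -> List[List[int]]:
    """Replace padding ids with ignore_index; pad_token_id=0 is an assumption."""
    return [
        [ignore_index if token == pad_token_id else token for token in row]
        for row in label_ids
    ]


# Made-up, right-padded label ids, for illustration only.
padded_labels = [[37, 19, 1, 0, 0], [8, 9, 10, 11, 1]]
print(mask_pad_labels(padded_labels))
# [[37, 19, 1, -100, -100], [8, 9, 10, 11, 1]]
```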
- class T5EncoderModel(config: T5Config)
Bases: T5PretrainedModel
- base_model_class
alias of T5EncoderModel
- get_input_embeddings() → Embedding
Get the input embedding of the model.
- Returns:
The input embedding of the model.
- Return type:
nn.Embedding
- set_input_embeddings(new_embeddings: Embedding) → None
Set a new input embedding for the model.
- Parameters:
new_embeddings (Embedding) – The new embedding of the model.
- Raises:
NotImplementedError – The model has not implemented the set_input_embeddings method.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attention_mask: Tensor | None = None, cache=None, inputs_embeds: Tensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters:
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments