modeling
-
class T5Model(tie_word_embeddings=True, pad_token_id=0, bos_token_id=0, eos_token_id=1, initializer_factor=1.0, vocab_size=32128, d_model=768, d_kv=64, d_ff=3072, num_layers=12, num_decoder_layers=12, num_heads=12, relative_attention_num_buckets=32, dropout_rate=0.1, layer_norm_epsilon=1e-06, feed_forward_proj='relu')
Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel
The bare T5 Model transformer outputting raw hidden-states without any specific head on top.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters
tie_word_embeddings (bool, optional) -- Whether to tie input and output embeddings. Defaults to True.
pad_token_id (int, optional) -- The id of the padding token. Defaults to 0.
bos_token_id (int, optional) -- The id of the bos token. Defaults to 0.
eos_token_id (int, optional) -- The id of the eos token. Defaults to 1.
initializer_factor (float, optional) -- A factor for initializing all weight matrices (should be kept at 1.0; used internally for initialization testing). Defaults to 1.0.
vocab_size (int, optional) -- Vocabulary size of input_ids in T5Model, which is also the size of the token embedding matrix. Defines the number of different tokens that can be represented by the input_ids passed when calling T5Model. Defaults to 32128.
d_model (int, optional) -- Dimensionality of the embedding layer and the encoder layers. Defaults to 768.
d_kv (int, optional) -- Size of the key, query, and value projections per attention head. Defaults to 64.
d_ff (int, optional) -- Dimensionality of the feed-forward layer in the residual attention block. Defaults to 3072.
num_layers (int, optional) -- Number of hidden layers in the Transformer encoder. Defaults to 12.
num_decoder_layers (int, optional) -- Number of hidden layers in the Transformer decoder. Defaults to 12.
num_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer encoder and decoder. Defaults to 12.
relative_attention_num_buckets (int, optional) -- The number of relative-position buckets to use for each attention layer. Defaults to 32.
dropout_rate (float, optional) -- The dropout ratio for all layers. Defaults to 0.1.
layer_norm_epsilon (float, optional) -- The epsilon used by the layer normalization layers. Defaults to 1e-6.
feed_forward_proj (str, optional) -- The non-linear activation function (function or string) in the feed-forward layer in the residual attention block. If a string, "relu" and "gated-gelu" are supported. Defaults to "relu".
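As a rough illustration of how the defaults above relate to one another, the sketch below derives the width of the attention projections from num_heads and d_kv. T5ConfigSketch and its two helper methods are hypothetical names introduced only for this example and are not part of PaddleNLP. The relation inner_dim = num_heads * d_kv follows the standard T5 design and is assumed, not confirmed, to hold for this implementation.

```python
from dataclasses import dataclass


@dataclass
class T5ConfigSketch:
    """Hypothetical container mirroring a few T5Model defaults (not a PaddleNLP class)."""

    vocab_size: int = 32128
    d_model: int = 768
    d_kv: int = 64
    d_ff: int = 3072
    num_heads: int = 12

    def attention_inner_dim(self) -> int:
        # Assumption: query/key/value projections map d_model -> num_heads * d_kv.
        return self.num_heads * self.d_kv

    def embedding_shape(self) -> tuple:
        # Token embedding matrix: one d_model-sized row per vocabulary entry.
        return (self.vocab_size, self.d_model)


cfg = T5ConfigSketch()
print(cfg.attention_inner_dim())  # 768
print(cfg.embedding_shape())      # (32128, 768)
```

With the documented defaults the inner attention width happens to equal d_model, but the two are independent settings, so changing num_heads or d_kv alone changes the projection width without touching d_model.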
-
forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, use_cache=True, output_attentions=False, output_hidden_states=False)
The T5Model forward method, overrides the __call__() special method.
- Parameters
input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of the tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid attending to unwanted positions, usually the paddings or the subsequent positions. Its data type can be int or float. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have 0.0 values and the others have 1.0 values. It is a tensor with a shape that can be broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.
decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided and the model will create the tensor by shifting the input_ids to the right.
decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid attending to unwanted positions in decoder_input_ids. Its data type and shape are the same as those of attention_mask. Defaults to None.
encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional), and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states holds the hidden states of all layers in the Transformer encoder. The length of hidden_states is num_hidden_layers + 1. For every element in the tuple, the data type should be float32 and the shape is [batch_size, sequence_length, hidden_size]. attentions holds the attentions of all layers in the Transformer encoder. The length of attentions is num_hidden_layers. For every element in the tuple, the data type should be float32 and the shape is [batch_size, num_attention_heads, sequence_length, sequence_length].
cache (Tuple[Tuple[Tensor]], optional) -- Contains pre-computed hidden-states (keys and values in the attention blocks) as computed by the model. Can be used to speed up sequential decoding. The input_ids whose past states are given to this model should not be passed again as input ids, because they have already been computed. Defaults to None.
use_cache (bool, optional) -- Whether or not to use the cache. If set to True, the cache states are returned and can be used to speed up decoding. Defaults to True.
output_attentions (bool, optional) -- Whether or not to return the attention tensors of all attention layers. Defaults to False.
output_hidden_states (bool, optional) -- Whether or not to return the output of all hidden layers. Defaults to False.
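The attention_mask convention described above (1 for tokens to attend to, 0 for masked tokens) can be sketched in plain Python. The helpers pad_batch, build_attention_mask, and expand_mask are hypothetical names used only for illustration, and the token ids are arbitrary. The [batch_size, 1, 1, sequence_length] layout is one shape that broadcasts to the documented [batch_size, num_attention_heads, sequence_length, sequence_length]; whether the model also accepts a 2-D mask is not stated here.

```python
PAD_TOKEN_ID = 0  # matches the pad_token_id default documented above


def pad_batch(sequences, pad_token_id=PAD_TOKEN_ID):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_token_id] * (max_len - len(seq)) for seq in sequences]


def build_attention_mask(batch_input_ids, pad_token_id=PAD_TOKEN_ID):
    """Return an int mask: 1 for real tokens, 0 for padding."""
    return [[0 if tok == pad_token_id else 1 for tok in row] for row in batch_input_ids]


def expand_mask(mask):
    """Reshape [batch, seq] to [batch, 1, 1, seq] using nested lists."""
    return [[[row]] for row in mask]


batch = pad_batch([[37, 1782, 1], [8774, 1]])
mask = build_attention_mask(batch)
expanded = expand_mask(mask)
print(batch)           # [[37, 1782, 1], [8774, 1, 0]]
print(mask)            # [[1, 1, 1], [1, 1, 0]]
print(expanded[1][0])  # [[1, 1, 0]]
```

The resulting nested lists could then be wrapped with paddle.to_tensor before being passed to the model.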
- Returns
Returns tuple (last_hidden_state, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions).
With the fields:
last_hidden_state (Tensor): Sequence of hidden-states at the last layer of the decoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].
cache (List[tuple(Tensor, Tensor)], optional): Returned when use_cache=True is passed. List of tuple(Tensor, Tensor) of length config["num_layers"], with the first element being the previous buckets of shape [batch_size, num_heads, num_hashes, sequence_length] and the second being the previous hidden_states of shape [batch_size, sequence_length, hidden_size].
decoder_hidden_states (tuple(Tensor), optional): Returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings + one for the output of each decoder layer) of shape [batch_size, sequence_length, hidden_size].
decoder_attentions (tuple(Tensor), optional): Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
cross_attentions (tuple(Tensor), optional): Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
encoder_last_hidden_state (Tensor): Sequence of hidden-states at the last layer of the encoder of the model. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size].
encoder_hidden_states (tuple(Tensor), optional): Returned when output_hidden_states=True is passed. Tuple of Tensor (one for the output of the embeddings + one for the output of each encoder layer). Each Tensor has a data type of float32 and its shape is [batch_size, sequence_length, hidden_size].
encoder_attentions (tuple(Tensor), optional): Returned when output_attentions=True is passed. Tuple of Tensor (one for each layer). Each Tensor has a data type of float32 and its shape is [batch_size, num_heads, sequence_length, sequence_length].
- Return type
tuple
Example
import paddle
from paddlenlp.transformers import T5Model, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5Model.from_pretrained('t5-base')

# Tokenize the encoder input and add a batch dimension.
inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]], dtype="int64")

# Tokenize the decoder input the same way.
decoder_inputs = tokenizer("It means you can")
decoder_input_ids = paddle.to_tensor([decoder_inputs["input_ids"]], dtype="int64")

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
last_hidden_state = outputs[0]
print(last_hidden_state.shape)  # [1, 5, 768]
-
class T5PretrainedModel(name_scope=None, dtype='float32')
Bases: paddlenlp.transformers.model_utils.PretrainedModel
An abstract class for pretrained T5 models. It provides the T5-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
-
base_model_class
-
-
class T5ForConditionalGeneration(t5)
Bases: paddlenlp.transformers.t5.modeling.T5PretrainedModel
The T5 Model transformer with a language modeling head on top.
-
forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, cache=None, labels=None, use_cache=True, output_attentions=False, output_hidden_states=False)
- Parameters
input_ids (Tensor, optional) -- See T5Model.
attention_mask (Tensor, optional) -- See T5Model.
decoder_input_ids (Tensor, optional) -- See T5Model.
decoder_attention_mask (Tensor, optional) -- See T5Model.
encoder_output (tuple(Tensor), optional) -- See T5Model.
cache (List[tuple(Tensor, Tensor)], optional) -- See T5Model.
labels (Tensor, optional) -- Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size]. Shape is [batch_size, sequence_length] and dtype is int64.
use_cache (bool, optional) -- See T5Model.
output_attentions (bool, optional) -- See T5Model.
output_hidden_states (bool, optional) -- See T5Model.
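The note that labels are shifted inside the model can be pictured with the following sketch of the shift-right step commonly used for T5. It is an illustration under stated assumptions, not the PaddleNLP implementation: shift_right is a hypothetical helper, the decoder start token is assumed to be the pad token (id 0), positions labeled -100 are assumed to be mapped back to the pad id, and the token ids are arbitrary.

```python
IGNORE_INDEX = -100  # labels with this value are excluded from the loss
PAD_TOKEN_ID = 0     # pad_token_id default; assumed to double as the decoder start token


def shift_right(labels, start_token_id=PAD_TOKEN_ID, pad_token_id=PAD_TOKEN_ID):
    """Build decoder inputs from labels by shifting each row one step to the right."""
    shifted = []
    for row in labels:
        # Prepend the start token and drop the final label to keep the length.
        new_row = [start_token_id] + list(row[:-1])
        # Ignored positions are not valid token ids, so map them back to padding.
        new_row = [pad_token_id if tok == IGNORE_INDEX else tok for tok in new_row]
        shifted.append(new_row)
    return shifted


labels = [[37, 1782, 1, IGNORE_INDEX], [8774, 1, IGNORE_INDEX, IGNORE_INDEX]]
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [[0, 37, 1782, 1], [0, 8774, 1, 0]]
```

Under this scheme the decoder sees token t-1 when it predicts label t, which is why the same tensor can serve as both the input and the target.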
- Returns
Returns tuple (loss, logits, cache, decoder_hidden_states, decoder_attentions, cross_attentions, encoder_last_hidden_state, encoder_hidden_states, encoder_attentions).
With the fields:
loss (Tensor): Returned when labels is provided. Language modeling loss. Its data type should be float32 and its shape is [1,].
logits (Tensor): Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and its shape is [batch_size, sequence_length, vocab_size].
cache (List[tuple(Tensor, Tensor)], optional): See T5Model.
decoder_hidden_states (tuple(Tensor), optional): See T5Model.
decoder_attentions (tuple(Tensor), optional): See T5Model.
cross_attentions (tuple(Tensor), optional): See T5Model.
encoder_last_hidden_state (Tensor): See T5Model.
encoder_hidden_states (tuple(Tensor), optional): See T5Model.
encoder_attentions (tuple(Tensor), optional): See T5Model.
- Return type
tuple
Example
import paddle
from paddlenlp.transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

# Tokenize the input and wrap every field in a batch dimension.
inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

# Passing labels makes the model return the loss as the first element.
output = model(**inputs, labels=inputs["input_ids"])
loss = output[0]
logits = output[1]
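To make the relationship between logits, labels, and loss more concrete, here is a plain-Python sketch of a cross-entropy that skips positions labeled -100. masked_cross_entropy is a hypothetical helper, and averaging over the non-ignored positions is an assumption: the reduction used by the model is not stated in this section.

```python
import math

IGNORE_INDEX = -100


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignore_index.

    logits: [sequence_length][vocab_size] raw scores (before softmax).
    labels: [sequence_length] target token ids.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position contributes nothing to the loss
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[label]
        count += 1
    return total / count if count else 0.0


toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 1.0, 1.0]]
toy_labels = [0, 2, IGNORE_INDEX]
toy_loss = masked_cross_entropy(toy_logits, toy_labels)
print(toy_loss)
```

Only the first two positions contribute here; changing the scores at the third position leaves the result unchanged because its label is -100.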
-