modeling

class MBartModel(vocab_size, bos_token_id=0, pad_token_id=1, eos_token_id=2, decoder_start_token_id=2, forced_bos_token_id=250004, d_model=768, num_encoder_layers=6, num_decoder_layers=6, encoder_attention_heads=12, decoder_attention_heads=12, encoder_ffn_dim=3072, decoder_ffn_dim=3072, dropout=0.1, activation_function='gelu', attention_dropout=0.1, activation_dropout=0.1, max_position_embeddings=1024, init_std=0.02)[source]

Bases: paddlenlp.transformers.mbart.modeling.MBartPretrainedModel
The bare MBart Model transformer outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
Parameters
    vocab_size (int) -- Vocabulary size of inputs_ids in MBartModel. It is also the vocab size of the token embedding matrix, and defines the number of different tokens that can be represented by the inputs_ids passed when calling MBartModel.
    bos_token_id (int, optional) -- The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. Defaults to 0.
    pad_token_id (int, optional) -- The index of the padding token in the token vocabulary. Defaults to 1.
    eos_token_id (int, optional) -- A special token representing the end of a sequence that was used during pretraining. Defaults to 2.
    d_model (int, optional) -- Dimensionality of the embedding layer, encoder layers and decoder layers. Defaults to 768.
    num_encoder_layers (int, optional) -- Number of hidden layers in the Transformer encoder. Defaults to 6.
    num_decoder_layers (int, optional) -- Number of hidden layers in the Transformer decoder. Defaults to 6.
    encoder_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
    decoder_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer decoder. Defaults to 12.
    encoder_ffn_dim (int, optional) -- Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are first projected from d_model to encoder_ffn_dim, and then projected back to d_model. Typically encoder_ffn_dim is larger than d_model. Defaults to 3072.
    decoder_ffn_dim (int, optional) -- Dimensionality of the feed-forward (ff) layer in the decoder. Input tensors to ff layers are first projected from d_model to decoder_ffn_dim, and then projected back to d_model. Typically decoder_ffn_dim is larger than d_model. Defaults to 3072.
    dropout (float, optional) -- The dropout probability used in all fully connected layers (pre-process and post-process of MHA and FFN sub-layers) in the encoders and decoders. Defaults to 0.1.
    activation_function (str, optional) -- The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other Paddle supported activation functions are supported. Defaults to "gelu".
    attention_dropout (float, optional) -- The dropout probability used in MultiHeadAttention in all encoder layers and decoder layers to drop some attention targets. Defaults to 0.1.
    activation_dropout (float, optional) -- The dropout probability used after FFN activation in all encoder layers and decoder layers. Defaults to 0.1.
    max_position_embeddings (int, optional) -- The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 1024.
    init_std (float, optional) -- The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Defaults to 0.02.
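For a quick sanity check of these arguments, a randomly initialized model can be built directly from the constructor. A minimal sketch, where the vocab_size value and the random token ids are purely illustrative:

    import paddle
    from paddlenlp.transformers import MBartModel

    # Only vocab_size is required; every other hyperparameter falls back to the
    # defaults listed above (d_model=768, 6+6 layers, 12 heads, ...).
    model = MBartModel(vocab_size=250027)

    # Illustrative random token ids for a [batch_size=2, sequence_length=16] batch.
    input_ids = paddle.randint(low=4, high=250027, shape=[2, 16], dtype='int64')
    decoder_input_ids = paddle.randint(low=4, high=250027, shape=[2, 16], dtype='int64')

    output = model(input_ids, decoder_input_ids=decoder_input_ids)
    print(output.shape)  # [2, 16, 768]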
forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The MBartModel forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
    attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no position is masked out from attention. See the mask-construction sketch after the example below.
    decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided; the model will create the tensor by shifting the input_ids to the right.
    decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as attention_mask. Defaults to None.
    encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states is the hidden states of all layers in the Transformer encoder. The length of hidden_states is num_hidden_layers + 1. For every element in the tuple, its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size]. attentions is the attentions of all layers in the Transformer encoder. The length of attentions is num_hidden_layers. For every element in the tuple, its data type should be float32 and its shape is [batch_size, num_attention_heads, sequence_length, sequence_length].
    use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.
    cache (list, optional) -- It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
Returns
    Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Return type
    Tensor
Examples

    import paddle
    from paddlenlp.transformers import MBartModel, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
    model = MBartModel.from_pretrained('mbart-large-cc25')

    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    output = model(**inputs)
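As described for attention_mask above, a float mask marks padded positions with large negative values. A minimal sketch of building such a mask by hand; the token ids, the -1e4 stand-in for -INF, and the randomly initialized model are illustrative assumptions:

    import paddle
    from paddlenlp.transformers import MBartModel

    model = MBartModel(vocab_size=250027)  # randomly initialized, for illustration only

    # Hypothetical padded batch; 1 is the default pad_token_id.
    input_ids = paddle.to_tensor([[9, 8, 7, 2],
                                  [9, 8, 2, 1]], dtype='int64')

    # Float mask: padded positions get a large negative value, real tokens get 0.
    attention_mask = paddle.cast(input_ids == 1, 'float32') * -1e4
    # Unsqueeze to [batch_size, 1, 1, sequence_length] so it broadcasts over
    # attention heads and query positions.
    attention_mask = attention_mask.unsqueeze([1, 2])

    output = model(input_ids, attention_mask=attention_mask)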
class MBartPretrainedModel(name_scope=None, dtype='float32')[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained MBart models. It provides MBart related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class
    alias of MBartModel
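The class attributes listed above can be inspected directly, for example to list the pretrained weight names that ship with a built-in configuration. A sketch; the exact keys depend on the installed PaddleNLP version:

    from paddlenlp.transformers.mbart.modeling import MBartPretrainedModel

    # Keys of pretrained_init_configuration are the names accepted by from_pretrained().
    print(list(MBartPretrainedModel.pretrained_init_configuration.keys()))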
class MBartEncoder(embed_tokens, vocab_size, pad_token_id=1, d_model=768, num_encoder_layers=6, encoder_attention_heads=12, encoder_ffn_dim=3072, dropout=0.1, activation_function='gelu', attention_dropout=0.1, activation_dropout=0.1, max_position_embeddings=1024, init_std=0.02)[source]

Bases: paddlenlp.transformers.mbart.modeling.MBartPretrainedModel

The Transformer Encoder of MBartModel. For the arguments of MBartEncoder, see MBartModel.

forward(input_ids=None, attention_mask=None, **kwargs)[source]

The MBartEncoder forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor, optional) -- See MBartModel.
    attention_mask (Tensor, optional) -- See MBartModel.

Returns
    Returns tensor encoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Return type
    Tensor
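When only encoder representations are needed, for example to precompute encoder_output once and reuse it across decoding calls, the encoder can be run on its own. A minimal sketch, assuming the encoder built inside MBartModel is exposed as its encoder sublayer (an assumption about the attribute name, not stated on this page):

    import paddle
    from paddlenlp.transformers import MBartModel

    model = MBartModel(vocab_size=250027)  # randomly initialized, for illustration
    encoder = model.encoder                # assumed name of the MBartEncoder sublayer

    input_ids = paddle.randint(low=4, high=250027, shape=[1, 8], dtype='int64')
    encoder_output = encoder(input_ids)
    print(encoder_output.shape)  # [1, 8, 768]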
class MBartDecoder(embed_tokens, vocab_size, pad_token_id=1, d_model=768, num_decoder_layers=6, decoder_attention_heads=12, decoder_ffn_dim=3072, dropout=0.1, activation_function='gelu', attention_dropout=0.1, activation_dropout=0.1, max_position_embeddings=1024, init_std=0.02)[source]

Bases: paddlenlp.transformers.mbart.modeling.MBartPretrainedModel

The Transformer Decoder of MBartModel. For the arguments of MBartDecoder, see MBartModel.

forward(decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, memory_mask=None, cache=None)[source]

The MBartDecoder forward method, overrides the __call__() special method.

Parameters
    decoder_input_ids (Tensor, optional) -- See MBartModel.
    decoder_attention_mask (Tensor, optional) -- See MBartModel.
    encoder_output (Tensor, optional) -- See MBartModel.
    memory_mask (Tensor, optional) -- See MBartModel.
    cache (Tensor, optional) -- See MBartModel.

Returns
    Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Return type
    Tensor
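A matching sketch for the decoder, again assuming MBartModel exposes its sublayers as encoder and decoder (attribute names not stated on this page); the decoder consumes the encoder output together with its own input ids:

    import paddle
    from paddlenlp.transformers import MBartModel

    model = MBartModel(vocab_size=250027)            # randomly initialized, for illustration
    encoder, decoder = model.encoder, model.decoder  # assumed sublayer names

    input_ids = paddle.randint(low=4, high=250027, shape=[1, 8], dtype='int64')
    decoder_input_ids = paddle.randint(low=4, high=250027, shape=[1, 6], dtype='int64')

    encoder_output = encoder(input_ids)
    decoder_output = decoder(decoder_input_ids=decoder_input_ids,
                             encoder_output=encoder_output)
    print(decoder_output.shape)  # [1, 6, 768]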
class MBartClassificationHead(input_dim: int, inner_dim: int, num_classes: int, pooler_dropout: float)[source]

Bases: paddle.fluid.dygraph.layers.Layer

Head for sentence-level classification tasks.
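A minimal sketch of applying the head to pooled sentence-level features; the forward call taking a single feature tensor of shape [batch_size, input_dim] is an assumption about its interface, not stated on this page:

    import paddle
    from paddlenlp.transformers.mbart.modeling import MBartClassificationHead

    head = MBartClassificationHead(input_dim=768, inner_dim=768,
                                   num_classes=2, pooler_dropout=0.1)

    # Pooled sentence-level features, e.g. the decoder hidden state at the eos position.
    features = paddle.randn([4, 768])
    logits = head(features)
    print(logits.shape)  # [4, 2]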
class MBartForSequenceClassification(mbart, num_labels=2, dropout=None)[source]

Bases: paddlenlp.transformers.mbart.modeling.MBartPretrainedModel

MBart Model with a linear layer on top of the pooled output, designed for sequence classification/regression tasks like GLUE tasks.

Parameters
    mbart (MBartModel) -- An instance of MBartModel.
    num_labels (int, optional) -- The number of different labels. Defaults to 2.
    dropout (float, optional) -- The dropout probability for the output of MBart. If None, use the same value as the dropout argument of the MBartModel instance mbart. Defaults to None.
forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The MBartForSequenceClassification forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) -- See MBartModel.
    attention_mask (Tensor, optional) -- See MBartModel.
    decoder_input_ids (Tensor, optional) -- See MBartModel.
    decoder_attention_mask (Tensor, optional) -- See MBartModel.
    encoder_output (Tensor, optional) -- See MBartModel.
    use_cache (bool, optional) -- See MBartModel.
    cache (Tensor, optional) -- See MBartModel.

Returns
    Returns tensor logits, a tensor of the input text classification logits. Its shape is [batch_size, num_labels] and its dtype is float32.

Return type
    Tensor

Examples

    import paddle
    from paddlenlp.transformers import MBartForSequenceClassification, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
    model = MBartForSequenceClassification.from_pretrained('mbart-large-cc25')

    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    logits = model(**inputs)
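Continuing the example above, the predicted label can be read off the logits of shape [batch_size, num_labels]:

    probs = paddle.nn.functional.softmax(logits, axis=-1)
    predicted_label = paddle.argmax(logits, axis=-1).item()  # batch_size is 1 here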
class MBartForQuestionAnswering(mbart)[source]

Bases: paddlenlp.transformers.mbart.modeling.MBartPretrainedModel

MBart Model with a linear layer on top of the hidden-states output to compute span_start_logits and span_end_logits, designed for question-answering tasks like SQuAD.

Parameters
    mbart (MBartModel) -- An instance of MBartModel.
forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The MBartForQuestionAnswering forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) -- See MBartModel.
    attention_mask (Tensor, optional) -- See MBartModel.
    decoder_input_ids (Tensor, optional) -- See MBartModel.
    decoder_attention_mask (Tensor, optional) -- See MBartModel.
    encoder_output (Tensor, optional) -- See MBartModel.
    use_cache (bool, optional) -- See MBartModel.
    cache (Tensor, optional) -- See MBartModel.

Returns
    Returns tuple (start_logits, end_logits).

    With the fields:

    start_logits (Tensor):
        A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

    end_logits (Tensor):
        A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

Return type
    tuple

Examples

    import paddle
    from paddlenlp.transformers import MBartForQuestionAnswering, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
    model = MBartForQuestionAnswering.from_pretrained('mbart-large-cc25')

    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    outputs = model(**inputs)

    start_logits = outputs[0]
    end_logits = outputs[1]
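Continuing the example above, the most likely answer span can be read off the two logits tensors. A sketch only, since real SQuAD-style post-processing also constrains the start to come before the end and limits the span length:

    start_index = paddle.argmax(start_logits, axis=-1).item()  # batch_size is 1 here
    end_index = paddle.argmax(end_logits, axis=-1).item()
    answer_token_ids = inputs['input_ids'][0, start_index:end_index + 1]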
class MBartForConditionalGeneration(mbart)[source]

Bases: paddlenlp.transformers.mbart.modeling.MBartPretrainedModel

MBart Model with a language modeling head on top, designed for conditional generation tasks such as summarization and machine translation.

Parameters
    mbart (MBartModel) -- An instance of MBartModel.
forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The MBartForConditionalGeneration forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) -- See MBartModel.
    attention_mask (Tensor, optional) -- See MBartModel.
    decoder_input_ids (Tensor, optional) -- See MBartModel.
    decoder_attention_mask (Tensor, optional) -- See MBartModel.
    encoder_output (Tensor, optional) -- See MBartModel.
    use_cache (bool, optional) -- See MBartModel.
    cache (Tensor, optional) -- See MBartModel.

Returns
    Returns Tensor lm_logits if use_cache is False; otherwise, returns tuple (lm_logits, cache).

    With the fields:

    lm_logits (Tensor):
        The language modeling logits over the vocabulary. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].

    cache (Tensor):
        See MBartModel.

Return type
    Tensor or tuple

Examples

    import paddle
    from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
    model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')

    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    outputs = model(**inputs)
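Continuing the example above, passing use_cache=True makes forward return the (lm_logits, cache) tuple described in the Returns section; the cache can then be fed back in via the cache argument on later decoding steps to avoid recomputation:

    lm_logits, cache = model(**inputs, use_cache=True)
    print(lm_logits.shape)  # [batch_size, sequence_length, vocab_size]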