modeling#
- class MBartModel(config: MBartConfig)[source]#
The bare MBart Model transformer outputting raw hidden-states.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
config (MBartConfig) -- An instance of MBartConfig used to construct MBartModel.
- set_input_embeddings(value)[source]#
Sets a new input embedding for the model.
- Parameters:
value (Embedding) -- the new input embedding of the model
- Raises:
NotImplementedError -- If the model has not implemented the set_input_embeddings method.
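A minimal sketch (not part of the original documentation) of swapping in a fresh embedding table; the mbart-large-cc25 checkpoint name and the use of config.vocab_size and config.d_model are assumptions for illustration.
import paddle.nn as nn
from paddlenlp.transformers import MBartModel

model = MBartModel.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
# Build a replacement embedding table with the same vocabulary and hidden size.
new_embeddings = nn.Embedding(model.config.vocab_size, model.config.d_model)
model.set_input_embeddings(new_embeddings)
assert model.get_input_embeddings() is new_embeddings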
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#
The MBartModel forward method, overrides the __call__() special method.
- Parameters:
- input_ids (Tensor, optional) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
- attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. Its shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]; for example, it can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked out.
- decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided and the model will create the tensor by shifting the input_ids to the right.
- decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as attention_mask. Defaults to None.
- encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states holds the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1, and every element is a float32 tensor of shape [batch_size, sequence_length, hidden_size]. attentions holds the attentions of all layers in the Transformer encoder; its length is num_hidden_layers, and every element is a float32 tensor of shape [batch_size, num_attention_heads, sequence_length, sequence_length].
- inputs_embeds (Tensor, optional) -- Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None.
- decoder_inputs_embeds (Tensor, optional) -- Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None. If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
- use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.
- cache (list, optional) -- A list in which each element is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
- output_attentions (bool, optional) -- Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.
- output_hidden_states (bool, optional) -- Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.
- return_dict (bool, optional) -- Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
Example
import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartModel.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs)
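As a further illustration (not part of the original docs), the sketch below builds an explicit boolean attention_mask of shape [batch_size, sequence_length] and requests a BaseModelOutputWithPastAndCrossAttentions via return_dict=True; the mbart-large-cc25 checkpoint name is an assumption.
import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
model = MBartModel.from_pretrained('mbart-large-cc25')

encoded = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([encoded["input_ids"]])
# True for real tokens, False for padding; with a single unpadded sequence the
# mask is all True, but the same pattern applies to padded batches.
attention_mask = input_ids != tokenizer.pad_token_id

outputs = model(input_ids=input_ids,
                attention_mask=attention_mask,
                return_dict=True)
print(outputs.last_hidden_state.shape)  # [batch_size, sequence_length, hidden_size]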
- class MBartPretrainedModel(*args, **kwargs)[source]#
An abstract class for pretrained MBart models. It provides MBart related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
- config_class#
alias of MBartConfig
- base_model_class#
alias of MBartModel
- class MBartEncoder(config: MBartConfig, embed_tokens: Embedding | None = None)[source]#
The Transformer Encoder of MBartModel. For the arguments of MBartEncoder, see MBartModel.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)[source]#
The MBartEncoder forward method, overrides the __call__() special method.
- Parameters:
- input_ids (Tensor, optional) -- See MBartModel.
- attention_mask (Tensor, optional) -- See MBartModel.
- inputs_embeds (Tensor, optional) -- See MBartModel.
- output_attentions (bool, optional) -- See MBartModel.
- output_hidden_states (bool, optional) -- See MBartModel.
- return_dict (bool, optional) -- See MBartModel.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor encoder_outputs, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
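A minimal sketch (not part of the original docs) of running the encoder on its own; it assumes the model exposes its encoder through get_encoder() and that the mbart-large-cc25 checkpoint is available.
import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
model = MBartModel.from_pretrained('mbart-large-cc25')
encoder = model.get_encoder()  # assumed accessor returning the MBartEncoder

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]])
# With the default flags the encoder returns the last-layer hidden states.
encoder_output = encoder(input_ids=input_ids)
print(encoder_output.shape)  # [batch_size, sequence_length, hidden_size]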
- class MBartDecoder(config: MBartConfig, embed_tokens: Embedding | None = None)[source]#
The Transformer Decoder of MBartModel. For the arguments of MBartDecoder, see MBartModel.
- forward(decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, memory_mask: Tensor | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, decoder_inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#
The MBartDecoder forward method, overrides the __call__() special method.
- Parameters:
- decoder_input_ids (Tensor, optional) -- See MBartModel.
- decoder_attention_mask (Tensor, optional) -- See MBartModel.
- encoder_output (Tensor, optional) -- See MBartModel.
- memory_mask (Tensor, optional) -- See MBartModel.
- cache (Tensor, optional) -- See MBartModel.
- decoder_inputs_embeds (Tensor, optional) -- See MBartModel.
- output_attentions (bool, optional) -- See MBartModel.
- output_hidden_states (bool, optional) -- See MBartModel.
- return_dict (bool, optional) -- See MBartModel.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor decoder_outputs, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
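A minimal sketch (not part of the original docs) of feeding a precomputed encoder output into the decoder; the get_encoder()/get_decoder() accessors and the mbart-large-cc25 checkpoint name are assumptions.
import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
model = MBartModel.from_pretrained('mbart-large-cc25')
encoder = model.get_encoder()  # assumed accessors
decoder = model.get_decoder()

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]])
encoder_output = encoder(input_ids=input_ids)
# Reusing the encoder input as decoder input purely for illustration.
decoder_output = decoder(decoder_input_ids=input_ids,
                         encoder_output=encoder_output)
print(decoder_output.shape)  # [batch_size, sequence_length, hidden_size]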
- class MBartClassificationHead(input_dim: int, inner_dim: int, num_classes: int, pooler_dropout: float)[source]#
Bases: Layer
Head for sentence-level classification tasks.
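A minimal sketch (not part of the original docs) of the head in isolation; the dimensions are illustrative, and it assumes the head's forward takes a pooled [batch_size, input_dim] representation, as in the analogous BART classification head.
import paddle
from paddlenlp.transformers.mbart.modeling import MBartClassificationHead

head = MBartClassificationHead(input_dim=768, inner_dim=768,
                               num_classes=2, pooler_dropout=0.1)
# A stand-in for the pooled sentence representation (e.g. the <eos> hidden state).
pooled = paddle.randn([4, 768])
logits = head(pooled)
print(logits.shape)  # [batch_size, num_classes] -> [4, 2]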
- class MBartForSequenceClassification(config: MBartConfig)[source]#
MBart Model with a linear layer on top of the pooled output, designed for sequence classification/regression tasks like GLUE tasks.
- Parameters:
config (MBartConfig) -- An instance of MBartConfig used to construct MBartForSequenceClassification.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#
The MBartForSequenceClassification forward method, overrides the __call__() special method.
- Parameters:
- input_ids (Tensor, optional) -- See MBartModel.
- attention_mask (Tensor, optional) -- See MBartModel.
- decoder_input_ids (Tensor, optional) -- See MBartModel.
- decoder_attention_mask (Tensor, optional) -- See MBartModel.
- encoder_output (Tensor, optional) -- See MBartModel.
- use_cache (bool, optional) -- See MBartModel.
- cache (Tensor, optional) -- See MBartModel.
- inputs_embeds (Tensor, optional) -- See MBartModel.
- decoder_inputs_embeds (Tensor, optional) -- See MBartModel.
- labels (Tensor, optional) -- Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., num_labels - 1]. If num_labels > 1, a classification loss is computed (Cross-Entropy). Defaults to None.
- output_attentions (bool, optional) -- See MBartModel.
- output_hidden_states (bool, optional) -- See MBartModel.
- return_dict (bool, optional) -- See MBartModel.
- Returns:
An instance of Seq2SeqSequenceClassifierOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of Seq2SeqSequenceClassifierOutput. In particular, when return_dict=output_hidden_states=output_attentions=False and labels=None, it returns the tensor logits, a tensor of the input text classification logits with shape [batch_size, num_labels] and dtype float32.
Example
import paddle
from paddlenlp.transformers import MBartForSequenceClassification, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForSequenceClassification.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
logits = model(**inputs)
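As a further illustration (not part of the original docs), passing labels together with return_dict=True yields a Seq2SeqSequenceClassifierOutput carrying both the loss and the logits; the checkpoint name and the num_labels=2 override are assumptions.
import paddle
from paddlenlp.transformers import MBartForSequenceClassification, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
model = MBartForSequenceClassification.from_pretrained('mbart-large-cc25', num_labels=2)

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs, labels=paddle.to_tensor([1]), return_dict=True)
print(outputs.loss)                    # scalar cross-entropy loss
print(outputs.logits.argmax(axis=-1))  # predicted label index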
- class MBartForQuestionAnswering(config: MBartConfig)[source]#
MBart Model with a linear layer on top of the hidden-states output to compute span_start_logits and span_end_logits, designed for question-answering tasks like SQuAD.
- Parameters:
config (MBartConfig) -- An instance of MBartConfig used to construct MBartForQuestionAnswering.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, start_positions: Tensor | None = None, end_positions: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#
The MBartForQuestionAnswering forward method, overrides the __call__() special method.
- Parameters:
- input_ids (Tensor, optional) -- See MBartModel.
- attention_mask (Tensor, optional) -- See MBartModel.
- decoder_input_ids (Tensor, optional) -- See MBartModel.
- decoder_attention_mask (Tensor, optional) -- See MBartModel.
- encoder_output (Tensor, optional) -- See MBartModel.
- inputs_embeds (Tensor, optional) -- See MBartModel.
- decoder_inputs_embeds (Tensor, optional) -- See MBartModel.
- use_cache (bool, optional) -- See MBartModel.
- cache (Tensor, optional) -- See MBartModel.
- start_positions (Tensor, optional) -- Labels for the position (index) of the start of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss. A tensor of shape (batch_size, ). Defaults to None.
- end_positions (Tensor, optional) -- Labels for the position (index) of the end of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss. A tensor of shape (batch_size, ). Defaults to None.
- output_attentions (bool, optional) -- See MBartModel.
- output_hidden_states (bool, optional) -- See MBartModel.
- return_dict (bool, optional) -- See MBartModel.
- Returns:
An instance of Seq2SeqQuestionAnsweringModelOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of Seq2SeqQuestionAnsweringModelOutput. In particular, when return_dict=output_hidden_states=output_attentions=False and start_positions=end_positions=None, it returns the tuple (start_logits, end_logits).
With the fields:
- start_logits (Tensor): A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].
- end_logits (Tensor): A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].
Example
import paddle
from paddlenlp.transformers import MBartForQuestionAnswering, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForQuestionAnswering.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
start_logits = outputs[0]
end_logits = outputs[1]
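As a further illustration (not part of the original docs), the start and end logits can be turned into an answer span with a simple greedy argmax; this post-processing and the checkpoint name are illustrative assumptions, not the library's official pipeline.
import paddle
from paddlenlp.transformers import MBartForQuestionAnswering, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
model = MBartForQuestionAnswering.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
start_logits, end_logits = model(**inputs)

# Greedy decoding: highest-scoring start and end positions (may give an empty
# span if the predicted end precedes the start).
start_idx = int(paddle.argmax(start_logits, axis=-1)[0])
end_idx = int(paddle.argmax(end_logits, axis=-1)[0])
answer_ids = inputs["input_ids"][0][start_idx:end_idx + 1]
print(tokenizer.convert_ids_to_tokens(answer_ids.tolist()))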
- class MBartForConditionalGeneration(config: MBartConfig)[source]#
MBart Model with a language modeling head on top.
- Parameters:
config (MBartConfig) -- An instance of MBartConfig used to construct MBartForConditionalGeneration.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#
The MBartForConditionalGeneration forward method, overrides the __call__() special method.
- Parameters:
- input_ids (Tensor, optional) -- See MBartModel.
- attention_mask (Tensor, optional) -- See MBartModel.
- decoder_input_ids (Tensor, optional) -- See MBartModel.
- decoder_attention_mask (Tensor, optional) -- See MBartModel.
- encoder_output (Tensor, optional) -- See MBartModel.
- use_cache (bool, optional) -- See MBartModel.
- cache (Tensor, optional) -- See MBartModel.
- inputs_embeds (Tensor, optional) -- See MBartModel.
- decoder_inputs_embeds (Tensor, optional) -- See MBartModel.
- labels (Tensor, optional) -- Labels for computing the masked language modeling loss. Indices should either be in [0, ..., vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., vocab_size]. A tensor of shape (batch_size, sequence_length). Defaults to None.
- output_attentions (bool, optional) -- See MBartModel.
- output_hidden_states (bool, optional) -- See MBartModel.
- return_dict (bool, optional) -- See MBartModel.
- Returns:
An instance of Seq2SeqLMOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of Seq2SeqLMOutput. In particular, when use_cache=return_dict=output_hidden_states=output_attentions=False and labels=None, it returns the tensor lm_logits, the language modeling logits of the input text.
With the fields:
- lm_logits (Tensor): The language modeling logits (prediction scores) of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].
- cache (Tensor): See MBartModel.
Example
import paddle
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
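As a further illustration (not part of the original docs), sequences can be produced with the high-level generate() API; the checkpoint name and the decoding arguments shown here are assumptions for illustration.
import paddle
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint name
model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs["input_ids"]])
# generate() returns (generated_ids, scores) in PaddleNLP's generation API.
generated_ids, scores = model.generate(input_ids=input_ids,
                                       max_length=20,
                                       decode_strategy="beam_search",
                                       num_beams=4)
print(tokenizer.convert_ids_to_string(generated_ids[0].tolist()))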