modeling#

class MBartModel(config: MBartConfig)[source]#

Bases: MBartPretrainedModel

The bare MBart Model transformer outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters:
  • config (MBartConfig) -- An instance of MBartConfig used to construct MBartModel.

get_input_embeddings()[source]#

Get the input embedding of the model.

Returns:

The input embedding of the model.

Return type:

nn.Embedding

set_input_embeddings(value)[source]#

Set a new input embedding for the model.

Parameters:

value (Embedding) -- The new input embedding for the model.

Raises:

NotImplementedError -- If the model has not implemented the set_input_embeddings method.
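
The two methods are typically used together when swapping the token embedding, for example after extending the vocabulary. A minimal sketch, assuming the 'mbart-large-cc25' weights are available and the replacement embedding keeps the original hidden size:

import paddle
from paddlenlp.transformers import MBartModel

model = MBartModel.from_pretrained('mbart-large-cc25')  # assumed checkpoint name

# Read the current input embedding (a paddle.nn.Embedding).
old_embedding = model.get_input_embeddings()
vocab_size, hidden_size = old_embedding.weight.shape

# Build a replacement with the same hidden size and install it.
new_embedding = paddle.nn.Embedding(vocab_size, hidden_size)
model.set_input_embeddings(new_embedding)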

forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The MBartModel forward method, overrides the __call__() special method.

Parameters:
  • input_ids (Tensor, optional) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, masked tokens have False values and the others have True values. When the data type is int, masked tokens have 0 values and the others have 1 values. When the data type is float, masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length], or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked (a sketch that builds an explicit mask follows the example below).

  • decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, in which case the model creates this tensor by shifting the input_ids to the right.

  • decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as attention_mask. Defaults to None.

  • encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1, and every element has data type float32 and shape [batch_size, sequence_length, hidden_size]. attentions contains the attention weights of all layers in the Transformer encoder; its length is num_hidden_layers, and every element has data type float32 and shape [batch_size, num_attention_heads, sequence_length, sequence_length].

  • inputs_embeds (Tensor, optional) -- Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None.

  • decoder_inputs_embeds (Tensor, optional) -- Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None. If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.

  • use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.

  • cache (list, optional) -- A list in which each element is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None (a rough incremental-decoding sketch appears after the MBartDecoder section below).

  • output_attentions (bool, optional) -- Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.

  • output_hidden_states (bool, optional) -- Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.

  • return_dict (bool, optional) -- Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.

Returns:

An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None fields of BaseModelOutputWithPastAndCrossAttentions (depending on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Example

import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartModel.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs)
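
As described under attention_mask and decoder_input_ids above, the mask can be passed explicitly and the decoder inputs can be supplied instead of being derived by shifting. A minimal sketch of that call, assuming the same 'mbart-large-cc25' checkpoint:

import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartModel.from_pretrained('mbart-large-cc25')

input_ids = paddle.to_tensor([tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")["input_ids"]])

# 2D integer mask: 1 for real tokens, 0 for padding (no padding here).
attention_mask = paddle.ones_like(input_ids)

# Explicit decoder inputs; if omitted, the model shifts input_ids to the right.
decoder_input_ids = input_ids

output = model(input_ids=input_ids,
               attention_mask=attention_mask,
               decoder_input_ids=decoder_input_ids)
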
class MBartPretrainedModel(*args, **kwargs)[source]#

Bases: PretrainedModel

An abstract class for pretrained MBart models. It provides MBart related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

config_class#

alias of MBartConfig

base_model_class#

alias of MBartModel

class MBartEncoder(config: MBartConfig, embed_tokens: Embedding | None = None)[source]#

Bases: MBartPretrainedModel

The Transformer Encoder of MBartModel. For the arguments of MBartEncoder, see MBartModel.

forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)[source]#

The MBartEncoder forward method, overrides the __call__() special method.

Parameters:
  • input_ids (Tensor, optional) -- See MBartModel.

  • attention_mask (Tensor, optional) -- See MBartModel.

  • inputs_embeds (Tensor, optional) -- See MBartModel.

  • output_attentions (bool, optional) -- See MBartModel.

  • output_hidden_states (bool, optional) -- See MBartModel.

  • return_dict (bool, optional) -- See MBartModel.

Returns:

An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None fields of BaseModelOutputWithPastAndCrossAttentions (depending on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor encoder_outputs, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
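
Since MBartModel.forward() accepts encoder_output, the encoder can be run once and its result reused across several forward passes. A rough sketch, assuming the encoder submodule is exposed as model.encoder and that forward() accepts its raw output as encoder_output:

import paddle
from paddlenlp.transformers import MBartModel, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint
model = MBartModel.from_pretrained('mbart-large-cc25')

input_ids = paddle.to_tensor([tokenizer("PaddleNLP makes NLP easy.")["input_ids"]])

# Run the encoder once and keep its output.
encoder_output = model.encoder(input_ids=input_ids)

# Reuse the cached encoder output; the encoder is not run again.
output = model(input_ids=input_ids, encoder_output=encoder_output)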

class MBartDecoder(config: MBartConfig, embed_tokens: Embedding | None = None)[source]#

Bases: MBartPretrainedModel

The Transformer Decoder of MBartModel. For the arguments of MBartDecoder, see MBartModel.

forward(decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, memory_mask: Tensor | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, decoder_inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The MBartDecoder forward method, overrides the __call__() special method.

Parameters:
  • decoder_input_ids (Tensor, optional) -- See MBartModel.

  • decoder_attention_mask (Tensor, optional) -- See MBartModel.

  • encoder_output (Tensor, optional) -- See MBartModel.

  • memory_mask (Tensor, optional) -- See MBartModel.

  • cache (Tensor, optional) -- See MBartModel.

  • decoder_inputs_embeds (Tensor, optional) -- See MBartModel.

  • output_attentions (bool, optional) -- See MBartModel.

  • output_hidden_states (bool, optional) -- See MBartModel.

  • return_dict (bool, optional) -- See MBartModel.

Returns:

An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None fields of BaseModelOutputWithPastAndCrossAttentions (depending on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor decoder_outputs, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
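
The cache and use_cache arguments described under MBartModel exist for incremental decoding at inference time. A rough sketch of the intended pattern, under the assumption that use_cache=True together with return_dict=True exposes the cache as past_key_values on the returned output, and with a placeholder start token:

import paddle
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')  # assumed checkpoint
model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')
model.eval()

input_ids = paddle.to_tensor([tokenizer("PaddleNLP makes NLP easy.")["input_ids"]])
decoder_input_ids = input_ids[:, :1]  # placeholder start token for illustration

with paddle.no_grad():
    # First step: build the decoder cache.
    outputs = model(input_ids=input_ids,
                    decoder_input_ids=decoder_input_ids,
                    use_cache=True,
                    return_dict=True)
    cache = outputs.past_key_values

    # Next step: feed only the newest decoder token plus the cache.
    next_token = outputs.logits[:, -1:].argmax(axis=-1)
    outputs = model(input_ids=input_ids,
                    decoder_input_ids=next_token,
                    cache=cache,
                    use_cache=True,
                    return_dict=True)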

class MBartClassificationHead(input_dim: int, inner_dim: int, num_classes: int, pooler_dropout: float)[source]#

Bases: Layer

Head for sentence-level classification tasks.

forward(hidden_states: Tensor)[source]#
Parameters:

hidden_states (Tensor) -- Hidden states of the classification model.
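
A small, self-contained sketch of the head in isolation, with hypothetical sizes (in MBartForSequenceClassification the real sizes come from MBartConfig and the input is the pooled decoder state):

import paddle
from paddlenlp.transformers.mbart.modeling import MBartClassificationHead

# Hypothetical dimensions, for illustration only.
head = MBartClassificationHead(input_dim=768, inner_dim=768,
                               num_classes=3, pooler_dropout=0.1)

# A pooled sentence representation of shape [batch_size, input_dim].
pooled = paddle.randn([2, 768])
logits = head(pooled)  # shape: [2, 3]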

class MBartForSequenceClassification(config: MBartConfig)[source]#

Bases: MBartPretrainedModel

MBart Model with a linear layer on top of the pooled output, designed for sequence classification/regression tasks like GLUE tasks.

Parameters:

config (MBartConfig) -- An instance of MBartConfig used to construct MBartForSequenceClassification.

forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The MBartForSequenceClassification forward method, overrides the __call__() special method.

Parameters:
  • input_ids (Tensor, optional) -- See MBartModel.

  • attention_mask (Tensor, optional) -- See MBartModel.

  • decoder_input_ids (Tensor, optional) -- See MBartModel.

  • decoder_attention_mask (Tensor, optional) -- See MBartModel.

  • encoder_output (Tensor, optional) -- See MBartModel.

  • use_cache (bool, optional) -- See MBartModel.

  • cache (Tensor, optional) -- See MBartModel.

  • inputs_embeds (Tensor, optional) -- See MBartModel.

  • decoder_inputs_embeds (Tensor, optional) -- See MBartModel.

  • labels (Tensor, optional) -- Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., num_labels - 1]. If num_labels > 1, a classification loss (cross-entropy) is computed. Defaults to None (see the sketch after the example below).

  • output_attentions (bool, optional) -- See MBartModel.

  • output_hidden_states (bool, optional) -- See MBartModel.

  • return_dict (bool, optional) -- See MBartModel.

Returns:

An instance of Seq2SeqSequenceClassifierOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None fields of Seq2SeqSequenceClassifierOutput (depending on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False and labels=None, it returns the tensor logits, a tensor of the input text classification logits with shape [batch_size, num_labels] and data type float32.

Example

import paddle
from paddlenlp.transformers import MBartForSequenceClassification, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForSequenceClassification.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
logits = model(**inputs)
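
When labels is supplied as described above, the same call also computes the classification loss. A hedged sketch (the label value and checkpoint name are placeholders), reading the loss from the Seq2SeqSequenceClassifierOutput returned with return_dict=True:

import paddle
from paddlenlp.transformers import MBartForSequenceClassification, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForSequenceClassification.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
labels = paddle.to_tensor([0])  # placeholder class index in [0, num_labels - 1]

outputs = model(**inputs, labels=labels, return_dict=True)
loss, logits = outputs.loss, outputs.logits
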
class MBartForQuestionAnswering(config: MBartConfig)[source]#

Bases: MBartPretrainedModel

MBart Model with a linear layer on top of the hidden-states output to compute span_start_logits and span_end_logits, designed for question-answering tasks like SQuAD.

Parameters:

config (MBartConfig) -- An instance of MBartConfig used to construct MBartForQuestionAnswering.

forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, start_positions: Tensor | None = None, end_positions: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The MBartForQuestionAnswering forward method, overrides the __call__() special method.

Parameters:
  • input_ids (Tensor, optional) -- See MBartModel.

  • attention_mask (Tensor, optional) -- See MBartModel.

  • decoder_input_ids (Tensor, optional) -- See MBartModel.

  • decoder_attention_mask (Tensor, optional) -- See MBartModel.

  • encoder_output (Tensor, optional) -- See MBartModel.

  • inputs_embeds (Tensor, optional) -- See MBartModel.

  • decoder_inputs_embeds (Tensor, optional) -- See MBartModel.

  • use_cache (bool, optional) -- See MBartModel.

  • cache (Tensor, optional) -- See MBartModel.

  • start_positions (Tensor, optional) -- Labels for the position (index) of the start of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss. A tensor of shape (batch_size, ). Defaults to None (see the sketch after the example below).

  • end_positions (Tensor, optional) -- Labels for the position (index) of the end of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss. A tensor of shape (batch_size, ). Defaults to None.

  • output_attentions (bool, optional) -- See MBartModel.

  • output_hidden_states (bool, optional) -- See MBartModel.

  • return_dict (bool, optional) -- See MBartModel.

Returns:

An instance of Seq2SeqQuestionAnsweringModelOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None fields of Seq2SeqQuestionAnsweringModelOutput (depending on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False and start_positions=end_positions=None, it returns the tuple (start_logits, end_logits).

With the fields:

  • start_logits (Tensor):

    A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

  • end_logits (Tensor):

    A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

Example

import paddle
from paddlenlp.transformers import MBartForQuestionAnswering, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForQuestionAnswering.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
start_logits = outputs[0]
end_logits = outputs[1]
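
When start_positions and end_positions are given as described above, the span-extraction loss is returned as well. A rough sketch with placeholder span indices:

import paddle
from paddlenlp.transformers import MBartForQuestionAnswering, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForQuestionAnswering.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

# Placeholder gold span: token indices inside the input sequence.
start_positions = paddle.to_tensor([1])
end_positions = paddle.to_tensor([3])

outputs = model(**inputs,
                start_positions=start_positions,
                end_positions=end_positions,
                return_dict=True)
loss = outputs.loss
start_logits, end_logits = outputs.start_logits, outputs.end_logits
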
class MBartForConditionalGeneration(config: MBartConfig)[source]#

Bases: MBartPretrainedModel

MBart Model with a language modeling head on top.

Parameters:

config (MBartConfig) -- An instance of MBartConfig used to construct MBartForConditionalGeneration.

forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The MBartForConditionalGeneration forward method, overrides the __call__() special method.

Parameters:
  • input_ids (Tensor, optional) -- See MBartModel.

  • attention_mask (Tensor, optional) -- See MBartModel.

  • decoder_input_ids (Tensor, optional) -- See MBartModel.

  • decoder_attention_mask (Tensor, optional) -- See MBartModel.

  • encoder_output (Tensor, optional) -- See MBartModel.

  • use_cache (bool, optional) -- See MBartModel.

  • cache (Tensor, optional) -- See MBartModel.

  • inputs_embeds (Tensor, optional) -- See MBartModel.

  • decoder_inputs_embeds (Tensor, optional) -- See MBartModel.

  • labels (Tensor, optional) -- Labels for computing the masked language modeling loss. Indices should either be in [0, ..., vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for tokens with labels in [0, ..., vocab_size]. A tensor of shape (batch_size, sequence_length). Defaults to None (see the sketch after the example below).

  • output_attentions (bool, optional) -- See MBartModel.

  • output_hidden_states (bool, optional) -- See MBartModel.

  • return_dict (bool, optional) -- See MBartModel.

Returns:

An instance of Seq2SeqLMOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None fields of Seq2SeqLMOutput (depending on the input arguments). In particular, when use_cache=return_dict=output_hidden_states=output_attentions=False and labels=None, it returns the tensor lm_logits, the language modeling logits over the vocabulary.

With the fields:

  • lm_logits (Tensor):

    The prediction scores of the language modeling head. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].

  • cache (Tensor):

    See MBartModel.

Example

import paddle
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
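
The labels argument described above turns the same forward call into a training step that also returns the language modeling loss. A hedged sketch that reuses the input as its own target purely for illustration:

import paddle
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')
model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')

input_ids = paddle.to_tensor([tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")["input_ids"]])

# For illustration only: use the input itself as the target sequence.
labels = input_ids.clone()

outputs = model(input_ids=input_ids, labels=labels, return_dict=True)
loss, lm_logits = outputs.loss, outputs.logits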