modeling

class BartModel(vocab_size, bos_token_id=0, pad_token_id=1, eos_token_id=2, forced_eos_token_id=2, decoder_start_token_id=2, d_model=768, num_encoder_layers=6, num_decoder_layers=6, encoder_attention_heads=12, decoder_attention_heads=12, encoder_ffn_dim=3072, decoder_ffn_dim=3072, dropout=0.1, activation_function='gelu', attention_dropout=0.1, activation_dropout=0.1, max_position_embeddings=1024, init_std=0.02)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

The bare Bart Model transformer outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • vocab_size (int) -- Vocabulary size of inputs_ids in BartModel. It is also the vocabulary size of the token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling BartModel.

  • bos_token_id (int, optional) -- The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. Defaults to 0.

  • pad_token_id (int, optional) -- The index of padding token in the token vocabulary. Defaults to 1.

  • eos_token_id (int, optional) -- A special token representing the end of a sequence that was used during pretraining. Defaults to 2.

  • forced_eos_token_id (int, optional) -- The id of the token to force as the last generated token. Defaults to 2.

  • decoder_start_token_id (int, optional) -- The id of the token used as the first decoder input token. Defaults to 2.

  • d_model (int, optional) -- Dimensionality of the embedding layer, encoder layer and decoder layer. Defaults to 768.

  • num_encoder_layers (int, optional) -- Number of hidden layers in the Transformer encoder. Defaults to 6.

  • num_decoder_layers (int, optional) -- Number of hidden layers in the Transformer decoder. Defaults to 6.

  • encoder_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.

  • decoder_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer decoder. Defaults to 12.

  • encoder_ffn_dim (int, optional) -- Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are firstly projected from d_model to encoder_ffn_dim, and then projected back to d_model. Typically encoder_ffn_dim is larger than d_model. Defaults to 3072.

  • decoder_ffn_dim (int, optional) -- Dimensionality of the feed-forward (ff) layer in the decoder. Input tensors to ff layers are firstly projected from d_model to decoder_ffn_dim, and then projected back to d_model. Typically decoder_ffn_dim is larger than d_model. Defaults to 3072.

  • dropout (float, optional) -- The dropout probability used in all fully connected layers (pre-process and post-process of MHA and FFN sub-layer) in the encoders and decoders. Defaults to 0.1.

  • activation_function (str, optional) -- The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other Paddle-supported activation functions are supported. Defaults to "gelu".

  • attention_dropout (float, optional) -- The dropout probability used in MultiHeadAttention in all encoder layers and decoder layers to drop some attention targets. Defaults to 0.1.

  • activation_dropout (float, optional) -- The dropout probability used after FFN activation in all encoder layers and decoder layers. Defaults to 0.1.

  • max_position_embeddings (int, optional) -- The maximum length of the position encodings, which determines the maximum supported length of an input sequence. Defaults to 1024.

  • init_std (float, optional) -- The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Defaults to 0.02.
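
A minimal sketch of constructing a randomly initialized BartModel with a smaller-than-default configuration (the hyperparameter values below are illustrative, not a recommended setting):

from paddlenlp.transformers import BartModel

# Only vocab_size is required; every other argument falls back to the
# defaults listed above.
model = BartModel(
    vocab_size=50265,
    d_model=512,
    num_encoder_layers=4,
    num_decoder_layers=4,
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,
    decoder_ffn_dim=2048)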

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The BartModel forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape can be broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked; see the sketch after this parameter list for the accepted data types.

  • decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, in which case the model creates this tensor by shifting input_ids to the right.

  • decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as those of attention_mask. Defaults to None.

  • encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1, and each element is a float32 tensor of shape [batch_size, sequence_length, hidden_size]. attentions contains the attention weights of all layers in the Transformer encoder; its length is num_hidden_layers, and each element is a float32 tensor of shape [batch_size, num_attention_heads, sequence_length, sequence_length].

  • use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.

  • cache (list, optional) -- It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Default to None.
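
A minimal sketch of the three accepted attention_mask data types described above, for a batch containing one length-4 sequence whose last position is padding:

import paddle

# bool mask: False marks positions that must not be attended to
bool_mask = paddle.to_tensor([[True, True, True, False]])
# int mask: 0 marks masked positions
int_mask = paddle.to_tensor([[1, 1, 1, 0]], dtype='int64')
# float mask: a large negative value (approximating -INF) marks masked positions
float_mask = paddle.to_tensor([[0.0, 0.0, 0.0, -1e9]])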

Returns

Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and has a shape of [batch_size, sequence_length, hidden_size].

Return type

Tensor

Examples

import paddle
from paddlenlp.transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('bart-base')
model = BartModel.from_pretrained('bart-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs)

class BartPretrainedModel(name_scope=None, dtype='float32')[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained Bart models. It provides the Bart-related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

init_weights(layer)[source]

Initialization hook

base_model_class

alias of paddlenlp.transformers.bart.modeling.BartModel

class BartEncoder(embed_tokens, vocab_size, pad_token_id=1, d_model=768, num_encoder_layers=6, encoder_attention_heads=12, encoder_ffn_dim=3072, dropout=0.1, activation_function='gelu', attention_dropout=0.1, activation_dropout=0.1, max_position_embeddings=1024, init_std=0.02)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

The Transformer encoder of BartModel. For the meaning of the arguments of BartEncoder, see BartModel.

forward(input_ids=None, attention_mask=None, **kwargs)[source]

The BartEncoder forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor, optional) -- See BartModel.

  • attention_mask (Tensor, optional) -- See BartModel.

Returns

Returns tensor encoder_output, which is the output at the last layer of the model. Its data type should be float32 and has a shape of [batch_size, sequence_length, hidden_size].

Return type

Tensor
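
A minimal sketch of running the encoder on its own; it assumes the encoder sub-layer of a loaded BartModel is exposed as model.encoder:

import paddle
from paddlenlp.transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('bart-base')
model = BartModel.from_pretrained('bart-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs['input_ids']])
# encoder_output has shape [batch_size, sequence_length, hidden_size]
encoder_output = model.encoder(input_ids)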

class BartDecoder(embed_tokens, vocab_size, pad_token_id=1, d_model=768, num_decoder_layers=6, decoder_attention_heads=12, decoder_ffn_dim=3072, dropout=0.1, activation_function='gelu', attention_dropout=0.1, activation_dropout=0.1, max_position_embeddings=1024, init_std=0.02)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

The Transformer decoder of BartModel. For the meaning of the arguments of BartDecoder, see BartModel.

forward(decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, memory_mask=None, cache=None)[source]

The BartDecoder forward method, overrides the __call__() special method.

Parameters
  • decoder_input_ids (Tensor, optional) -- See BartModel.

  • decoder_attention_mask (Tensor, optional) -- See BartModel.

  • encoder_output (Tensor, optional) -- See BartModel.

  • memory_mask (Tensor, optional) -- Mask used in the decoder's cross-attention over the encoder output to avoid attending to some unwanted positions, usually the paddings. Defaults to None.

  • cache (list, optional) -- See BartModel.

Returns

Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and has a shape of [batch_size, sequence_length, hidden_size].

Return type

Tensor
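
A minimal sketch of running the decoder on precomputed encoder output; it assumes the sub-layers of a loaded BartModel are exposed as model.encoder and model.decoder:

import paddle
from paddlenlp.transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('bart-base')
model = BartModel.from_pretrained('bart-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs['input_ids']])
encoder_output = model.encoder(input_ids)
# decoder_output has shape [batch_size, sequence_length, hidden_size]
decoder_output = model.decoder(decoder_input_ids=input_ids,
                               encoder_output=encoder_output)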

class BartClassificationHead(input_dim: int, inner_dim: int, num_classes: int, pooler_dropout: float)[source]

Bases: paddle.fluid.dygraph.layers.Layer

Perform sentence-level classification tasks.

forward(hidden_states)[source]

Parameters

hidden_states (Tensor) -- Hidden states of the classification model.
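
A minimal sketch of using the head on its own, assuming the class is importable from paddlenlp.transformers.bart.modeling as documented above (the dimensions are illustrative):

import paddle
from paddlenlp.transformers.bart.modeling import BartClassificationHead

head = BartClassificationHead(input_dim=768, inner_dim=768,
                              num_classes=2, pooler_dropout=0.1)
hidden_states = paddle.randn([4, 768])  # [batch_size, input_dim]
logits = head(hidden_states)            # [batch_size, num_classes]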

class BartForSequenceClassification(bart, num_labels=2, dropout=None)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

Bart Model with a linear layer on top of the pooled output, designed for sequence classification/regression tasks like GLUE tasks.

Parameters
  • bart (BartModel) -- An instance of BartModel.

  • num_labels (int, optional) -- The number of different labels. Defaults to 2.

  • dropout (float, optional) -- The dropout probability for the output of Bart. If None, uses the same value as the dropout of the BartModel instance bart. Defaults to None.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The BartForSequenceClassification forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- See BartModel.

  • attention_mask (Tensor, optional) -- See BartModel.

  • decoder_input_ids (Tensor, optional) -- See BartModel.

  • decoder_attention_mask (Tensor, optional) -- See BartModel.

  • encoder_output (Tensor, optional) -- See BartModel.

  • use_cache (bool, optional) -- See BartModel.

  • cache (list, optional) -- See BartModel.

Returns

Returns tensor logits, the classification logits of the input text. Its shape is [batch_size, num_labels] and its data type is float32.

Return type

Tensor

Examples

import paddle
from paddlenlp.transformers import BartForSequenceClassification, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('bart-base')
model = BartForSequenceClassification.from_pretrained('bart-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
logits = model(**inputs)

class BartForQuestionAnswering(bart)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

Bart Model with a linear layer on top of the hidden-states output to compute span_start_logits and span_end_logits, designed for question-answering tasks like SQuAD.

Parameters

bart (BartModel) -- An instance of BartModel.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The BartForQuestionAnswering forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- See BartModel.

  • attention_mask (Tensor, optional) -- See BartModel.

  • decoder_input_ids (Tensor, optional) -- See BartModel.

  • decoder_attention_mask (Tensor, optional) -- See BartModel.

  • encoder_output (Tensor, optional) -- See BartModel.

  • use_cache (bool, optional) -- See BartModel.

  • cache (list, optional) -- See BartModel.

Returns

Returns tuple (start_logits, end_logits).

With the fields:

  • start_logits (Tensor):

    A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

  • end_logits (Tensor):

    A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

Return type

tuple

Examples

import paddle
from paddlenlp.transformers import BartForQuestionAnswering, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('bart-base')
model = BartForQuestionAnswering.from_pretrained('bart-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
start_logits = outputs[0]
end_logits = outputs[1]

class BartForConditionalGeneration(bart)[source]

Bases: paddlenlp.transformers.bart.modeling.BartPretrainedModel

Bart Model with a language modeling head on top.

Parameters

bart (BartModel) -- An instance of BartModel.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The BartForConditionalGeneration forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- See BartModel.

  • attention_mask (Tensor, optional) -- See BartModel.

  • decoder_input_ids (Tensor, optional) -- See BartModel.

  • decoder_attention_mask (Tensor, optional) -- See BartModel.

  • encoder_output (Tensor, optional) -- See BartModel.

  • use_cache (bool, optional) -- See BartModel.

  • cache (list, optional) -- See BartModel.

Returns

Returns Tensor lm_logits if use_cache is False, otherwise, returns tuple (lm_logits, cache).

With the fields:

  • lm_logits (Tensor):

    The prediction logits of the language modeling head, from which output tokens are generated. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].

  • cache (list):

    See BartModel.

Return type

Tensor or tuple

Examples

import paddle
from paddlenlp.transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('bart-base')
model = BartForConditionalGeneration.from_pretrained('bart-base')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
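# As described in the Returns section above, passing use_cache=True makes
# the model return the cache along with the logits; the cache can then be
# fed back in to speed up incremental decoding.
lm_logits, cache = model(**inputs, use_cache=True)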