modeling
- class BartModel(config: BartConfig) [source]
The bare Bart Model transformer outputting raw hidden-states.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
config (BartConfig) -- An instance of BartConfig used to construct BartModel.
- set_input_embeddings(value) [source]
Set a new input embedding for the model.
- Parameters:
value (Embedding) -- The new embedding of the model.
- Raises:
NotImplementedError -- The model has not implemented the set_input_embeddings method.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) -> Tuple | Seq2SeqModelOutput [source]
The BartModel forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of the tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid attending to unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, encoder_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, encoder_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.
decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided and the model creates the tensor by shifting input_ids to the right.
decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid attending to unwanted positions in decoder_input_ids. Its data type and shape are the same as those of attention_mask. Defaults to None.
encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, d_model]. hidden_states holds the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1, and each element has data type float32 and shape [batch_size, sequence_length, d_model]. attentions holds the attentions of all layers in the Transformer encoder; its length is num_hidden_layers, and each element has data type float32 and shape [batch_size, num_attention_heads, sequence_length, sequence_length].
inputs_embeds (Tensor, optional) -- Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None.
decoder_inputs_embeds (Tensor, optional) -- Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation of shape (batch_size, target_sequence_length, hidden_size). If cache is used, optionally only the last decoder_inputs_embeds have to be input (see cache). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None. If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.
cache (list, optional) -- A list in which each element is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
output_attentions (bool, optional) -- Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.
output_hidden_states (bool, optional) -- Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.
return_dict (bool, optional) -- Whether to return a Seq2SeqModelOutput object. If False, the output will be a tuple of tensors. Defaults to False.
- Returns:
An instance of Seq2SeqModelOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of Seq2SeqModelOutput (which fields are present depends on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor decoder_output, which is the output of the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, d_model].
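The three attention_mask conventions described in the parameters above (bool, int and float) carry the same information. The following is a minimal pure-Python sketch of how they relate, not the library's implementation; the helper names int_mask_to_bool, int_mask_to_float and masked_softmax are hypothetical and exist only for illustration.

```python
import math


def int_mask_to_bool(mask):
    """Int convention (1 = keep, 0 = masked) -> bool convention (True = keep)."""
    return [[value == 1 for value in row] for row in mask]


def int_mask_to_float(mask):
    """Int convention -> float convention (0.0 = keep, -inf = masked)."""
    return [[0.0 if value == 1 else -math.inf for value in row] for row in mask]


def masked_softmax(scores, additive_mask):
    """Softmax over one row of attention scores after adding the float mask."""
    shifted = [score + bias for score, bias in zip(scores, additive_mask)]
    peak = max(shifted)
    exps = [math.exp(value - peak) for value in shifted]
    total = sum(exps)
    return [value / total for value in exps]


# One sequence of length 4 whose last token is padding.
int_mask = [[1, 1, 1, 0]]
bool_mask = int_mask_to_bool(int_mask)
float_mask = int_mask_to_float(int_mask)

# The padded position receives zero attention weight.
weights = masked_softmax([0.5, 1.0, 0.25, 3.0], float_mask[0])
print(bool_mask, float_mask, weights)
```

Adding -inf before the softmax is why the float convention uses 0 for kept positions: kept scores pass through unchanged, while masked scores contribute nothing to the normalized weights.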
Example
    import paddle
    from paddlenlp.transformers import BartModel, BartTokenizer

    # Load the pretrained tokenizer and model
    tokenizer = BartTokenizer.from_pretrained('bart-base')
    model = BartModel.from_pretrained('bart-base')

    # Tokenize the text and wrap each field in a batch dimension of size 1
    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    output = model(**inputs)
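When decoder_input_ids is None, the parameter description says the model builds it by shifting input_ids to the right. Below is a minimal pure-Python sketch of such a shift. It assumes the first position is filled with a decoder start token; the function name, the token ids and the start id 2 are hypothetical placeholders, and the library's own implementation may differ.

```python
def shift_tokens_right(input_ids, decoder_start_token_id):
    """Shift every sequence one position to the right.

    The last token of each sequence is dropped and `decoder_start_token_id`
    is placed first, so position i of the result holds token i - 1 of the input.
    """
    return [[decoder_start_token_id] + list(sequence[:-1]) for sequence in input_ids]


# Hypothetical token ids; 2 is an assumed start id, not read from any config.
input_ids = [[0, 713, 16, 2], [0, 9226, 2, 1]]
decoder_input_ids = shift_tokens_right(input_ids, decoder_start_token_id=2)
print(decoder_input_ids)
```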
- class BartPretrainedModel(*args, **kwargs) [source]
An abstract class for pretrained Bart models. It provides the Bart-related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
- config_class
alias of BartConfig
- class BartEncoder(config: BartConfig, embed_tokens: Embedding | None = None) [source]
The Transformer Encoder of BartModel. For the arguments of BartEncoder, see BartModel.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs) -> Tensor | Tuple | BaseModelOutputWithPastAndCrossAttentions [source]
The BartEncoder forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) -- See BartModel.
attention_mask (Tensor, optional) -- See BartModel.
inputs_embeds (Tensor, optional) -- See BartModel.
output_attentions (bool, optional) -- See BartModel.
output_hidden_states (bool, optional) -- See BartModel.
return_dict (bool, optional) -- See BartModel.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of BaseModelOutputWithPastAndCrossAttentions (which fields are present depends on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor encoder_outputs, which is the output of the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, d_model].
- class BartDecoder(config: BartConfig, embed_tokens: Embedding | None = None) [source]
The Transformer Decoder of BartModel. For the arguments of BartDecoder, see BartModel.
- forward(decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, memory_mask: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) -> Tensor | Tuple | BaseModelOutputWithPastAndCrossAttentions [source]
The BartDecoder forward method, overrides the __call__() special method.
- Parameters:
decoder_input_ids (Tensor, optional) -- See BartModel.
decoder_attention_mask (Tensor, optional) -- See BartModel.
encoder_output (tuple, optional) -- See BartModel.
memory_mask (Tensor, optional) -- See BartModel.
decoder_inputs_embeds (Tensor, optional) -- See BartModel.
cache (list, optional) -- See BartModel.
output_attentions (bool, optional) -- See BartModel.
output_hidden_states (bool, optional) -- See BartModel.
return_dict (bool, optional) -- See BartModel.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of BaseModelOutputWithPastAndCrossAttentions (which fields are present depends on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False, it returns the tensor decoder_outputs, which is the output of the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, d_model].
- class BartClassificationHead(input_dim: int, inner_dim: int, num_classes: int, pooler_dropout: float) [source]
Bases: Layer
Performs sentence-level classification tasks.
- class BartForSequenceClassification(config: BartConfig) [source]
Bart Model with a linear layer on top of the pooled output, designed for sequence classification/regression tasks such as the GLUE tasks.
- Parameters:
config (BartConfig) -- An instance of BartConfig used to construct BartForSequenceClassification.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) -> Tensor | Tuple | Seq2SeqSequenceClassifierOutput [source]
The BartForSequenceClassification forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) -- See BartModel.
attention_mask (Tensor, optional) -- See BartModel.
decoder_input_ids (Tensor, optional) -- See BartModel.
decoder_attention_mask (Tensor, optional) -- See BartModel.
encoder_output (tuple, optional) -- See BartModel.
inputs_embeds (Tensor, optional) -- See BartModel.
decoder_inputs_embeds (Tensor, optional) -- See BartModel.
use_cache (bool, optional) -- See BartModel. Forced to False when labels is provided, which saves memory during training.
cache (list, optional) -- See BartModel.
labels (Tensor, optional) -- Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., num_labels - 1]. If num_labels > 1, a classification loss is computed (Cross-Entropy). Defaults to None.
output_attentions (bool, optional) -- See BartModel.
output_hidden_states (bool, optional) -- See BartModel.
return_dict (bool, optional) -- See BartModel.
- Returns:
An instance of Seq2SeqSequenceClassifierOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of Seq2SeqSequenceClassifierOutput (which fields are present depends on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False and labels=None, it returns the tensor logits, which holds the classification logits of the input text. Its shape is [batch_size, num_labels] and its data type is float32.
Example
    import paddle
    from paddlenlp.transformers import BartForSequenceClassification, BartTokenizer

    # Load the pretrained tokenizer and model
    tokenizer = BartTokenizer.from_pretrained('bart-base')
    model = BartForSequenceClassification.from_pretrained('bart-base')

    # Tokenize the text and wrap each field in a batch dimension of size 1
    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    logits = model(**inputs)
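The returned logits have shape [batch_size, num_labels], and the labels parameter drives a Cross-Entropy loss when num_labels > 1. As an illustration only, the pure-Python sketch below shows how such logits can be turned into predictions and how a mean cross-entropy could be computed from them. The helper names and the sample values are hypothetical, and the mean reduction is an assumption rather than something this page states.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict(batch_logits):
    """Return a (class_index, probability) pair for every row of logits."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs[best]))
    return results


def cross_entropy(batch_logits, labels):
    """Mean negative log-likelihood of the labels (mean reduction is assumed)."""
    losses = [-math.log(softmax(row)[label]) for row, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Hypothetical logits for a batch of 2 examples and num_labels = 2.
batch_logits = [[2.0, 0.5], [-1.0, 1.0]]
predictions = predict(batch_logits)
loss = cross_entropy(batch_logits, labels=[0, 1])
print(predictions, loss)
```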
- class BartForQuestionAnswering(config: BartConfig) [source]
Bart Model with a linear layer on top of the hidden-states output to compute span_start_logits and span_end_logits, designed for question-answering tasks like SQuAD.
- Parameters:
config (BartConfig) -- An instance of BartConfig used to construct BartForQuestionAnswering.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, start_positions: Tensor | None = None, end_positions: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) -> Tuple | Seq2SeqQuestionAnsweringModelOutput [source]
The BartForQuestionAnswering forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) -- See BartModel.
attention_mask (Tensor, optional) -- See BartModel.
decoder_input_ids (Tensor, optional) -- See BartModel.
decoder_attention_mask (Tensor, optional) -- See BartModel.
encoder_output (tuple, optional) -- See BartModel.
inputs_embeds (Tensor, optional) -- See BartModel.
decoder_inputs_embeds (Tensor, optional) -- See BartModel.
use_cache (bool, optional) -- See BartModel. Forced to False when start_positions and end_positions are provided, which saves memory during training.
cache (list, optional) -- See BartModel.
start_positions (Tensor, optional) -- Labels for the position (index) of the start of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account when computing the loss. A tensor of shape (batch_size,). Defaults to None.
end_positions (Tensor, optional) -- Labels for the position (index) of the end of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account when computing the loss. A tensor of shape (batch_size,). Defaults to None.
output_attentions (bool, optional) -- See BartModel.
output_hidden_states (bool, optional) -- See BartModel.
return_dict (bool, optional) -- See BartModel.
- Returns:
An instance of Seq2SeqQuestionAnsweringModelOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of Seq2SeqQuestionAnsweringModelOutput (which fields are present depends on the input arguments). In particular, when return_dict=output_hidden_states=output_attentions=False and start_positions=end_positions=None, it returns the tuple (start_logits, end_logits).
With the fields:
start_logits (Tensor): A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].
end_logits (Tensor): A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].
Example
    import paddle
    from paddlenlp.transformers import BartForQuestionAnswering, BartTokenizer

    # Load the pretrained tokenizer and model
    tokenizer = BartTokenizer.from_pretrained('bart-base')
    model = BartForQuestionAnswering.from_pretrained('bart-base')

    # Tokenize the text and wrap each field in a batch dimension of size 1
    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    outputs = model(**inputs)

    # With the default arguments the output is the tuple (start_logits, end_logits)
    start_logits = outputs[0]
    end_logits = outputs[1]
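start_logits and end_logits score every token as a possible span boundary, so picking an answer means choosing a (start, end) pair with start <= end. The sketch below is one common way to do that in pure Python for a single sequence; it is not part of the library, and the function best_span, the max_answer_len limit and the sample logits are all hypothetical.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) of the highest scoring span with start <= end.

    The score of a span is start_logits[start] + end_logits[end]. Spans longer
    than `max_answer_len` tokens are skipped.
    """
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best[0], best[1], best_score


# Hypothetical logits for one sequence of 4 tokens. Taking the two argmaxes
# independently would give start=1, end=0, which is not a valid span.
start_logits = [0.1, 2.0, 0.3, 0.0]
end_logits = [3.0, 0.2, 1.5, 0.1]
start, end, score = best_span(start_logits, end_logits)
print(start, end, score)
```

Searching over pairs instead of taking each argmax separately avoids spans whose end precedes their start.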
- class BartForConditionalGeneration(config: BartConfig) [source]
Bart Model with a language modeling head on top.
- Parameters:
config (BartConfig) -- An instance of BartConfig used to construct BartForConditionalGeneration.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, decoder_input_ids: Tensor | None = None, decoder_attention_mask: Tensor | None = None, encoder_output: Tuple[Tensor] | ModelOutput | None = None, inputs_embeds: Tensor | None = None, decoder_inputs_embeds: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Cache, StaticCache]] | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) -> Tensor | Tuple | Seq2SeqLMOutput [source]
The BartForConditionalGeneration forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) -- See BartModel.
attention_mask (Tensor, optional) -- See BartModel.
decoder_input_ids (Tensor, optional) -- See BartModel.
decoder_attention_mask (Tensor, optional) -- See BartModel.
encoder_output (tuple, optional) -- See BartModel.
inputs_embeds (Tensor, optional) -- See BartModel.
decoder_inputs_embeds (Tensor, optional) -- See BartModel.
use_cache (bool, optional) -- See BartModel.
cache (list, optional) -- See BartModel.
labels (Tensor, optional) -- Labels for computing the masked language modeling loss. Indices should either be in [0, ..., vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., vocab_size]. A tensor of shape (batch_size, sequence_length). Defaults to None.
output_attentions (bool, optional) -- See BartModel.
output_hidden_states (bool, optional) -- See BartModel.
return_dict (bool, optional) -- See BartModel.
- Returns:
An instance of Seq2SeqLMOutput if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields of Seq2SeqLMOutput (which fields are present depends on the input arguments). In particular, when use_cache=return_dict=output_hidden_states=output_attentions=False and labels=None, it returns the tensor lm_logits.
With the fields:
lm_logits (Tensor): The language modeling logits, i.e. the prediction scores over the vocabulary from which the generated sentence is decoded. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].
Example
    import paddle
    from paddlenlp.transformers import BartForConditionalGeneration, BartTokenizer

    # Load the pretrained tokenizer and model
    tokenizer = BartTokenizer.from_pretrained('bart-base')
    model = BartForConditionalGeneration.from_pretrained('bart-base')

    # Tokenize the text and wrap each field in a batch dimension of size 1
    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
    outputs = model(**inputs)
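The labels parameter states that positions labelled -100 are ignored when the loss is computed. The pure-Python sketch below shows that rule for a single sequence. It is only an illustration under stated assumptions: the function masked_lm_loss is hypothetical, the logits are made up, and averaging over the non-ignored positions is an assumed reduction rather than a documented one.

```python
import math

IGNORE_INDEX = -100  # label value that excludes a position from the loss


def log_softmax(logits):
    """Numerically stable log-softmax over one row of vocabulary logits."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return [value - log_total for value in logits]


def masked_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over the positions whose label is not `ignore_index`.

    `logits` has shape [sequence_length][vocab_size] and `labels` has shape
    [sequence_length]. Returns 0.0 when every position is ignored.
    """
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored positions contribute nothing
        total -= log_softmax(row)[label]
        count += 1
    return total / count if count else 0.0


# Hypothetical logits for 3 positions over a vocabulary of 4 tokens.
logits = [[0.0, 0.0, 0.0, 0.0], [9.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels = [2, IGNORE_INDEX, 3]
loss = masked_lm_loss(logits, labels)
print(loss)
```

Because the middle position is ignored, changing its logits leaves the loss unchanged.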