model_outputs#
- tuple_output(outputs: Tuple[Tensor], loss: Tensor | None = None)[source]#
  Re-construct the model outputs with a single helper containing the shared logic for combining the output tuple with an optional loss. A usage sketch follows.
  - Parameters:
    - outputs (`Tuple[Tensor]`) -- the source of the outputs.
    - loss (`Optional[Tensor]`, optional) -- the loss of the model. Defaults to `None`.
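  A minimal usage sketch. The exact return layout is an assumption here: helpers like this conventionally put the loss first when one is given.

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import tuple_output

  logits = paddle.randn([2, 3])
  loss = paddle.to_tensor([0.7])

  outputs = tuple_output((logits,), loss)
  # Conventionally the loss leads the returned outputs when it is not None;
  # with loss=None the original outputs come back unchanged.
  ```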
- convert_encoder_output(encoder_output)[source]#
  Convert `encoder_output` from a tuple to a `BaseModelOutput`.
  - Parameters:
    - encoder_output (`tuple` or `ModelOutput`) -- The output of the encoder, a tuple consisting of `last_hidden_state`, `hidden_states` (optional), and `attentions` (optional). The data type of `last_hidden_state` is float32 and its shape is `[batch_size, sequence_length, hidden_size]`.
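  A sketch of the conversion. Shapes follow the docstring above; the import path assumes this module is `paddlenlp.transformers.model_outputs`.

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import convert_encoder_output

  last_hidden_state = paddle.randn([2, 16, 768], dtype="float32")
  encoder_output = convert_encoder_output((last_hidden_state,))

  # The tuple is now a BaseModelOutput with named fields.
  print(type(encoder_output).__name__)           # BaseModelOutput
  print(encoder_output.last_hidden_state.shape)  # [2, 16, 768]
  ```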
- class ModelOutput[source]#
  Base class for all model outputs as dataclass. Has a `__getitem__` that allows indexing by integer or slice (like a tuple) or by string (like a dictionary), ignoring `None` attributes. Otherwise behaves like a regular Python dictionary.

  Warning: You can't unpack a `ModelOutput` directly. Use the `to_tuple()` method to convert it to a tuple first.
- setdefault(*args, **kwargs)[source]#
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- pop(key[, default]) → v[source]#
  Remove the specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError.
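  The dictionary/tuple hybrid behavior in practice, sketched with `BaseModelOutput` (defined below) as a concrete subclass:

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import BaseModelOutput

  out = BaseModelOutput(last_hidden_state=paddle.randn([2, 16, 768]))

  h_by_name = out["last_hidden_state"]  # string key, like a dict
  h_by_index = out[0]                   # integer index, like a tuple
  # hidden_states and attentions were left as None, so they are skipped:
  assert len(out.to_tuple()) == 1
  # Unpacking only works after an explicit conversion:
  (last_hidden_state,) = out.to_tuple()
  ```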
- class BaseModelOutput(last_hidden_state: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs, with potential hidden states and attentions.
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class BaseModelOutputWithNoAttention(last_hidden_state: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs, with potential hidden states.
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, num_channels, height, width)`) -- Sequence of hidden-states at the output of the last layer of the model.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- class BaseModelOutputWithPooling(last_hidden_state: Tensor | None = None, pooler_output: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs that also contains a pooling of the last hidden states.
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model.
    - pooler_output (`paddle.Tensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for the BERT family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
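  A sequence-level head typically consumes `pooler_output` rather than `last_hidden_state`; a minimal sketch with random tensors standing in for a real forward pass:

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import BaseModelOutputWithPooling

  batch_size, seq_len, hidden_size = 2, 16, 768
  out = BaseModelOutputWithPooling(
      last_hidden_state=paddle.randn([batch_size, seq_len, hidden_size]),
      pooler_output=paddle.randn([batch_size, hidden_size]),
  )

  # The pooled [CLS] representation feeds a task head directly.
  classifier = paddle.nn.Linear(hidden_size, 2)
  logits = classifier(out.pooler_output)  # shape [batch_size, 2]
  ```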
- class BaseModelOutputWithPast(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and, optionally if `config.is_encoder_decoder=True`, 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and, optionally if `config.is_encoder_decoder=True`, in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
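  The cache layout in plain tensors (illustrative shapes only):

  ```python
  import paddle

  # A cache for a hypothetical 2-layer decoder with 4 heads that already
  # holds 10 positions: one (key, value) pair of tensors per layer.
  batch_size, num_heads, seq_len, head_dim = 2, 4, 10, 16
  past_key_values = tuple(
      (
          paddle.randn([batch_size, num_heads, seq_len, head_dim]),  # keys
          paddle.randn([batch_size, num_heads, seq_len, head_dim]),  # values
      )
      for _ in range(2)
  )
  # When this cache is fed back in, the model only processes the new token,
  # so last_hidden_state comes back as [batch_size, 1, hidden_size].
  ```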
- class BaseModelOutputWithPastAndCrossAttentions(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and, optionally if `config.is_encoder_decoder=True`, 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and, optionally if `config.is_encoder_decoder=True`, in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` and `config.add_cross_attention=True` are passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- class BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state: Tensor | None = None, pooler_output: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs that also contains a pooling of the last hidden states.
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model.
    - pooler_output (`paddle.Tensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for the BERT family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` and `config.add_cross_attention=True` are passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and, optionally if `config.is_encoder_decoder=True`, 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and, optionally if `config.is_encoder_decoder=True`, in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
- class SequenceClassifierOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of sentence classification models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss.
    - logits (`paddle.Tensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
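  Turning the pre-SoftMax scores into predictions (random logits stand in for a model's output):

  ```python
  import paddle
  import paddle.nn.functional as F
  from paddlenlp.transformers.model_outputs import SequenceClassifierOutput

  num_labels = 3
  out = SequenceClassifierOutput(logits=paddle.randn([2, num_labels]))

  probs = F.softmax(out.logits, axis=-1)      # [2, 3] class probabilities
  preds = paddle.argmax(out.logits, axis=-1)  # [2] predicted label ids
  ```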
- class TokenClassifierOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of token classification models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Classification loss.
    - logits (`paddle.Tensor` of shape `(batch_size, sequence_length, config.num_labels)`) -- Classification scores (before SoftMax).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class QuestionAnsweringModelOutput(loss: Tensor | None = None, start_logits: Tensor | None = None, end_logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of question answering models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
    - start_logits (`paddle.Tensor` of shape `(batch_size, sequence_length)`) -- Span-start scores (before SoftMax).
    - end_logits (`paddle.Tensor` of shape `(batch_size, sequence_length)`) -- Span-end scores (before SoftMax).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
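  Decoding a span from the two logit vectors (a naive argmax sketch):

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import QuestionAnsweringModelOutput

  out = QuestionAnsweringModelOutput(
      start_logits=paddle.randn([2, 16]),
      end_logits=paddle.randn([2, 16]),
  )

  start = paddle.argmax(out.start_logits, axis=-1)  # [batch_size] span starts
  end = paddle.argmax(out.end_logits, axis=-1)      # [batch_size] span ends
  # A real post-processing step would also enforce start <= end and cap the
  # span length before mapping token positions back to text.
  ```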
- class MultipleChoiceModelOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of multiple choice models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Classification loss.
    - logits (`paddle.Tensor` of shape `(batch_size, num_choices)`) -- Classification scores (before SoftMax). `num_choices` is the second dimension of the input tensors (see `input_ids` above).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class MaskedLMOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for masked language model outputs.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Masked language modeling (MLM) loss.
    - logits (`paddle.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class CausalLMOutputWithPast(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for causal language model (or autoregressive) outputs.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Language modeling loss (for next-token prediction).
    - logits (`paddle.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `paddle.Tensor` tuples of length `config.n_layers`, with each tuple containing the cached key and value states of the self-attention layers, and of the cross-attention layers if the model is used in an encoder-decoder setting. Only relevant if `config.is_decoder = True`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
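  How a greedy decoding loop consumes this output (shapes are illustrative; no real model is run):

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import CausalLMOutputWithPast

  # One decoding step's worth of outputs; a real model forward would produce
  # these from input_ids (and, after the first step, the running cache).
  vocab_size = 100
  out = CausalLMOutputWithPast(
      logits=paddle.randn([1, 1, vocab_size]),
      past_key_values=tuple(
          (paddle.randn([1, 4, 5, 16]), paddle.randn([1, 4, 5, 16]))
          for _ in range(2)
      ),
  )

  next_token = paddle.argmax(out.logits[:, -1, :], axis=-1)  # greedy pick
  # Feed next_token together with out.past_key_values into the next forward
  # pass so attention is only re-computed for the newly generated token.
  ```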
- class CausalLMOutputWithCrossAttentions(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for causal language model (or autoregressive) outputs.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Language modeling loss (for next-token prediction).
    - logits (`paddle.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Cross-attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `paddle.Tensor` tuples of length `config.n_layers`, with each tuple containing the cached key and value states of the self-attention layers, and of the cross-attention layers if the model is used in an encoder-decoder setting. Only relevant if `config.is_decoder = True`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
- class Seq2SeqModelOutput(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for model encoder's outputs that also contains pre-computed hidden states that can speed up sequential decoding.
  - Parameters:
    - last_hidden_state (`paddle.Tensor`) -- Sequence of hidden-states at the output of the last layer of the decoder of the model, whose shape is `(batch_size, sequence_length, hidden_size)`. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - decoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
    - decoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - encoder_last_hidden_state (`paddle.Tensor`, optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model, whose shape is `(batch_size, sequence_length, hidden_size)`.
    - encoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
    - encoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
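  Both sides of the encoder-decoder are carried in one object (illustrative tensors):

  ```python
  import paddle
  from paddlenlp.transformers.model_outputs import Seq2SeqModelOutput

  out = Seq2SeqModelOutput(
      last_hidden_state=paddle.randn([2, 7, 768]),           # decoder side
      encoder_last_hidden_state=paddle.randn([2, 16, 768]),  # encoder side
  )

  # Encoder and decoder states sit side by side, so the encoder runs once per
  # input while the decoder is re-invoked step by step during generation.
  print(out.last_hidden_state.shape)          # [2, 7, 768]
  print(out.encoder_last_hidden_state.shape)  # [2, 16, 768]
  ```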
- class Seq2SeqLMOutput(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for sequence-to-sequence language model outputs.
  - Parameters:
    - loss (`paddle.Tensor`, optional, returned when `labels` is provided) -- Language modeling loss whose shape is `(1,)`.
    - logits (`paddle.Tensor`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) whose shape is `(batch_size, sequence_length, config.vocab_size)`.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - decoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
    - decoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - encoder_last_hidden_state (`paddle.Tensor`, optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model, whose shape is `(batch_size, sequence_length, hidden_size)`.
    - encoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
    - encoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- class Seq2SeqQuestionAnsweringModelOutput(loss: Tensor | None = None, start_logits: Tensor | None = None, end_logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of sequence-to-sequence question answering models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
    - start_logits (`paddle.Tensor` of shape `(batch_size, sequence_length)`) -- Span-start scores (before SoftMax).
    - end_logits (`paddle.Tensor` of shape `(batch_size, sequence_length)`) -- Span-end scores (before SoftMax).
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed) -- Tuple of `tuple(paddle.Tensor)` of length `n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - decoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
    - decoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - encoder_last_hidden_state (`paddle.Tensor`, optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model. Tensor of shape `(batch_size, sequence_length, hidden_size)`.
    - encoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
    - encoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- class Seq2SeqSequenceClassifierOutput(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of sequence-to-sequence sentence classification models.
  - Parameters:
    - loss (`paddle.Tensor`, optional, returned when `label` is provided) -- Classification (or regression if config.num_labels==1) loss of shape `(1,)`.
    - logits (`paddle.Tensor`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax) of shape `(batch_size, config.num_labels)`.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - decoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
    - decoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - encoder_last_hidden_state (`paddle.Tensor`, optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model. Tensor of shape `(batch_size, sequence_length, hidden_size)`.
    - encoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
    - encoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- class SequenceClassifierOutputWithPast(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of sentence classification models.
  - Parameters:
    - loss (`paddle.Tensor`, optional, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss whose shape is `(1,)`.
    - logits (`paddle.Tensor`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax) whose shape is `(batch_size, num_labels)`.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class BackboneOutput(feature_maps: Tuple[Tensor] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of backbones.
  - Parameters:
    - feature_maps (`tuple(paddle.Tensor)` of shape `(batch_size, num_channels, height, width)`) -- Feature maps of the stages.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)` or `(batch_size, num_channels, height, width)`, depending on the backbone. Hidden-states of the model at the output of each stage plus the initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Only applicable if the backbone uses attention. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class BaseModelOutputWithPoolingAndNoAttention(last_hidden_state: Tensor | None = None, pooler_output: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs that also contains a pooling of the last hidden states.
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, num_channels, height, width)`) -- Sequence of hidden-states at the output of the last layer of the model.
    - pooler_output (`paddle.Tensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state after a pooling operation on the spatial dimensions.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- class ImageClassifierOutputWithNoAttention(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of image classification models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss.
    - logits (`paddle.Tensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape `(batch_size, num_channels, height, width)`. Hidden-states (also called feature maps) of the model at the output of each stage.
- class DepthEstimatorOutput(loss: Tensor | None = None, predicted_depth: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of depth estimation models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss.
    - predicted_depth (`paddle.Tensor` of shape `(batch_size, height, width)`) -- Predicted depth for each pixel.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, patch_size, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- class SemanticSegmenterOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#
  Base class for outputs of semantic segmentation models.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss.
    - logits (`paddle.Tensor`) -- Classification scores for each pixel.

      Warning: The logits returned do not necessarily have the same size as the `pixel_values` passed as inputs. This is to avoid doing two interpolations and losing some quality when a user needs to resize the logits to the original image size as post-processing. You should always check your logits shape and resize as needed.

    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, patch_size, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, patch_size, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
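  Resizing the logits as the warning suggests (a sketch with random scores standing in for a model's output):

  ```python
  import paddle
  import paddle.nn.functional as F

  # Logits usually come out at a reduced resolution and must be resized
  # before comparing against the input pixels.
  num_labels = 21
  logits = paddle.randn([1, num_labels, 128, 128])  # model-resolution scores
  image_size = (512, 512)                           # original pixel_values size

  upsampled = F.interpolate(logits, size=image_size, mode="bilinear", align_corners=False)
  segmentation_map = paddle.argmax(upsampled, axis=1)  # [1, 512, 512] label ids
  ```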
- class Seq2SeqSpectrogramOutput(loss: Tensor | None = None, spectrogram: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#
  Base class for sequence-to-sequence spectrogram outputs.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Spectrogram generation loss.
    - spectrogram (`paddle.Tensor` of shape `(batch_size, sequence_length, num_bins)`) -- The predicted spectrogram.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - decoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
    - decoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
    - cross_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
    - encoder_last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
    - encoder_hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
    - encoder_attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- class MoEModelOutputWithPast(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, router_logits: Tuple[Tensor] | None = None)[source]#
  Base class for model's outputs, with potential hidden states and attentions.
  - Parameters:
    - last_hidden_state (`paddle.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and, optionally if `config.is_encoder_decoder=True`, 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and, optionally if `config.is_encoder_decoder=True`, in the cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
    - router_logits (`tuple(paddle.Tensor)`, optional, returned when `output_router_probs=True` and `config.add_router_probs=True` are passed or when `config.output_router_probs=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, sequence_length, num_experts)`. Raw router logits (post-softmax) computed by MoE routers; these terms are used to compute the auxiliary loss for Mixture of Experts models.
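  One simplified way an auxiliary load-balancing term can be derived from `router_logits`. This is only a sketch; real MoE losses (e.g. the Switch Transformers formulation) also use the hard routing assignments.

  ```python
  import paddle
  import paddle.nn.functional as F

  # router_logits for one layer: [batch_size, sequence_length, num_experts].
  router_logits = paddle.randn([2, 16, 8])
  probs = F.softmax(router_logits, axis=-1)

  # Average routing probability per expert over all tokens; penalizing the
  # squared sum pushes the router toward a balanced expert load.
  expert_load = probs.reshape([-1, probs.shape[-1]]).mean(axis=0)
  aux_loss = (expert_load * expert_load).sum() * probs.shape[-1]
  ```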
- class MoECausalLMOutputWithPast(loss: Tensor | None = None, aux_loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, router_logits: Tuple[Tensor] | None = None)[source]#
  Base class for causal language model (or autoregressive) with mixture of experts outputs.
  - Parameters:
    - loss (`paddle.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) -- Language modeling loss (for next-token prediction).
    - logits (`paddle.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
    - aux_loss (`paddle.Tensor`, optional, returned when `labels` is provided) -- Auxiliary loss for the sparse modules.
    - router_logits (`tuple(paddle.Tensor)`, optional, returned when `output_router_probs=True` and `config.add_router_probs=True` are passed or when `config.output_router_probs=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, sequence_length, num_experts)`. Raw router logits (post-softmax) computed by MoE routers; these terms are used to compute the auxiliary loss for Mixture of Experts models.
    - past_key_values (`tuple(tuple(paddle.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- Tuple of `tuple(paddle.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
    - hidden_states (`tuple(paddle.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `paddle.Tensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
    - attentions (`tuple(paddle.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `paddle.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.