model_outputs

tuple_output(outputs: Tuple[Tensor], loss: Tensor | None = None)

Reconstruct the outputs with a single method that contains the simple logic.

Parameters:

outputs (Tuple[Tensor]) – the source of the outputs
loss (Optional[Tensor], optional) – the loss of the model. Defaults to None.
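
The description above does not spell out the logic, so the following is only a minimal sketch of how a helper with this signature might behave. It assumes that a provided loss is prepended to the outputs and that a single remaining element is returned bare. The name `tuple_output_sketch` is hypothetical, and plain Python values stand in for tensors.

```python
from typing import Any, Optional, Tuple


def tuple_output_sketch(outputs: Tuple[Any, ...], loss: Optional[Any] = None):
    """Hypothetical stand-in for tuple_output; the real logic may differ."""
    # Assumption: a provided loss is placed in front of the other outputs.
    if loss is not None:
        outputs = (loss,) + tuple(outputs)
    # Assumption: a single remaining element is returned bare, not as a 1-tuple.
    if len(outputs) == 1:
        return outputs[0]
    return outputs


with_loss = tuple_output_sketch(("logits", "hidden_states"), loss=0.25)
single = tuple_output_sketch(("logits",))
```

Under these assumptions, `with_loss` is a three-element tuple that starts with the loss, and `single` is the lone output itself.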

convert_encoder_output(encoder_output)

Convert encoder_output from a tuple to a BaseModelOutput.

Parameters:

encoder_output (tuple or ModelOutput) – The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional), and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size].
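
A minimal sketch of the conversion described above, assuming the tuple is positional in the order last_hidden_state, hidden_states, attentions. `EncoderOutputSketch` and `convert_encoder_output_sketch` are hypothetical stand-ins, not the library's classes, and strings stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class EncoderOutputSketch:
    """Hypothetical stand-in with the three BaseModelOutput fields."""
    last_hidden_state: Any = None
    hidden_states: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None


def convert_encoder_output_sketch(encoder_output):
    # Assumption: an already structured output is passed through unchanged.
    if isinstance(encoder_output, EncoderOutputSketch):
        return encoder_output
    # Positional tuple: the first item is required, the others are optional.
    return EncoderOutputSketch(
        last_hidden_state=encoder_output[0],
        hidden_states=encoder_output[1] if len(encoder_output) > 1 else None,
        attentions=encoder_output[2] if len(encoder_output) > 2 else None,
    )


converted = convert_encoder_output_sketch(("last_hidden_state_tensor",))
```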

class ModelOutput

Base class for all model outputs, implemented as a dataclass. Its __getitem__ allows indexing by integer or slice (like a tuple) or by string (like a dictionary), ignoring attributes that are None. Otherwise it behaves like a regular Python dictionary.

You can’t unpack a ModelOutput directly. Use the to_tuple method to convert it to a tuple first.
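
The indexing rules above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: `OutputSketch` is a hypothetical dict subclass that drops None fields at construction time, and lists and strings stand in for tensors.

```python
class OutputSketch(dict):
    """Hypothetical stand-in that stores only the non-None fields."""

    def __init__(self, **fields):
        super().__init__(
            (name, value) for name, value in fields.items() if value is not None
        )

    def __getitem__(self, key):
        # String keys behave like dictionary lookups.
        if isinstance(key, str):
            return super().__getitem__(key)
        # Integers and slices index the tuple of non-None values.
        return self.to_tuple()[key]

    def to_tuple(self):
        return tuple(self.values())


out = OutputSketch(loss=None, logits=[0.1, 0.9], hidden_states=None,
                   attentions=("layer0",))
first = out[0]            # logits, because the None loss is skipped
by_name = out["logits"]   # the same object, looked up by name
logits, attentions = out.to_tuple()  # convert first, then unpack
```

In this sketch, unpacking `out` directly would iterate over its keys, as with any dict, which is why the conversion through `to_tuple` comes first.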

setdefault(*args, **kwargs)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

pop(k[, d]) → v

Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

update([E, ]**F) → None

Update D from dict/iterable E and F. If E is present and has a .keys() method, this does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, this does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
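
The descriptions of setdefault, pop, and update above match Python's built-in dict methods of the same names. The sketch below exercises those semantics on a plain dict. It deliberately does not use a ModelOutput, because this page does not say whether that class overrides or restricts these mutating calls.

```python
# A plain dict, not a ModelOutput: only the built-in semantics are shown.
d = {"logits": 1}

# E has .keys(): each key is copied, then keyword arguments are applied.
d.update({"loss": 0.5}, logits=2)

# E lacks .keys(): it is treated as an iterable of (key, value) pairs.
d.update([("attentions", "a")])

# setdefault inserts only when the key is missing and returns the stored value.
kept = d.setdefault("loss", 9.9)
added = d.setdefault("hidden_states", "h")

# pop returns the default instead of raising KeyError when the key is absent.
missing = d.pop("cross_attentions", None)
removed = d.pop("attentions")
```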

to_tuple() → Tuple[Any]

Convert self to a tuple containing all the attributes/keys that are not None.

class BaseModelOutput(last_hidden_state: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for model’s outputs, with potential hidden states and attentions.

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
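
To make the shapes and tuple lengths above concrete, here is a small bookkeeping sketch. `expected_shapes` and its `has_embedding_layer` flag are hypothetical names introduced for illustration. It only computes shape tuples and creates no tensors.

```python
def expected_shapes(batch_size, sequence_length, hidden_size, num_layers,
                    num_heads, has_embedding_layer=True):
    """Return the shapes the fields above are documented to have."""
    hidden = (batch_size, sequence_length, hidden_size)
    attention = (batch_size, num_heads, sequence_length, sequence_length)
    # One hidden state per layer, plus one for the embeddings when present.
    num_hidden = num_layers + (1 if has_embedding_layer else 0)
    return {
        "last_hidden_state": hidden,
        "hidden_states": (hidden,) * num_hidden,
        "attentions": (attention,) * num_layers,
    }


shapes = expected_shapes(batch_size=2, sequence_length=5, hidden_size=8,
                         num_layers=3, num_heads=4)
```

With three layers and an embedding layer, `hidden_states` holds four entries while `attentions` holds three.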

class BaseModelOutputWithNoAttention(last_hidden_state: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None)

Base class for model’s outputs, with potential hidden states.

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, num_channels, height, width)) – Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

class BaseModelOutputWithPooling(last_hidden_state: Tensor | None = None, pooler_output: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for model’s outputs that also contain a pooling of the last hidden states.

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
pooler_output (paddle.Tensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
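
The pooler_output description for BERT-family models (first token, linear layer, tanh) can be sketched in plain Python. `pooler_sketch`, its `weight` and `bias` arguments, and the nested-list layout are assumptions made for illustration. A real model would use tensor operations and trained weights.

```python
import math


def pooler_sketch(last_hidden_state, weight, bias):
    """Hypothetical BERT-style pooler on nested lists (batch, seq, hidden)."""
    pooled = []
    for sequence in last_hidden_state:
        first_token = sequence[0]  # hidden state of the classification token
        row = []
        for w_row, b in zip(weight, bias):
            # Linear layer: dot product of the token vector with one weight row.
            z = sum(x * w for x, w in zip(first_token, w_row)) + b
            row.append(math.tanh(z))
        pooled.append(row)
    return pooled


# batch_size=1, sequence_length=2, hidden_size=2; identity weights, zero bias.
hidden = [[[0.0, 1.0], [5.0, 5.0]]]
pooled = pooler_sketch(hidden, weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
```

The result has shape (batch_size, hidden_size), and only the first token of each sequence contributes to it.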

class BaseModelOutputWithPast(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for model’s outputs that may also contain past key/values (to speed up sequential decoding).

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size)) –
Sequence of hidden-states at the output of the last layer of the model.

If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and, optionally if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
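
A shape-only sketch of one cached decoding step may help with the past_key_values description. It assumes a decoder-only model whose cache grows by one position per step. `decode_step_shapes` and `head_dim` (standing for embed_size_per_head) are hypothetical names.

```python
def decode_step_shapes(batch_size, num_heads, head_dim, hidden_size,
                       num_layers, cached_length):
    """Shape bookkeeping for one decoding step with a cache (illustrative)."""
    # Assumption: the step appends one position to each layer's cache.
    new_length = cached_length + 1
    kv_shape = (batch_size, num_heads, new_length, head_dim)
    return {
        # Only the newest position is returned when a cache is supplied.
        "last_hidden_state": (batch_size, 1, hidden_size),
        # One (key, value) pair per layer for self-attention.
        "past_key_values": tuple((kv_shape, kv_shape) for _ in range(num_layers)),
    }


step = decode_step_shapes(batch_size=2, num_heads=4, head_dim=16,
                          hidden_size=64, num_layers=3, cached_length=7)
```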

class BaseModelOutputWithPastAndCrossAttentions(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None)

Base class for model’s outputs that may also contain past key/values (to speed up sequential decoding).

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size)) –
Sequence of hidden-states at the output of the last layer of the model.

If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and, optionally if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

class BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state: Tensor | None = None, pooler_output: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None)

Base class for model’s outputs that also contain a pooling of the last hidden states.

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
pooler_output (paddle.Tensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and, optionally if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

class SequenceClassifierOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for outputs of sentence classification models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (paddle.Tensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
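
Because logits are scores before SoftMax, a caller usually normalizes them to get class probabilities. The following is a minimal sketch with nested lists in place of a (batch_size, config.num_labels) tensor. The `softmax` helper is written here for illustration and is not part of this module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # batch_size=2, num_labels=3
probs = [softmax(row) for row in logits]
# Index of the highest probability per example (the first index wins ties).
predicted = [max(range(len(row)), key=row.__getitem__) for row in probs]
```

When config.num_labels==1 the single score is a regression value, so no softmax would be applied.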

class TokenClassifierOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for outputs of token classification models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Classification loss.
logits (paddle.Tensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Base class for outputs of question answering models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (paddle.Tensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).
end_logits (paddle.Tensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
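
One common way to turn start_logits and end_logits into an answer span is to pick the pair with the highest summed score, subject to start <= end. The sketch below does this for a single sequence. `best_span` and its `max_answer_length` parameter are hypothetical and are not described by this module.

```python
def best_span(start_logits, end_logits, max_answer_length=30):
    """Return ((start, end), score) maximizing start + end scores."""
    best = (None, None)
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_logits), start + max_answer_length)
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best, best_score


start_logits = [0.1, 2.0, 0.3, 0.0]
end_logits = [0.0, 0.5, 3.0, 0.2]
span, score = best_span(start_logits, end_logits)
```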

class MultipleChoiceModelOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for outputs of multiple choice models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Classification loss.
logits (paddle.Tensor of shape (batch_size, num_choices)) –
num_choices is the second dimension of the input tensors (see input_ids).

Classification scores (before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

class MaskedLMOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)

Base class for masked language model outputs.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Masked language modeling (MLM) loss.
logits (paddle.Tensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Base class for causal language model (or autoregressive) outputs.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Language modeling loss (for next-token prediction).
logits (paddle.Tensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of paddle.Tensor tuples of length config.n_layers, with each tuple containing the cached key and value states of the self-attention layers and, if the model is used in an encoder-decoder setting, of the cross-attention layers. Only relevant if config.is_decoder = True.

Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
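
The text above does not say how the next-token loss is computed. A common formulation, assumed here, shifts the sequence so that the logits at position t are scored against the token at position t + 1. `next_token_loss` is a hypothetical helper that handles one sequence, using nested lists in place of tensors.

```python
import math


def next_token_loss(logits, labels):
    """Mean cross-entropy where position t predicts token t + 1."""
    losses = []
    # Drop the last logit row and the first label so the pairs line up.
    for row, target in zip(logits[:-1], labels[1:]):
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_norm - row[target])
    return sum(losses) / len(losses)


# sequence_length=3, vocab_size=2; uniform logits give a loss of log(2).
logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [0, 1, 0]
loss = next_token_loss(logits, labels)
```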

class CausalLMOutputWithCrossAttentions(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None)

Base class for causal language model (or autoregressive) outputs.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Language modeling loss (for next-token prediction).
logits (paddle.Tensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Cross-attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of paddle.Tensor tuples of length config.n_layers, with each tuple containing the cached key and value states of the self-attention layers and, if the model is used in an encoder-decoder setting, of the cross-attention layers. Only relevant if config.is_decoder = True.

Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

class Seq2SeqModelOutput(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)

Base class for model encoder’s outputs that also contain pre-computed hidden states that can speed up sequential decoding.

Parameters:

last_hidden_state (paddle.Tensor) –
Sequence of hidden-states at the output of the last layer of the decoder of the model, whose shape is (batch_size, sequence_length, hidden_size).

If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (tuple(tuple(paddle.Tensor)), optional) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Returned when use_cache=True is passed or when config.use_cache=True.

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed or when config.output_hidden_states=True.

Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
decoder_attentions (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True.

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True.

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (paddle.Tensor, optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model whose shape is (batch_size, sequence_length, hidden_size).
encoder_hidden_states (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed or when config.output_hidden_states=True.

Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
encoder_attentions (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True.

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
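
For the encoder-decoder case, each layer's entry in past_key_values holds four tensors rather than two. The shape-only sketch below lays that out. The order inside each layer tuple (self-attention key and value first, then cross-attention key and value) is an assumption, and `seq2seq_cache_shapes` and `head_dim` (standing for embed_size_per_head) are hypothetical names.

```python
def seq2seq_cache_shapes(batch_size, num_heads, head_dim, num_layers,
                         decoder_length, encoder_length):
    """Shape bookkeeping for an encoder-decoder cache (illustrative only)."""
    self_attn = (batch_size, num_heads, decoder_length, head_dim)
    cross_attn = (batch_size, num_heads, encoder_length, head_dim)
    # Assumed order per layer: self-attention key and value, then
    # cross-attention key and value.
    layer = (self_attn, self_attn, cross_attn, cross_attn)
    return tuple(layer for _ in range(num_layers))


cache = seq2seq_cache_shapes(batch_size=2, num_heads=4, head_dim=16,
                             num_layers=6, decoder_length=3, encoder_length=11)
```

The cross-attention entries depend on the encoder length only, so under this layout they would keep the same shape from one decoding step to the next.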

class Seq2SeqLMOutput(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)

Base class for sequence-to-sequence language model outputs.

Parameters:

loss (paddle.Tensor, optional) – Language modeling loss whose shape is (1,). Returned when labels is provided.
logits (paddle.Tensor) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) whose shape is (batch_size, sequence_length, config.vocab_size).
past_key_values (tuple(tuple(paddle.Tensor)), optional) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Returned when use_cache=True is passed or when config.use_cache=True.

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed or when config.output_hidden_states=True.

Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True.

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True.

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (paddle.Tensor, optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model whose shape is (batch_size, sequence_length, hidden_size).
encoder_hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(paddle.Tensor), optional) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True.

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

class Seq2SeqQuestionAnsweringModelOutput(loss: Tensor | None = None, start_logits: Tensor | None = None, end_logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)

Base class for outputs of sequence-to-sequence question answering models.

Parameters:

loss (paddle.Tensor, optional) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. A Tensor of shape (1,), returned when labels is provided.
start_logits (paddle.Tensor) – Span-start scores (before SoftMax). Tensor of shape (batch_size, sequence_length).
end_logits (paddle.Tensor) – Span-end scores (before SoftMax). Tensor of shape (batch_size, sequence_length).
past_key_values (tuple(tuple(paddle.Tensor)), optional) – Tuple of tuple(paddle.Tensor) of length n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Returned when use_cache=True is passed. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed. Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (paddle.Tensor, optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model. Tensor of shape (batch_size, sequence_length, hidden_size).
encoder_hidden_states (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
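
As an illustration of how start_logits and end_logits are typically consumed, the sketch below picks the highest-scoring answer span for a single sequence. It is not part of this module: the helper name best_span and the max_answer_len cut-off are hypothetical, and NumPy arrays stand in for paddle.Tensor values so that the example runs on its own.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) of the best valid span for one sequence.

    Hypothetical helper: a span is valid when start <= end and it covers
    at most max_answer_len tokens.
    """
    start_logits = np.asarray(start_logits, dtype=np.float64)
    end_logits = np.asarray(end_logits, dtype=np.float64)
    # scores[i, j] is the score of the span starting at i and ending at j.
    scores = start_logits[:, None] + end_logits[None, :]
    idx = np.arange(scores.shape[0])
    length = idx[None, :] - idx[:, None]
    valid = (length >= 0) & (length < max_answer_len)
    # Invalid spans can never win the argmax.
    scores = np.where(valid, scores, -np.inf)
    start, end = np.unravel_index(np.argmax(scores), scores.shape)
    return int(start), int(end), float(scores[start, end])


# One row of start_logits / end_logits, i.e. shape (sequence_length,).
start_logits = np.array([0.1, 2.0, 0.3, 0.2])
end_logits = np.array([0.0, 0.5, 3.0, 0.1])
span = best_span(start_logits, end_logits)
print(span)
```

For a batch, the same function can be applied to each row of the (batch_size, sequence_length) tensors after converting them to NumPy.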

class Seq2SeqSequenceClassifierOutput(loss: Tensor | None = None, logits: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#

Base class for outputs of sequence-to-sequence sentence classification models.

Parameters:

loss (paddle.Tensor, optional) – Classification (or regression if config.num_labels==1) loss of shape (1,). Returned when labels is provided.
logits (paddle.Tensor) – Classification (or regression if config.num_labels==1) scores (before SoftMax) of shape (batch_size, config.num_labels).
past_key_values (tuple(tuple(paddle.Tensor)), optional) – Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Returned when use_cache=True is passed. Contains pre-computed hidden-states (keys and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed. Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed. Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (paddle.Tensor, optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model. Tensor of shape (batch_size, sequence_length, hidden_size).
encoder_hidden_states (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed. Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

Base class for outputs of sentence classification models.

Parameters:

loss (paddle.Tensor, optional) – Classification (or regression if config.num_labels==1) loss of shape (1,). Returned when labels is provided.
logits (paddle.Tensor) – Classification (or regression if config.num_labels==1) scores (before SoftMax) of shape (batch_size, num_labels).
past_key_values (tuple(tuple(paddle.Tensor)), optional) – Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head). Returned when use_cache=True is passed or when config.use_cache=True. Contains pre-computed hidden-states (keys and values in the self-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Returned when output_hidden_states=True is passed or when config.output_hidden_states=True. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Returned when output_attentions=True is passed or when config.output_attentions=True. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
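
The logits field holds either class scores or a single regression value, depending on num_labels. The sketch below shows one way to interpret it. The helper interpret_logits and its return format are hypothetical, and NumPy arrays stand in for paddle.Tensor values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def interpret_logits(logits):
    """Hypothetical helper: turn (batch_size, num_labels) logits into predictions."""
    logits = np.asarray(logits, dtype=np.float64)
    if logits.shape[-1] == 1:
        # num_labels == 1: the single score is treated as a regression value.
        return {"task": "regression", "values": logits[:, 0]}
    probs = softmax(logits, axis=-1)
    return {"task": "classification", "labels": probs.argmax(axis=-1), "probs": probs}


cls = interpret_logits(np.array([[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]))
reg = interpret_logits(np.array([[0.7], [1.2]]))
print(cls["labels"], reg["values"])
```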

class BackboneOutput(feature_maps: Tuple[Tensor] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#

Base class for outputs of backbones.

Parameters:

feature_maps (tuple(paddle.Tensor) of shape (batch_size, num_channels, height, width)) – Feature maps of the stages.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size) or (batch_size, num_channels, height, width), depending on the backbone.

Hidden-states of the model at the output of each stage plus the initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Only applicable if the backbone uses attention.

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

class BaseModelOutputWithPoolingAndNoAttention(last_hidden_state: Tensor | None = None, pooler_output: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None)[source]#

Base class for model’s outputs that also contains a pooling of the last hidden states.

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, num_channels, height, width)) – Sequence of hidden-states at the output of the last layer of the model.
pooler_output (paddle.Tensor of shape (batch_size, hidden_size)) – Last layer hidden-state after a pooling operation on the spatial dimensions.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

class ImageClassifierOutputWithNoAttention(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None)[source]#

Base class for outputs of image classification models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (paddle.Tensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, num_channels, height, width). Hidden-states (also called feature maps) of the model at the output of each stage.

class DepthEstimatorOutput(loss: Tensor | None = None, predicted_depth: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#

Base class for outputs of depth estimation models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
predicted_depth (paddle.Tensor of shape (batch_size, height, width)) – Predicted depth for each pixel.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
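
A common post-processing step for predicted_depth is to rescale each depth map to an 8-bit image for visualisation. The following is a minimal sketch under that assumption. The helper depth_to_uint8 is hypothetical and uses NumPy arrays in place of paddle.Tensor values.

```python
import numpy as np


def depth_to_uint8(predicted_depth):
    """Hypothetical helper: min-max scale (batch_size, height, width) depth to 0..255."""
    depth = np.asarray(predicted_depth, dtype=np.float64)
    lo = depth.min(axis=(1, 2), keepdims=True)
    hi = depth.max(axis=(1, 2), keepdims=True)
    # Guard against constant maps, where hi == lo would divide by zero.
    scale = np.where(hi > lo, hi - lo, 1.0)
    return np.round((depth - lo) / scale * 255.0).astype(np.uint8)


depth = np.array([[[0.0, 5.0], [10.0, 2.0]]])  # shape (1, 2, 2)
image = depth_to_uint8(depth)
print(image.shape, image.dtype)
```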

class SemanticSegmenterOutput(loss: Tensor | None = None, logits: Tensor | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None)[source]#

Base class for outputs of semantic segmentation models.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (paddle.Tensor) –
Classification scores for each pixel.

The logits returned do not necessarily have the same size as the pixel_values passed as inputs. This avoids doing two interpolations and losing some quality when a user needs to resize the logits to the original image size as post-processing. You should always check your logits shape and resize as needed.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, patch_size, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
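
Because the segmentation logits may be smaller than the input image, they usually need to be resized before taking the per-pixel argmax. The sketch below illustrates the idea with a nearest-neighbour resize in NumPy. It assumes a (batch_size, num_labels, height, width) layout, which this page does not state, and the helper resize_logits_nearest is hypothetical. In practice a framework routine such as paddle.nn.functional.interpolate with bilinear mode would likely be used instead.

```python
import numpy as np


def resize_logits_nearest(logits, out_height, out_width):
    """Hypothetical helper: nearest-neighbour resize of (batch, labels, h, w) logits."""
    logits = np.asarray(logits)
    _, _, h, w = logits.shape
    # Map every output row/column back to its source row/column.
    rows = np.arange(out_height) * h // out_height
    cols = np.arange(out_width) * w // out_width
    return logits[:, :, rows[:, None], cols[None, :]]


# Assumed layout: 1 image, 3 classes, 4x4 logits for a 16x16 input image.
logits = np.random.default_rng(0).normal(size=(1, 3, 4, 4))
resized = resize_logits_nearest(logits, 16, 16)
seg_map = resized.argmax(axis=1)  # per-pixel class index
print(resized.shape, seg_map.shape)
```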

class Seq2SeqSpectrogramOutput(loss: Tensor | None = None, spectrogram: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, decoder_hidden_states: Tuple[Tensor] | None = None, decoder_attentions: Tuple[Tensor] | None = None, cross_attentions: Tuple[Tensor] | None = None, encoder_last_hidden_state: Tensor | None = None, encoder_hidden_states: Tuple[Tensor] | None = None, encoder_attentions: Tuple[Tensor] | None = None)[source]#

Base class for sequence-to-sequence spectrogram outputs.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Spectrogram generation loss.
spectrogram (paddle.Tensor of shape (batch_size, sequence_length, num_bins)) – The predicted spectrogram.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
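
To make the nested layout of past_key_values concrete, the sketch below builds a dummy cache with the shapes described above and reads the cached lengths back from it. The helper names are hypothetical, NumPy arrays stand in for paddle.Tensor values, and the ordering inside each per-layer tuple (self-attention key and value first, then cross-attention key and value) is an assumption rather than something this page specifies.

```python
import numpy as np


def make_dummy_cache(n_layers, batch, heads, dec_len, enc_len, head_dim):
    """Hypothetical helper: build a zero-filled encoder-decoder cache."""
    def layer():
        return (
            np.zeros((batch, heads, dec_len, head_dim)),  # self-attention key (assumed order)
            np.zeros((batch, heads, dec_len, head_dim)),  # self-attention value
            np.zeros((batch, heads, enc_len, head_dim)),  # cross-attention key
            np.zeros((batch, heads, enc_len, head_dim)),  # cross-attention value
        )
    return tuple(layer() for _ in range(n_layers))


def describe_cache(past_key_values):
    """Hypothetical helper: summarise the cache using the first layer's shapes."""
    first = past_key_values[0]
    return {
        "n_layers": len(past_key_values),
        "cached_decoder_steps": first[0].shape[2],
        "encoder_sequence_length": first[2].shape[2],
    }


cache = make_dummy_cache(n_layers=2, batch=1, heads=4, dec_len=5, enc_len=7, head_dim=8)
info = describe_cache(cache)
print(info)
```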

class MoEModelOutputWithPast(last_hidden_state: Tensor | None = None, past_key_values: Tuple[Tuple[Tensor]] | None = None, hidden_states: Tuple[Tensor] | None = None, attentions: Tuple[Tensor] | None = None, router_logits: Tuple[Tensor] | None = None)[source]#

Base class for model’s outputs, with potential hidden states and attentions.

Parameters:

last_hidden_state (paddle.Tensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and, optionally if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
router_logits (tuple(paddle.Tensor), optional, returned when output_router_probs=True and config.add_router_probs=True is passed or when config.output_router_probs=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, sequence_length, num_experts).

Raw router logits (post-softmax) that are computed by MoE routers. These terms are used to compute the auxiliary loss for Mixture of Experts models.
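
To illustrate how per-layer router_logits can feed an auxiliary loss, the sketch below computes a load-balancing term in the style commonly used for top-1 routing: the number of experts times the sum, over experts, of the fraction of tokens routed to an expert multiplied by its mean routing probability. This is an assumption for illustration only. The formula the library actually applies may differ, the helper load_balancing_loss is hypothetical, and the inputs are treated here as pre-softmax scores held in NumPy arrays.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def load_balancing_loss(router_logits):
    """Hypothetical helper: mean top-1 load-balancing loss over layers.

    router_logits is a tuple with one array per layer, each of shape
    (batch_size, sequence_length, num_experts).
    """
    losses = []
    for layer_logits in router_logits:
        layer_logits = np.asarray(layer_logits, dtype=np.float64)
        num_experts = layer_logits.shape[-1]
        probs = softmax(layer_logits).reshape(-1, num_experts)
        # Fraction of tokens whose top-1 choice is each expert.
        chosen = probs.argmax(axis=-1)
        token_fraction = np.bincount(chosen, minlength=num_experts) / probs.shape[0]
        # Mean routing probability assigned to each expert.
        mean_prob = probs.mean(axis=0)
        losses.append(num_experts * np.sum(token_fraction * mean_prob))
    return float(np.mean(losses))


# Two tokens, two experts: one balanced layer and one where expert 0 takes everything.
balanced = (np.array([[[10.0, -10.0], [-10.0, 10.0]]]),)
imbalanced = (np.array([[[10.0, -10.0], [10.0, -10.0]]]),)
print(load_balancing_loss(balanced), load_balancing_loss(imbalanced))
```

Under this formulation the loss is lowest when tokens are spread evenly across experts, which is why it is used as a regulariser.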

Base class for causal language model (or autoregressive) with mixture of experts outputs.

Parameters:

loss (paddle.Tensor of shape (1,), optional, returned when labels is provided) – Language modeling loss (for next-token prediction).
logits (paddle.Tensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
aux_loss (paddle.Tensor, optional, returned when labels is provided) – aux_loss for the sparse modules.
router_logits (tuple(paddle.Tensor), optional, returned when output_router_probs=True and config.add_router_probs=True is passed or when config.output_router_probs=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, sequence_length, num_experts).

Raw router logits (post-softmax) that are computed by MoE routers. These terms are used to compute the auxiliary loss for Mixture of Experts models.
past_key_values (tuple(tuple(paddle.Tensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) –
Tuple of tuple(paddle.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(paddle.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) –
Tuple of paddle.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(paddle.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) –
Tuple of paddle.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
