modeling#

Modeling classes for UNIMO model.

class UNIMOPretrainedModel(*args, **kwargs)[source]#

Bases: PretrainedModel

An abstract class for pretrained UNIMO models. It provides UNIMO related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map, base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

config_class#

alias of UNIMOConfig

base_model_class#

alias of UNIMOModel

class UNIMOModel(config: UNIMOConfig)[source]#

Bases: UNIMOPretrainedModel

The bare UNIMO Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matter related to general usage and behavior.

Parameters:

config (UNIMOConfig) – An instance of UNIMOConfig used to construct UNIMOModel.

get_input_embeddings()[source]#

Get the input embeddings of the model.

Returns:

The input embedding layer of the model.

Return type:

nn.Embedding

set_input_embeddings(value)[source]#

Set a new input embedding layer for the model.

Parameters:

value (Embedding) – The new input embedding layer of the model.

Raises:

NotImplementedError – If the model has not implemented the set_input_embeddings method.
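
A minimal usage sketch for the two embedding accessors above (not part of the original docstring); the replacement vocabulary size of 20000 is an illustrative assumption rather than a value taken from any real checkpoint.

import paddle.nn as nn
from paddlenlp.transformers import UNIMOModel

model = UNIMOModel.from_pretrained('unimo-text-1.0')

# Read back the current token embedding table (an nn.Embedding instance).
old_embeddings = model.get_input_embeddings()
hidden_size = old_embeddings.weight.shape[1]

# Swap in a new, hypothetical embedding table with a different vocabulary size.
new_embeddings = nn.Embedding(20000, hidden_size)
model.set_input_embeddings(new_embeddings)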

forward(input_ids: Tensor | None = None, token_type_ids: Tensor | None = None, position_ids: Tensor | None = None, attention_mask: Tensor | None = None, use_cache: bool | None = None, cache: Tuple[Tensor] | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The UNIMOModel forward method, overrides the special __call__() method.

Parameters:
  • input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • token_type_ids (Tensor) –

    Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

    Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added to the token embeddings.

  • position_ids (Tensor) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, max_position_embeddings - 1]. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None.

  • attention_mask (Tensor) – Mask used in multi-head attention to avoid performing attention on unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length], or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.

  • use_cache (bool, optional) – Whether or not to use the model cache to speed up decoding. Defaults to False.

  • cache (list, optional) – It is a list, and each element in the list is incremental_cache produced by paddle.nn.TransformerEncoderLayer.gen_cache() method. See paddle.nn.TransformerEncoder.gen_cache() method for more details. It is only used for inference and should be None for training. Defaults to None.

  • inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. Defaults to None.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.

  • return_dict (bool, optional) – Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.

Returns:

An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, not-None fields (depending on the input arguments) of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=None, it returns a tensor sequence_output of shape [batch_size, sequence_length, hidden_size], which is the output of the last layer of the model.

Example

from paddlenlp.transformers import UNIMOModel
from paddlenlp.transformers import UNIMOTokenizer

model = UNIMOModel.from_pretrained('unimo-text-1.0')
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')

inputs = tokenizer.gen_encode("Welcome to use PaddlePaddle and PaddleNLP!", return_tensors=True)
outputs = model(**inputs)
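
A hedged follow-up sketch of the return_dict=True path described above, reusing model and inputs from the example; the attribute names follow BaseModelOutputWithPastAndCrossAttentions.

# Request a structured output object instead of a tuple of tensors.
outputs = model(**inputs, return_dict=True, output_hidden_states=True)
sequence_output = outputs.last_hidden_state  # [batch_size, sequence_length, hidden_size]
all_hidden_states = outputs.hidden_states    # per-layer hidden states (see output_hidden_states above)
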
class UNIMOLMHeadModel(config: UNIMOConfig)[source]#

Bases: UNIMOPretrainedModel

The UNIMO Model with a language modeling head on top designed for generation tasks.

Parameters:

config (UNIMOConfig) – An instance of UNIMOConfig used to construct UNIMOLMHeadModel.

forward(input_ids: Tensor | None = None, token_type_ids: Tensor | None = None, position_ids: Tensor | None = None, attention_mask: Tensor | None = None, masked_positions: Tensor | None = None, use_cache: bool | None = None, cache: Tuple[Tensor] | None = None, inputs_embeds: Tensor | None = None, labels: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)[source]#

The UNIMOLMHeadModel forward method, overrides the special __call__() method.

Parameters:
  • input_ids (Tensor, optional) – See UNIMOModel.

  • token_type_ids (Tensor) – See UNIMOModel.

  • position_ids (Tensor) – See UNIMOModel.

  • attention_mask (Tensor) – See UNIMOModel.

  • use_cache (bool, optional) – See UNIMOModel.

  • cache (list, optional) – See UNIMOModel.

  • inputs_embeds (Tensor, optional) – See UNIMOModel.

  • labels (Tensor, optional) – Labels for computing the left-to-right language modeling loss. Indices should be in [-100, 0, ..., vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., vocab_size].

  • output_attentions (bool, optional) – See UNIMOModel.

  • output_hidden_states (bool, optional) – See UNIMOModel.

  • return_dict (bool, optional) – Whether to return a CausalLMOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.

Returns:

An instance of CausalLMOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, not-None fields (depending on the input arguments) of CausalLMOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=labels=None, it returns a tensor logits of shape [batch_size, sequence_length, vocab_size], which holds the prediction scores of the language modeling head over the vocabulary for each position.

Example

from paddlenlp.transformers import UNIMOLMHeadModel
from paddlenlp.transformers import UNIMOTokenizer

model = UNIMOLMHeadModel.from_pretrained('unimo-text-1.0')
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')

inputs = tokenizer.gen_encode(
    "Welcome to use PaddlePaddle and PaddleNLP!",
    return_tensors=True,
    is_split_into_words=False)
logits = model(**inputs)
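
A hedged sketch of the labels path described above, reusing model and inputs from the example. Feeding input_ids back in as labels is purely illustrative; a real training setup would mask or shift the labels appropriately.

# Compute the left-to-right language modeling loss alongside the logits.
labels = inputs['input_ids']
outputs = model(**inputs, labels=labels, return_dict=True)
loss = outputs.loss      # loss over tokens whose labels are in [0, ..., vocab_size]
logits = outputs.logits  # shape [batch_size, sequence_length, vocab_size]
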
UNIMOForMaskedLM#

alias of UNIMOLMHeadModel

UNIMOForConditionalGeneration#

alias of UNIMOLMHeadModel