modeling

Modeling classes for the UNIMO model.

class UNIMOPretrainedModel(name_scope=None, dtype='float32')[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained UNIMO models. It provides the UNIMO-related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class

alias of paddlenlp.transformers.unimo.modeling.UNIMOModel

class UNIMOModel(vocab_size, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='relu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, normalize_before=False, max_position_embeddings=513, type_vocab_size=4, initializer_range=0.02, unk_token_id=17963, pad_token_id=0, bos_token_id=1, eos_token_id=3, mask_token_id=3)[source]

Bases: paddlenlp.transformers.unimo.modeling.UNIMOPretrainedModel

The bare UNIMO Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • vocab_size (int) -- Vocabulary size of inputs_ids in UNIMOModel. It is also the vocabulary size of the token embedding matrix, and defines the number of different tokens that can be represented by the inputs_ids passed when calling UNIMOModel. A construction sketch using these arguments follows this parameter list.

  • hidden_size (int, optional) -- Dimensionality of the embedding layers and encoder layers. Defaults to 768.

  • num_hidden_layers (int, optional) -- The number of hidden layers in the Transformer encoder. Defaults to 12.

  • num_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.

  • intermediate_size (int, optional) -- Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are firstly projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.

  • hidden_act (str, optional) -- The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other activation functions supported by Paddle are supported. Defaults to "relu".

  • hidden_dropout_prob (float, optional) -- The dropout probability used in the pre-process and post-process of the MHA and FFN sub-layers. Defaults to 0.1.

  • attention_probs_dropout_prob (float, optional) -- The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention targets. Defaults to 0.1.

  • normalize_before (bool, optional) -- Indicates whether to put layer normalization into the pre-process of the MHA and FFN sub-layers. If True, the pre-process is layer normalization and the post-process includes dropout and residual connection. Otherwise, there is no pre-process and the post-process includes dropout, residual connection and layer normalization. Defaults to False.

  • max_position_embeddings (int, optional) -- The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 513.

  • type_vocab_size (int, optional) -- The vocabulary size of the token_type_ids passed when calling UNIMOModel. Defaults to 4.

  • initializer_range (float, optional) --

    The standard deviation of the normal initializer. Defaults to 0.02.

    Note

    A normal_initializer initializes weight matrices as normal distributions. See UNIMOPretrainedModel._init_weights() for how weights are initialized in UNIMOModel.

  • unk_token_id (int, optional) -- A special token representing the unknown (out-of-vocabulary) token. Tokens not in the vocabulary are set to unk_token so that they can be converted to an ID. Defaults to 17963.

  • pad_token_id (int, optional) -- A special token used to make arrays of tokens the same size for batching purposes. Defaults to 0.

  • bos_token_id (int, optional) -- A special token representing the beginning of a sequence that was used during pretraining. Defaults to 1.

  • eos_token_id (int, optional) -- A special token representing the end of a sequence that was used during pretraining. Defaults to 3.

  • mask_token_id (int, optional) -- A special token representing a masked token. This is the token used in the masked language modeling task, for which the model tries to predict the original unmasked tokens. Defaults to 3.
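As a quick illustration of how these constructor arguments fit together, below is a minimal sketch that builds a randomly initialized UNIMOModel with the documented defaults. The vocab_size value is a placeholder chosen for the example, not a constant taken from any pretrained checkpoint.

from paddlenlp.transformers import UNIMOModel

# Hypothetical configuration: vocab_size is a placeholder; the remaining
# arguments repeat their documented default values.
model = UNIMOModel(
    vocab_size=18000,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act='relu',
    normalize_before=False,
    max_position_embeddings=513,
    type_vocab_size=4)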

forward(input_ids, token_type_ids, position_ids, attention_mask, use_cache=False, cache=None)[source]

The UNIMOModel forward method, overrides the special __call__() method.

Parameters
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • token_type_ids (Tensor) --

    Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

    Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added to the token embeddings.

  • position_ids (Tensor) -- Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, max_position_embeddings - 1]. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None.

  • attention_mask (Tensor) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked out. A short illustrative sketch of building such a mask follows this parameter list.

  • use_cache (bool, optional) -- Whether or not to use the model cache to speed up decoding. Defaults to False.

  • cache (list, optional) -- It is a list, and each element in the list is incremental_cache produced by paddle.nn.TransformerEncoderLayer.gen_cache() method. See paddle.nn.TransformerEncoder.gen_cache() method for more details. It is only used for inference and should be None for training. Defaults to None.
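As a hedged illustration only (not taken from the library itself), the sketch below builds a float padding mask of the kind described for attention_mask, assuming a pad_token_id of 0 and purely illustrative token IDs.

import paddle

# Illustrative batch with trailing padding; pad_token_id is assumed to be 0 here.
input_ids = paddle.to_tensor([[1, 283, 417, 3, 0, 0]], dtype='int64')
# Float mask: 0.0 for tokens to attend to, a large negative value for padding.
attention_mask = (input_ids != 0).astype('float32')
attention_mask = (1.0 - attention_mask) * -1e4
# Reshape to [batch_size, 1, 1, sequence_length] so it broadcasts over heads and query positions.
attention_mask = attention_mask.unsqueeze([1, 2])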

Returns

If use_cache is False, it is a tensor representing the output of UNIMOModel, with shape [batch_size, sequence_length, hidden_size]. Its data type is float32. Otherwise, it is a tuple: besides the output of UNIMOModel, the tuple also includes the new cache, which is the same as the input cache except that each incremental_cache in it has an incremental length. See the paddle.nn.MultiHeadAttention.gen_cache() and paddle.nn.MultiHeadAttention.forward() methods for more details. A small usage sketch for use_cache follows the example below.

Return type

Tensor or tuple

Example

from paddlenlp.transformers import UNIMOModel
from paddlenlp.transformers import UNIMOTokenizer

model = UNIMOModel.from_pretrained('unimo-text-1.0')
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')

inputs = tokenizer.gen_encode("Welcome to use PaddlePaddle and PaddleNLP!", return_tensors=True)
outputs = model(**inputs)
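
A hedged follow-up sketch: per the return description above, passing use_cache=True makes the call return a tuple of the sequence output and the cache. How the cache is first created when cache=None may differ across PaddleNLP versions, so treat this as an assumption rather than the canonical API.

# Assumes the forward call creates the cache internally when cache=None.
sequence_output, cache = model(**inputs, use_cache=True)
# sequence_output: [batch_size, sequence_length, hidden_size]
# cache: a list with one incremental_cache per encoder layer, reusable in the
# next decoding step via model(..., use_cache=True, cache=cache).
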
class UNIMOLMHeadModel(unimo)[source]

Bases: paddlenlp.transformers.unimo.modeling.UNIMOPretrainedModel

The UNIMO Model with a language modeling head on top designed for generation tasks.

Parameters

unimo (UNIMOModel) -- An instance of UNIMOModel.
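For illustration, the wrapping works as the signature above suggests; below is a minimal sketch that builds the LM head model from an existing UNIMOModel instance (the checkpoint name is the one used in the example further down).

from paddlenlp.transformers import UNIMOModel, UNIMOLMHeadModel

unimo = UNIMOModel.from_pretrained('unimo-text-1.0')
model = UNIMOLMHeadModel(unimo)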

forward(input_ids, token_type_ids, position_ids, attention_mask, masked_positions=None, use_cache=False, cache=None)[source]

The UNIMOLMHeadModel forward method, overrides the special __call__() method.

Parameters
  • input_ids, token_type_ids, position_ids, attention_mask, use_cache, cache -- See UNIMOModel.forward() for the description of these arguments.

  • masked_positions (Tensor, optional) -- Positions at which the language modeling head computes prediction scores. If None, prediction scores are computed for all positions. Defaults to None.

Returns

If use_cache is False, it is a tensor representing the output of UNIMOLMHeadModel, with shape [batch_size, sequence_length, vocab_size]. Its data type is float32. Otherwise, it is a tuple: besides the output of UNIMOLMHeadModel, the tuple also includes the new cache, which is the same as the input cache except that each incremental_cache in it has an incremental length. See the paddle.nn.MultiHeadAttention.gen_cache() and paddle.nn.MultiHeadAttention.forward() methods for more details. A small sketch of using the returned logits follows the example below.

Return type

Tensor or tuple

Example

from paddlenlp.transformers import UNIMOLMHeadModel
from paddlenlp.transformers import UNIMOTokenizer

model = UNIMOLMHeadModel.from_pretrained('unimo-text-1.0')
tokenizer = UNIMOTokenizer.from_pretrained('unimo-text-1.0')

inputs = tokenizer.gen_encode(
    "Welcome to use PaddlePaddle and PaddleNLP!",
    return_tensors=True,
    is_split_into_words=False)
logits = model(**inputs)
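
As a hedged follow-up, the logits returned above have shape [batch_size, sequence_length, vocab_size] (see the return description), so a greedy next-token choice can be sketched as follows; the decoding strategy here is illustrative, not the library's recommended generation API.

import paddle

# Take the scores of the last position and pick the highest-scoring token id.
next_token_logits = logits[:, -1, :]
next_token_id = paddle.argmax(next_token_logits, axis=-1)
print(tokenizer.convert_ids_to_tokens(next_token_id.numpy().tolist()))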