modeling

Modeling classes for the UnifiedTransformer model.

class UnifiedTransformerPretrainedModel(name_scope=None, dtype='float32')[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained UnifiedTransformer models. It provides the UnifiedTransformer-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class

alias of paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerModel

class UnifiedTransformerModel(vocab_size, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, normalize_before=True, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, unk_token_id=0, pad_token_id=0, bos_token_id=1, eos_token_id=2, mask_token_id=30000)[source]

Bases: paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerPretrainedModel

The bare UnifiedTransformer Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • vocab_size (int) -- Vocabulary size of input_ids in UnifiedTransformerModel. It is also the vocabulary size of the token embedding matrix.

  • hidden_size (int, optional) -- Dimensionality of the embedding layers, encoder layers and pooler layer. Defaults to 768.

  • num_hidden_layers (int, optional) -- The number of hidden layers in the encoder. Defaults to 12.

  • num_attention_heads (int, optional) -- The number of heads in multi-head attention (MHA). Defaults to 12.

  • intermediate_size (int, optional) -- Dimensionality of the feed-forward layer in the encoder. Input tensors to feed-forward layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.

  • hidden_act (str, optional) -- The activation function in the feed-forward network. Defaults to "gelu".

  • hidden_dropout_prob (float, optional) -- The dropout probability used in the pre-processing and post-processing of the MHA and FFN sub-layers. Defaults to 0.1.

  • attention_probs_dropout_prob (float, optional) -- The dropout probability used in MHA to drop some attention targets. Defaults to 0.1.

  • normalize_before (bool, optional) -- Indicates whether to apply layer normalization in the pre-processing of the MHA and FFN sub-layers. If True, pre-processing is layer normalization and post-processing includes dropout and residual connection. Otherwise, there is no pre-processing and post-processing includes dropout, residual connection and layer normalization. Defaults to True.

  • max_position_embeddings (int, optional) -- The maximum length of input position_ids. Defaults to 512.

  • type_vocab_size (int, optional) -- The size of the input token_type_ids. Defaults to 2.

  • initializer_range (float, optional) --

    The standard deviation of the normal initializer. Defaults to 0.02.

    Note

    A normal_initializer initializes weight matrices using a normal distribution. See the UnifiedTransformerPretrainedModel.init_weights() method for how weights are initialized in UnifiedTransformerModel.

  • unk_token_id (int, optional) -- The id of special token unk_token. Defaults to 0.

  • pad_token_id (int, optional) -- The id of special token pad_token. Defaults to 0.

  • bos_token_id (int, optional) -- The id of special token bos_token. Defaults to 1.

  • eos_token_id (int, optional) -- The id of special token eos_token. Defaults to 2.

  • mask_token_id (int, optional) -- The id of special token mask_token. Defaults to 30000.
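A minimal construction sketch based on the parameters above. The keyword values simply mirror the documented defaults; vocab_size=30001 is an assumed value chosen only so that it exceeds the default mask_token_id of 30000.

from paddlenlp.transformers import UnifiedTransformerModel

# Randomly initialized model built from an explicit configuration
# (values mirror the documented defaults; vocab_size is an assumption).
model = UnifiedTransformerModel(
    vocab_size=30001,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act='gelu',
    max_position_embeddings=512,
    type_vocab_size=2)

# Alternatively, load a released checkpoint as in the examples below.
model = UnifiedTransformerModel.from_pretrained('plato-mini')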

forward(input_ids, token_type_ids, position_ids, attention_mask, use_cache=False, cache=None)[source]

The UnifiedTransformerModel forward method, overrides the special __call__() method.

Parameters
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • token_type_ids (Tensor) --

    Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

    Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • position_ids (Tensor) -- The position indices of input sequence tokens. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor) --

    A tensor used in multi-head attention to prevent attention to some unwanted positions, usually the paddings or the subsequent positions. Its shape is broadcast to [batch_size, n_head, sequence_length, sequence_length] (a mask-building sketch follows this parameter list).

    • When the data type is bool, the unwanted positions have False values and the others have True values.

    • When the data type is int, the unwanted positions have 0 values and the others have 1 values.

    • When the data type is float, the unwanted positions have -INF values and the others have 0 values.

  • use_cache (bool, optional) -- Whether or not to use the model cache to speed up decoding. Defaults to False.

  • cache (list, optional) -- A list whose elements are the incremental_cache objects produced by the paddle.nn.TransformerEncoderLayer.gen_cache() method. See the paddle.nn.TransformerEncoder.gen_cache() method for more details. It is only used for inference and should be None for training. Defaults to None.
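A minimal sketch of building the float-type attention_mask described above, assuming a causal (left-to-right) setting in which each position may only attend to itself and earlier positions; -1e9 is used here as a stand-in for the -INF value mentioned above.

import paddle

seq_len = 5
# Lower-triangular matrix: 1 where attention is allowed, 0 elsewhere.
causal = paddle.tril(paddle.ones([seq_len, seq_len]))
# Allowed positions get 0.0, unwanted positions get a large negative value.
attention_mask = (1.0 - causal) * -1e9
# Add batch and head axes so the mask broadcasts to
# [batch_size, n_head, sequence_length, sequence_length].
attention_mask = attention_mask.unsqueeze([0, 1])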

Returns

If use_cache is False, it is a tensor representing the output of UnifiedTransformerModel, with shape [batch_size, sequence_length, hidden_size]. The data type is float32 or float64. Otherwise, it is a tuple: besides the output of UnifiedTransformerModel, the tuple also includes the new cache, which is the same as the input cache except that the incremental_cache in it has an incremented length. See the paddle.nn.MultiHeadAttention.gen_cache() method and the paddle.nn.MultiHeadAttention.forward() method for more details.

Return type

Tensor|tuple

Example

from paddlenlp.transformers import UnifiedTransformerModel
from paddlenlp.transformers import UnifiedTransformerTokenizer

model = UnifiedTransformerModel.from_pretrained('plato-mini')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

history = '我爱祖国'
# dialogue_encode returns a dict of model inputs (input_ids,
# token_type_ids, position_ids, attention_mask) as Tensors.
inputs = tokenizer.dialogue_encode(
    history,
    return_tensors=True,
    is_split_into_words=False)
# outputs has shape [batch_size, sequence_length, hidden_size].
outputs = model(**inputs)
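A hedged sketch of using use_cache and cache for incremental decoding, building on the example above; the appended token id and the all-zero attention mask below are placeholders rather than real decoding logic.

import paddle

# First pass: request the cache together with the hidden states.
outputs, cache = model(**inputs, use_cache=True)

# A later pass feeds only the newly generated token and reuses the cache.
new_len = inputs['input_ids'].shape[1] + 1
next_input_ids = paddle.to_tensor([[1]])  # placeholder token id
next_token_type_ids = paddle.ones_like(next_input_ids)
next_position_ids = paddle.to_tensor([[new_len - 1]])
next_attention_mask = paddle.zeros([1, 1, 1, new_len], dtype='float32')
outputs, cache = model(
    input_ids=next_input_ids,
    token_type_ids=next_token_type_ids,
    position_ids=next_position_ids,
    attention_mask=next_attention_mask,
    use_cache=True,
    cache=cache)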
class UnifiedTransformerLMHeadModel(unified_transformer)[source]

Bases: paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerPretrainedModel

The UnifiedTransformer Model with a language modeling head on top for generation tasks.

Parameters

unified_transformer (UnifiedTransformerModel) -- An instance of UnifiedTransformerModel.

forward(input_ids, token_type_ids, position_ids, attention_mask, masked_positions=None, use_cache=False, cache=None)[source]

The UnifiedTransformerLMHeadModel forward method, overrides the special __call__() method.

Parameters
  • input_ids, token_type_ids, position_ids, attention_mask, use_cache, cache -- See UnifiedTransformerModel.forward() for descriptions of these arguments.

  • masked_positions (Tensor, optional) -- The positions of masked tokens whose hidden states are gathered before being fed to the language modeling head. Defaults to None.

Returns

If use_cache is False, it is a tensor representing the output of UnifiedTransformerLMHeadModel, with shape [batch_size, sequence_length, vocab_size]. The data type is float32 or float64. Otherwise, it is a tuple: besides the output of UnifiedTransformerLMHeadModel, the tuple also includes the new cache, which is the same as the input cache except that the incremental_cache in it has an incremented length. See the paddle.nn.MultiHeadAttention.gen_cache() method and the paddle.nn.MultiHeadAttention.forward() method for more details.

Return type

Tensor|tuple

Example

from paddlenlp.transformers import UnifiedTransformerLMHeadModel
from paddlenlp.transformers import UnifiedTransformerTokenizer

model = UnifiedTransformerLMHeadModel.from_pretrained('plato-mini')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

history = '我爱祖国'
inputs = tokenizer.dialogue_encode(
    history,
    return_tensors=True,
    is_split_into_words=False)
# logits has shape [batch_size, sequence_length, vocab_size].
logits = model(**inputs)
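A short follow-up sketch showing one way to turn the returned logits into a greedy next-token prediction; real decoding would normally use beam search or sampling instead.

import paddle

# Take the distribution at the last position and pick the most likely token id.
next_token_id = paddle.argmax(logits[:, -1, :], axis=-1)
print(tokenizer.convert_ids_to_tokens(next_token_id.numpy().tolist()))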