modeling
Modeling classes for the UnifiedTransformer model.

class UnifiedTransformerPretrainedModel(name_scope=None, dtype='float32')
Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained UnifiedTransformer models. It provides the UnifiedTransformer-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class
alias of paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerModel
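
A minimal usage sketch of the pretrained machinery this base class provides, assuming only the from_pretrained and save_pretrained methods inherited from PretrainedModel; in practice a concrete subclass such as UnifiedTransformerModel is instantiated rather than this abstract class, and './plato-mini-local' is an illustrative path.

from paddlenlp.transformers import UnifiedTransformerModel

# Download (or reuse a cached copy of) the 'plato-mini' checkpoint and build
# a UnifiedTransformerModel initialized with its pretrained weights.
model = UnifiedTransformerModel.from_pretrained('plato-mini')

# Persist the weights and config locally; the directory name is illustrative.
model.save_pretrained('./plato-mini-local')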
class UnifiedTransformerModel(vocab_size, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, normalize_before=True, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, unk_token_id=0, pad_token_id=0, bos_token_id=1, eos_token_id=2, mask_token_id=30000, role_type_size=None)
Bases: paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerPretrainedModel

The bare UnifiedTransformer Model outputting raw hidden states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods. This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
vocab_size (int) – Vocabulary size of inputs_ids in UnifiedTransformerModel. It is also the vocabulary size of the token embedding matrix.
hidden_size (int, optional) – Dimensionality of the embedding layers, encoder layers and pooler layer. Defaults to 768.
num_hidden_layers (int, optional) – The number of hidden layers in the encoder. Defaults to 12.
num_attention_heads (int, optional) – The number of heads in multi-head attention (MHA). Defaults to 12.
intermediate_size (int, optional) – Dimensionality of the feed-forward layer in the encoder. Input tensors to the feed-forward layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.
hidden_act (str, optional) – The activation function in the feed-forward network. Defaults to "gelu".
hidden_dropout_prob (float, optional) – The dropout probability used in the pre-processing and post-processing of the MHA and FFN sub-layers. Defaults to 0.1.
attention_probs_dropout_prob (float, optional) – The dropout probability used in MHA to drop some attention targets. Defaults to 0.1.
normalize_before (bool, optional) – Whether to apply layer normalization in the pre-processing of the MHA and FFN sub-layers. If True, pre-processing is layer normalization and post-processing includes dropout and a residual connection. Otherwise, there is no pre-processing and post-processing includes dropout, a residual connection and layer normalization. Defaults to True.
max_position_embeddings (int, optional) – The maximum length of input position_ids. Defaults to 512.
type_vocab_size (int, optional) – The size of the input token_type_ids. Defaults to 2.
initializer_range (float, optional) – The standard deviation of the normal initializer. Defaults to 0.02.
Note: A normal initializer initializes weight matrices from normal distributions. See the UnifiedTransformerPretrainedModel.init_weights() method for how weights are initialized in UnifiedTransformerModel.
unk_token_id (int, optional) – The id of the special token unk_token. Defaults to 0.
pad_token_id (int, optional) – The id of the special token pad_token. Defaults to 0.
bos_token_id (int, optional) – The id of the special token bos_token. Defaults to 1.
eos_token_id (int, optional) – The id of the special token eos_token. Defaults to 2.
mask_token_id (int, optional) – The id of the special token mask_token. Defaults to 30000.
role_type_size (int, optional) – The size of the input role_ids. Defaults to None.
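
For completeness, a minimal sketch of constructing the model from scratch with the arguments listed above, assuming the constructor accepts them as plain keyword arguments exactly as shown in the signature; vocab_size=30001 is a hypothetical value (chosen so the default mask_token_id=30000 stays inside the vocabulary), not the configuration of any released checkpoint, and from_pretrained is the usual entry point.

from paddlenlp.transformers import UnifiedTransformerModel

# Randomly initialized (untrained) model built from explicit config values.
# vocab_size=30001 is a placeholder; the remaining values repeat the defaults.
model = UnifiedTransformerModel(
    vocab_size=30001,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    normalize_before=True)

# The model is a paddle.nn.Layer, so the usual Layer utilities apply.
print(sum(int(p.numel()) for p in model.parameters()))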
forward(input_ids, token_type_ids, position_ids, attention_mask, use_cache=False, cache=None, role_ids=None)
The UnifiedTransformerModel forward method, which overrides the special __call__() method.

Parameters
input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of the tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
token_type_ids (Tensor) – Segment token indices to indicate the first and second portions of the inputs. Indices can be either 0 or 1: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
position_ids (Tensor) – The position indices of input sequence tokens. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor) – A tensor used in multi-head attention to prevent attention to some unwanted positions, usually the paddings or the subsequent positions. It is a tensor with a shape broadcastable to [batch_size, n_head, sequence_length, sequence_length]. When the data type is bool, the unwanted positions have False values and the others have True values. When the data type is int, the unwanted positions have 0 values and the others have 1 values. When the data type is float, the unwanted positions have -INF values and the others have 0 values. (A sketch of building such a mask follows the example below.)
use_cache (bool, optional) – Whether or not to use the model cache to speed up decoding. Defaults to False.
cache (list, optional) – A list whose elements are the incremental_cache produced by the paddle.nn.TransformerEncoderLayer.gen_cache() method. See the paddle.nn.TransformerEncoder.gen_cache() method for more details. It is only used for inference and should be None for training. Defaults to None.
role_ids (Tensor, optional) – Indices of role ids that indicate different roles. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None.
Returns
If use_cache is False, it is a tensor representing the output of UnifiedTransformerModel, with shape [batch_size, sequence_length, hidden_size]. The data type is float32 or float64. Otherwise, it is a tuple: besides the output of UnifiedTransformerModel, the tuple also includes the new cache, which is the same as the input cache except that each incremental_cache in it has an incremental length. See the paddle.nn.MultiHeadAttention.gen_cache() method and the paddle.nn.MultiHeadAttention.forward() method for more details.

Return type
Tensor|tuple
Example
from paddlenlp.transformers import UnifiedTransformerModel
from paddlenlp.transformers import UnifiedTransformerTokenizer

model = UnifiedTransformerModel.from_pretrained('plato-mini')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

history = '我爱祖国'  # "I love my motherland"
inputs = tokenizer.dialogue_encode(
    history,
    return_tensors=True,
    is_split_into_words=False)
outputs = model(**inputs)
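
Among the forward arguments, attention_mask usually has to be assembled by hand when tokenizer.dialogue_encode is not used, so here is a minimal sketch, assuming the float-mask convention described above (0 for allowed positions, a large negative value standing in for -INF elsewhere) and a causal masking scheme; the toy token ids and shapes are illustrative, not real tokenizer output.

import paddle
from paddlenlp.transformers import UnifiedTransformerModel

model = UnifiedTransformerModel.from_pretrained('plato-mini')

seq_len = 4

# Float mask broadcastable to [batch_size, n_head, seq_len, seq_len]:
# 0.0 where attention is allowed and a large negative value (standing in
# for -INF) where it is not. This particular mask is causal, so each
# position attends only to itself and earlier positions; it is an
# illustrative construction, not a library utility.
allowed = paddle.tril(paddle.ones([seq_len, seq_len], dtype='float32'))
attention_mask = ((1.0 - allowed) * -1e9).reshape([1, 1, seq_len, seq_len])

# Toy inputs; the token ids are placeholders, not real tokenizer output.
input_ids = paddle.to_tensor([[1, 5, 6, 7]], dtype='int64')
token_type_ids = paddle.zeros([1, seq_len], dtype='int64')
position_ids = paddle.arange(seq_len, dtype='int64').unsqueeze(0)

outputs = model(input_ids, token_type_ids, position_ids, attention_mask)
print(outputs.shape)  # [1, 4, hidden_size]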
class UnifiedTransformerLMHeadModel(unified_transformer)
Bases: paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerPretrainedModel

The UnifiedTransformer Model with a language modeling head on top for generation tasks.

Parameters
unified_transformer (UnifiedTransformerModel) – An instance of UnifiedTransformerModel.
forward(input_ids, token_type_ids, position_ids, attention_mask, masked_positions=None, use_cache=False, cache=None, role_ids=None)
The UnifiedTransformerLMHeadModel forward method, which overrides the special __call__() method.

Parameters
input_ids (Tensor) – See UnifiedTransformerModel.
token_type_ids (Tensor) – See UnifiedTransformerModel.
position_ids (Tensor) – See UnifiedTransformerModel.
attention_mask (Tensor) – See UnifiedTransformerModel.
use_cache (bool, optional) – See UnifiedTransformerModel.
cache (list, optional) – See UnifiedTransformerModel.
role_ids (Tensor, optional) – See UnifiedTransformerModel.
Returns
If use_cache is False, it is a tensor representing the output of UnifiedTransformerLMHeadModel, with shape [batch_size, sequence_length, vocab_size]. The data type is float32 or float64. Otherwise, it is a tuple: besides the output of UnifiedTransformerLMHeadModel, the tuple also includes the new cache, which is the same as the input cache except that each incremental_cache in it has an incremental length. See the paddle.nn.MultiHeadAttention.gen_cache() method and the paddle.nn.MultiHeadAttention.forward() method for more details.

Return type
Tensor|tuple
Example
from paddlenlp.transformers import UnifiedTransformerLMHeadModel
from paddlenlp.transformers import UnifiedTransformerTokenizer

model = UnifiedTransformerLMHeadModel.from_pretrained('plato-mini')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

history = '我爱祖国'  # "I love my motherland"
inputs = tokenizer.dialogue_encode(
    history,
    return_tensors=True,
    is_split_into_words=False)
logits = model(**inputs)
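
Building on the example above, a short sketch of turning the returned logits into a greedy next-token prediction; it relies only on the [batch_size, sequence_length, vocab_size] shape documented in Returns plus core paddle ops, and the final convert_ids_to_tokens call is assumed typical tokenizer usage rather than something specified on this page.

import paddle
from paddlenlp.transformers import UnifiedTransformerLMHeadModel
from paddlenlp.transformers import UnifiedTransformerTokenizer

model = UnifiedTransformerLMHeadModel.from_pretrained('plato-mini')
tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

history = '我爱祖国'  # "I love my motherland"
inputs = tokenizer.dialogue_encode(
    history, return_tensors=True, is_split_into_words=False)

# logits: [batch_size, sequence_length, vocab_size]
logits = model(**inputs)

# Greedy pick for the token following the last input position.
next_id = int(paddle.argmax(logits[:, -1, :], axis=-1).numpy()[0])

# Map the id back to a token string; shown as a plausible decoding step.
print(tokenizer.convert_ids_to_tokens(next_id))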
UnifiedTransformerForMaskedLM
alias of paddlenlp.transformers.unified_transformer.modeling.UnifiedTransformerLMHeadModel