ernie_model
- class FasterErnieModel(vocab_size, vocab_file, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, pad_token_id=0, do_lower_case=True, is_split_into_words=False, max_seq_len=512)
The bare ERNIE Model transformer outputting raw hidden-states.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
vocab_size (int) -- Vocabulary size of inputs_ids in ErnieModel. It is also the vocabulary size of the token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling ErnieModel.
hidden_size (int, optional) -- Dimensionality of the embedding layer, encoder layers and pooler layer. Defaults to 768.
num_hidden_layers (int, optional) -- Number of hidden layers in the Transformer encoder. Defaults to 12.
num_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
intermediate_size (int, optional) -- Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.
hidden_act (str, optional) -- The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other activation function supported by Paddle are accepted. Defaults to "gelu".
hidden_dropout_prob (float, optional) -- The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.
attention_probs_dropout_prob (float, optional) -- The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention targets. Defaults to 0.1.
max_position_embeddings (int, optional) -- The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 512.
type_vocab_size (int, optional) -- The vocabulary size of the token_type_ids. Defaults to 2.
initializer_range (float, optional) -- The standard deviation of the normal initializer used to initialize all weight matrices. Defaults to 0.02.
Note: A normal_initializer draws the entries of weight matrices from a normal distribution. See ErniePretrainedModel._init_weights() for how weights are initialized in ErnieModel.
pad_token_id (int, optional) -- The index of the padding token in the token vocabulary. Defaults to 0.
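The size parameters above determine the shapes of the main weight matrices. The sketch below uses plain Python and does not call PaddleNLP. It shows how the documented defaults translate into a per-head dimension, feed-forward projection shapes, embedding table shapes, and normally distributed initial weights. ErnieConfigSketch and its methods are hypothetical names introduced for illustration only. The value vocab_size=18000 is an arbitrary example. The embedding layout and the even split of hidden_size across heads are assumptions based on the standard BERT-style Transformer encoder, not a description of the library's internals.

```python
import random
from dataclasses import dataclass


@dataclass
class ErnieConfigSketch:
    """Hypothetical container mirroring the documented defaults (not a PaddleNLP class)."""

    vocab_size: int
    hidden_size: int = 768
    num_attention_heads: int = 12
    intermediate_size: int = 3072
    max_position_embeddings: int = 512
    type_vocab_size: int = 2
    initializer_range: float = 0.02

    def head_dim(self) -> int:
        # Assumption: hidden_size is split evenly across the attention heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    def ffn_weight_shapes(self):
        # Project hidden_size -> intermediate_size, then back to hidden_size.
        return [
            (self.hidden_size, self.intermediate_size),
            (self.intermediate_size, self.hidden_size),
        ]

    def embedding_shapes(self):
        # Assumed layout: one table each for tokens, positions and token types.
        return {
            "word": (self.vocab_size, self.hidden_size),
            "position": (self.max_position_embeddings, self.hidden_size),
            "token_type": (self.type_vocab_size, self.hidden_size),
        }

    def sample_weight(self, rows: int, cols: int, seed: int = 0):
        # Draw entries from N(0, initializer_range**2), as a normal initializer would.
        rng = random.Random(seed)
        return [
            [rng.gauss(0.0, self.initializer_range) for _ in range(cols)]
            for _ in range(rows)
        ]


cfg = ErnieConfigSketch(vocab_size=18000)  # 18000 is an illustrative value
print(cfg.head_dim())           # 64
print(cfg.ffn_weight_shapes())  # [(768, 3072), (3072, 768)]
print(cfg.embedding_shapes()["position"])  # (512, 768)
```

With the defaults, each of the 12 heads works on a 64-dimensional slice, and the feed-forward layer expands the hidden state fourfold before projecting it back.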