ernie_model
- class FasterErnieModel(vocab_size, vocab_file, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, pad_token_id=0, do_lower_case=True, is_split_into_words=False, max_seq_len=512)
The bare ERNIE Model transformer outputting raw hidden-states.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
  - vocab_size (int) – Vocabulary size of inputs_ids in ErnieModel. It is also the vocab size of the token embedding matrix, and defines the number of different tokens that can be represented by the inputs_ids passed when calling ErnieModel.
  - hidden_size (int, optional) – Dimensionality of the embedding layer, encoder layers and pooler layer. Defaults to 768.
  - num_hidden_layers (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.
  - num_attention_heads (int, optional) – Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
  - intermediate_size (int, optional) – Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.
  - hidden_act (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other activation function supported by Paddle can be used. Defaults to "gelu".
  - hidden_dropout_prob (float, optional) – The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.
  - attention_probs_dropout_prob (float, optional) – The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention targets. Defaults to 0.1.
  - max_position_embeddings (int, optional) – The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 512.
  - type_vocab_size (int, optional) – The vocabulary size of token_type_ids. Defaults to 2.
  - initializer_range (float, optional) – The standard deviation of the normal initializer for initializing all weight matrices. Defaults to 0.02.
    Note: A normal initializer initializes weight matrices as normal distributions. See ErniePretrainedModel._init_weights() for how weights are initialized in ErnieModel.
  - pad_token_id (int, optional) – The index of the padding token in the token vocabulary. Defaults to 0.