ernie_model
- class FasterErnieModel(vocab_size, vocab_file, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, pad_token_id=0, do_lower_case=True, is_split_into_words=False, max_seq_len=512)
The bare ERNIE Model transformer outputting raw hidden-states.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
  - vocab_size (int) – Vocabulary size of inputs_ids in ErnieModel. It is also the vocabulary size of the token embedding matrix, and defines the number of different tokens that can be represented by the inputs_ids passed when calling ErnieModel.
  - hidden_size (int, optional) – Dimensionality of the embedding layer, encoder layers and pooler layer. Defaults to 768.
  - num_hidden_layers (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.
  - num_attention_heads (int, optional) – Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
  - intermediate_size (int, optional) – Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.
  - hidden_act (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other activation function supported by Paddle can be used. Defaults to "gelu".
  - hidden_dropout_prob (float, optional) – The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.
  - attention_probs_dropout_prob (float, optional) – The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention targets. Defaults to 0.1.
  - max_position_embeddings (int, optional) – The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 512.
  - type_vocab_size (int, optional) – The vocabulary size of the token_type_ids. Defaults to 2.
  - initializer_range (float, optional) – The standard deviation of the normal initializer for initializing all weight matrices. Defaults to 0.02.
    Note: A normal_initializer initializes weight matrices as normal distributions. See ErniePretrainedModel._init_weights() for how weights are initialized in ErnieModel.
  - pad_token_id (int, optional) – The index of padding token in the token vocabulary. Defaults to 0.
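
To make the relationship between these defaults concrete, here is a minimal sketch in plain Python (standard library only, no Paddle dependency). It derives the per-head width, the shapes of the two feed-forward projections, and a sample of normally distributed initial weights. The helper names head_dim, feed_forward_shapes and sample_weights are hypothetical and are not part of PaddleNLP. The even split of hidden_size across heads is the usual multi-head attention convention and is assumed here rather than taken from this page.

```python
import random
import statistics

# Default hyperparameters copied from the class signature above.
config = {
    "hidden_size": 768,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
    "initializer_range": 0.02,
}


def head_dim(cfg):
    """Per-head width, assuming hidden_size is split evenly across heads."""
    if cfg["hidden_size"] % cfg["num_attention_heads"] != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def feed_forward_shapes(cfg):
    """Weight shapes of the two feed-forward projections (biases omitted)."""
    up = (cfg["hidden_size"], cfg["intermediate_size"])    # hidden -> intermediate
    down = (cfg["intermediate_size"], cfg["hidden_size"])  # intermediate -> hidden
    return up, down


def sample_weights(n, cfg, seed=0):
    """Draw n values from a normal distribution with std = initializer_range."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, cfg["initializer_range"]) for _ in range(n)]


print(head_dim(config))             # 64
print(feed_forward_shapes(config))  # ((768, 3072), (3072, 768))

weights = sample_weights(10_000, config)
# The empirical standard deviation should land close to initializer_range.
print(statistics.pstdev(weights))
```

With the defaults, each of the 12 heads works on a 64-dimensional slice, and the feed-forward block expands the hidden state by a factor of four before projecting it back.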