ernie_model

class FasterErnieModel(vocab_size, vocab_file, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, pad_token_id=0, do_lower_case=True, is_split_into_words=False, max_seq_len=512)

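The bare ERNIE transformer model, outputting raw hidden states without a task-specific head on top. It takes raw text as input and tokenizes it internally using the vocabulary in vocab_file.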

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters:

vocab_size (int) – Vocabulary size, which is also the number of rows in the token embedding matrix. Defines the number of distinct tokens the model can represent.
vocab_file (str) – Path to the vocabulary file used to tokenize the raw input text.
hidden_size (int, optional) – Dimensionality of the embedding layer, encoder layers and pooler layer. Defaults to 768.
num_hidden_layers (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.
num_attention_heads (int, optional) – Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.
intermediate_size (int, optional) – Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.
hidden_act (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu", and any other activation function supported by Paddle can be used. Defaults to "gelu".
hidden_dropout_prob (float, optional) – The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.
attention_probs_dropout_prob (float, optional) – The dropout probability applied to the attention weights of MultiHeadAttention in all encoder layers. Defaults to 0.1.
max_position_embeddings (int, optional) – The maximum number of position embeddings, which dictates the maximum supported length of an input sequence. Defaults to 512.
type_vocab_size (int, optional) – The vocabulary size of the token_type_ids. Defaults to 2.
initializer_range (float, optional) – The standard deviation of the normal initializer for initializing all weight matrices. Defaults to 0.02.

Note

A normal_initializer initializes weight matrices with values drawn from a normal distribution. See ErniePretrainedModel._init_weights() for how weights are initialized in ErnieModel.
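pad_token_id (int, optional) – The index of the padding token in the token vocabulary. Defaults to 0.
do_lower_case (bool, optional) – Whether to lowercase the input text during tokenization. Defaults to True.
is_split_into_words (bool, optional) – Whether the input text has already been split into words. Defaults to False.
max_seq_len (int, optional) – The maximum length of a tokenized input sequence. Defaults to 512.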
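To make the roles of hidden_size, num_attention_heads, intermediate_size, and initializer_range concrete, here is a minimal pure-Python sketch of one feed-forward sub-layer and the per-head size check. It is not the Paddle implementation: the helper names (`gelu`, `init_matrix`, `matvec`, `ffn`, `head_dim`) are hypothetical, biases, residual connections, and LayerNorm are omitted, GELU uses its tanh approximation, and toy sizes replace 768 and 3072 so it runs quickly.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gelu(x: float) -> float:
    # tanh approximation of GELU; the exact erf-based form differs only slightly.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def init_matrix(rows: int, cols: int, initializer_range: float = 0.02) -> Matrix:
    # Normal initializer: every weight is drawn from N(0, initializer_range ** 2).
    return [[random.gauss(0.0, initializer_range) for _ in range(cols)] for _ in range(rows)]


def matvec(x: List[float], w: Matrix) -> List[float]:
    # x has len(w) features; the result has len(w[0]) features.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def ffn(x: List[float], w_in: Matrix, w_out: Matrix) -> List[float]:
    # hidden_size -> intermediate_size, activation, then intermediate_size -> hidden_size.
    return matvec([gelu(v) for v in matvec(x, w_in)], w_out)


def head_dim(hidden_size: int, num_attention_heads: int) -> int:
    # Each attention head operates on hidden_size // num_attention_heads features.
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return hidden_size // num_attention_heads


random.seed(0)
hidden_size, intermediate_size = 8, 32  # toy sizes standing in for 768 and 3072
w_in = init_matrix(hidden_size, intermediate_size)
w_out = init_matrix(intermediate_size, hidden_size)
out = ffn([1.0] * hidden_size, w_in, w_out)
print(len(out))           # 8: back to hidden_size features
print(head_dim(768, 12))  # 64 with the default configuration
```

With the defaults, each of the 12 heads works on 64 features, and the feed-forward layer temporarily widens each token representation from 768 to 3072 features before projecting it back.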

forward(text, text_pair=None)

Tokenizes the input text and runs the ERNIE encoder on it.

Parameters:

text – The raw input text to encode.
text_pair (optional) – The second text of a sentence pair, for tasks that take two input sequences. Defaults to None.
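forward() receives raw text rather than token IDs, so tokenization happens inside the model. The following minimal pure-Python sketch shows what that step typically produces, assuming the common BERT/ERNIE input layout `[CLS] A [SEP] B [SEP]`. The text and optional text_pair become input IDs plus token_type_ids, the pair is truncated to fit max_seq_len, shorter sequences in a batch are padded with pad_token_id, and padded positions receive a large negative additive attention bias so the encoder ignores them. The toy vocabulary, whitespace tokenizer, "longest first" truncation, the -1e9 bias value, and every helper name (`tokenize`, `encode`, `pad_batch`, `attention_bias`) are illustrative assumptions, not the model's built-in tokenizer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy vocabulary; the real model reads its vocabulary from vocab_file.
VOCAB: Dict[str, int] = {
    "[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[UNK]": 3,
    "hello": 4, "world": 5, "paddle": 6, "ernie": 7,
}


def tokenize(text: str, do_lower_case: bool = True) -> List[int]:
    # Whitespace split stands in for the real subword tokenizer.
    if do_lower_case:
        text = text.lower()
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.split()]


def encode(text: str, text_pair: Optional[str] = None,
           max_seq_len: int = 512) -> Tuple[List[int], List[int]]:
    a = tokenize(text)
    b = tokenize(text_pair) if text_pair is not None else []
    # Reserve room for [CLS] and [SEP], plus a second [SEP] for pairs.
    budget = max_seq_len - (3 if text_pair is not None else 2)
    if budget < 0:
        raise ValueError("max_seq_len is too small for the special tokens")
    # "Longest first" truncation (an assumed strategy): trim the longer segment.
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    input_ids = [VOCAB["[CLS]"]] + a + [VOCAB["[SEP]"]]
    token_type_ids = [0] * len(input_ids)  # segment 0: [CLS] A [SEP]
    if text_pair is not None:
        input_ids += b + [VOCAB["[SEP]"]]
        token_type_ids += [1] * (len(b) + 1)  # segment 1: B [SEP]
    return input_ids, token_type_ids


def pad_batch(batch: List[List[int]], pad_token_id: int = 0) -> List[List[int]]:
    # Right-pad every sequence to the longest one in the batch.
    width = max(len(ids) for ids in batch)
    return [ids + [pad_token_id] * (width - len(ids)) for ids in batch]


def attention_bias(batch: List[List[int]], pad_token_id: int = 0) -> List[List[float]]:
    # Additive mask: 0.0 keeps a position; a large negative value drives its
    # attention weight to (almost) zero after the softmax.
    return [[-1e9 if tok == pad_token_id else 0.0 for tok in ids] for ids in batch]


ids_a, types_a = encode("Hello world", "Paddle ERNIE")
ids_b, _ = encode("hello")
print(ids_a)     # [1, 4, 5, 2, 6, 7, 2]
print(types_a)   # [0, 0, 0, 0, 1, 1, 1]
batch = pad_batch([ids_a, ids_b])
print(batch[1])  # [1, 4, 2, 0, 0, 0, 0]
```

The second sequence is padded with four pad_token_id entries, and attention_bias marks exactly those four positions with -1e9.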

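class FasterErnieForSequenceClassification(ernie, num_classes=2, dropout=None)

ERNIE model with a classification layer on top, for sequence-level classification tasks.

Parameters:

ernie (FasterErnieModel) – An instance of FasterErnieModel.
num_classes (int, optional) – The number of classes. Defaults to 2.
dropout (float, optional) – The dropout probability applied to the ERNIE output before the classification layer. Defaults to None.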

forward(text, text_pair=None)

Encodes the input text with ERNIE and computes sequence-level classification logits.

Parameters:

text – The raw input text to classify.
text_pair (optional) – The second text of a sentence pair, for tasks that take two input sequences. Defaults to None.
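A sequence classification head of this kind usually projects one vector per sequence to num_classes logits and takes the largest logit as the predicted label. The pure-Python sketch below illustrates that computation, assuming the input is ERNIE's pooled output of shape [batch_size, hidden_size] and that dropout is inactive (as at inference time). The weights and the helper names `linear` and `classify_sequence` are hypothetical; this is not the FasterErnieForSequenceClassification source.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    # weight is [in_features, out_features], the layout paddle.nn.Linear uses.
    return [sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def classify_sequence(pooled_output: Matrix, weight: Matrix,
                      bias: List[float]) -> Tuple[Matrix, List[int]]:
    # pooled_output: [batch_size, hidden_size] -> logits: [batch_size, num_classes].
    # Dropout is left out because it is disabled at inference time.
    logits = [linear(row, weight, bias) for row in pooled_output]
    # Predicted label = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return logits, predictions


weight = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]  # hidden_size=3, num_classes=2
bias = [0.0, 0.1]
logits, preds = classify_sequence([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], weight, bias)
print(preds)  # [0, 1]
```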

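class FasterErnieForTokenClassification(ernie, num_classes=2, dropout=None)

ERNIE model with a classification layer on top of the hidden states of each token, for token-level classification tasks such as named entity recognition.

Parameters:

ernie (FasterErnieModel) – An instance of FasterErnieModel.
num_classes (int, optional) – The number of classes. Defaults to 2.
dropout (float, optional) – The dropout probability applied to the ERNIE output before the classification layer. Defaults to None.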

forward(text, text_pair=None)

Encodes the input text with ERNIE and computes classification logits for each token.

Parameters:

text – The raw input text to label.
text_pair (optional) – The second text of a sentence pair, for tasks that take two input sequences. Defaults to None.
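Token classification differs from sequence classification mainly in where the classifier is applied: one shared linear layer scores every token position, giving logits of shape [batch_size, seq_len, num_classes] and one predicted label per token. The pure-Python sketch below assumes the input is ERNIE's per-token hidden states (sequence output) and ignores dropout; the weights and the helper names `token_logits` and `token_predictions` are hypothetical. Padded positions also receive predictions here, so a caller would normally discard them.

```python
from typing import List

Matrix = List[List[float]]


def token_logits(sequence_output: List[Matrix], weight: Matrix,
                 bias: List[float]) -> List[Matrix]:
    # sequence_output: [batch_size, seq_len, hidden_size]
    # returns logits:  [batch_size, seq_len, num_classes]
    def project(h: List[float]) -> List[float]:
        # The same linear layer is shared by every token position.
        return [sum(h[i] * weight[i][j] for i in range(len(h))) + bias[j]
                for j in range(len(bias))]
    return [[project(tok) for tok in seq] for seq in sequence_output]


def token_predictions(logits: List[Matrix]) -> List[List[int]]:
    # One label per token: the index of its largest logit.
    return [[max(range(len(tok)), key=tok.__getitem__) for tok in seq] for seq in logits]


weight = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]  # hidden_size=2, num_classes=3
bias = [0.0, 0.0, 0.5]
sequence_output = [[[2.0, 0.0], [0.0, 3.0], [0.0, -1.0]]]  # 1 sequence, 3 tokens
logits = token_logits(sequence_output, weight, bias)
print(token_predictions(logits))  # [[0, 1, 2]]
```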
