ernie_model

class FasterErnieModel(vocab_size, vocab_file, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, pad_token_id=0, do_lower_case=True, is_split_into_words=False, max_seq_len=512)

The bare ERNIE Model transformer outputting raw hidden-states. Unlike ErnieModel, it takes raw text as input and performs tokenization inside the model with a fused faster tokenizer.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters:
  • vocab_size (int) – Vocabulary size of inputs_ids in FasterErnieModel. Also the vocabulary size of the token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling FasterErnieModel.

  • vocab_file (str) – The file path of the vocabulary used by the tokenizer fused into the model.

  • hidden_size (int, optional) – Dimensionality of the embedding layer, encoder layers and pooler layer. Defaults to 768.

  • num_hidden_layers (int, optional) – Number of hidden layers in the Transformer encoder. Defaults to 12.

  • num_attention_heads (int, optional) – Number of attention heads for each attention layer in the Transformer encoder. Defaults to 12.

  • intermediate_size (int, optional) – Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.

  • hidden_act (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other Paddle-supported activation functions are supported. Defaults to "gelu".

  • hidden_dropout_prob (float, optional) – The dropout probability for all fully connected layers in the embeddings and encoder. Defaults to 0.1.

  • attention_probs_dropout_prob (float, optional) – The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention targets. Defaults to 0.1.

  • max_position_embeddings (int, optional) – The maximum length of the position embeddings, which dictates the maximum supported length of an input sequence. Defaults to 512.

  • type_vocab_size (int, optional) – The vocabulary size of the token_type_ids. Defaults to 2.

  • initializer_range (float, optional) –

    The standard deviation of the normal initializer for initializing all weight matrices. Defaults to 0.02.

    Note

    A normal_initializer initializes weight matrices as normal distributions. See FasterErniePretrainedModel._init_weights() for how weights are initialized in FasterErnieModel.

  • pad_token_id (int, optional) – The index of padding token in the token vocabulary. Defaults to 0.

  • do_lower_case (bool, optional) – Whether the fused tokenizer lowercases the input text. Defaults to True.

  • is_split_into_words (bool, optional) – Whether the input text has already been split into words (pre-tokenized). Defaults to False.

  • max_seq_len (int, optional) – The maximum sequence length produced by the fused tokenizer; longer inputs are truncated to this length. Defaults to 512.

forward(text, text_pair=None)

Runs the forward pass of the model. The raw input text is tokenized by the fused faster tokenizer inside the model and then encoded by the Transformer encoder.

Parameters:
  • text (str or list(str)) – The input text or batch of texts. Raw strings are expected; tokenization happens inside the model.

  • text_pair (str or list(str), optional) – The pair text(s) for sentence-pair tasks. Defaults to None.

Returns:
  A tuple (sequence_output, pooled_output). sequence_output is a Tensor of shape [batch_size, sequence_length, hidden_size] containing the hidden states of the last encoder layer, and pooled_output is a Tensor of shape [batch_size, hidden_size] containing the pooled representation of the first token.
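
Example (a minimal sketch; the paddlenlp.experimental import path and the "ernie-1.0" checkpoint name are assumptions, adjust them to your installation):

    import paddle
    from paddlenlp.experimental import FasterErnieModel  # import path is an assumption

    # "ernie-1.0" is an assumed checkpoint name; any FasterErnie-compatible
    # pretrained weights work the same way.
    model = FasterErnieModel.from_pretrained("ernie-1.0")
    model.eval()

    with paddle.no_grad():
        # Raw strings go in directly; the fused tokenizer runs inside the model.
        sequence_output, pooled_output = model(
            text=["Welcome to use PaddlePaddle and PaddleNLP!"])

    print(sequence_output.shape)  # [batch_size, sequence_length, hidden_size]
    print(pooled_output.shape)    # [batch_size, hidden_size]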

class FasterErnieForSequenceClassification(ernie, num_classes=2, dropout=None)

FasterErnie Model with a linear layer on top of the pooled output, for sequence-level classification tasks such as sentiment analysis or text classification.

Parameters:
  • ernie (FasterErnieModel) – An instance of FasterErnieModel.

  • num_classes (int, optional) – The number of classes. Defaults to 2.

  • dropout (float, optional) – The dropout probability for the classification head. If None, the hidden_dropout_prob of the ernie instance is used. Defaults to None.

forward(text, text_pair=None)

Runs the forward pass of the model. The input text is tokenized and encoded by the underlying FasterErnieModel, and the pooled output is passed through the dropout and classification layers.

Parameters:
  • text (str or list(str)) – The input text or batch of texts. Raw strings are expected; tokenization happens inside the model.

  • text_pair (str or list(str), optional) – The pair text(s) for sentence-pair tasks. Defaults to None.

Returns:
  A tuple (logits, predictions). logits is a Tensor of shape [batch_size, num_classes] with the raw classification scores, and predictions is a Tensor of shape [batch_size] with the argmax class ids.
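
Example (a minimal sketch under the same assumptions as above: the paddlenlp.experimental import path, the "ernie-1.0" checkpoint name, and the keyword overrides passed through from_pretrained):

    import paddle
    from paddlenlp.experimental import FasterErnieForSequenceClassification

    # Checkpoint name and the num_classes/max_seq_len overrides are assumptions.
    model = FasterErnieForSequenceClassification.from_pretrained(
        "ernie-1.0", num_classes=2, max_seq_len=128)
    model.eval()

    with paddle.no_grad():
        # Raw text in, class scores and argmax predictions out.
        logits, predictions = model(text=["This product works great!"])

    print(logits.shape)  # [batch_size, num_classes]
    print(predictions)   # class ids, shape [batch_size]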

class FasterErnieForTokenClassification(ernie, num_classes=2, dropout=None)

FasterErnie Model with a linear layer on top of the hidden states of each token, for token-level classification tasks such as named entity recognition.

Parameters:
  • ernie (FasterErnieModel) – An instance of FasterErnieModel.

  • num_classes (int, optional) – The number of classes. Defaults to 2.

  • dropout (float, optional) – The dropout probability for the classification head. If None, the hidden_dropout_prob of the ernie instance is used. Defaults to None.

forward(text, text_pair=None)

Runs the forward pass of the model. The input text is tokenized and encoded by the underlying FasterErnieModel, and the per-token hidden states are passed through the dropout and classification layers.

Parameters:
  • text (str or list(str)) – The input text or batch of texts. Raw strings are expected; tokenization happens inside the model.

  • text_pair (str or list(str), optional) – The pair text(s) for sentence-pair tasks. Defaults to None.

Returns:
  A tuple (logits, predictions). logits is a Tensor of shape [batch_size, sequence_length, num_classes] with the raw per-token classification scores, and predictions is a Tensor of shape [batch_size, sequence_length] with the argmax tag ids.
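
Example (a minimal sketch under the same assumptions as the previous examples; num_classes=7 is only an illustrative tag-set size, e.g. a BIO scheme over three entity types plus O):

    import paddle
    from paddlenlp.experimental import FasterErnieForTokenClassification

    # Checkpoint name and num_classes are assumptions; num_classes must match
    # the tag set of your labeling task.
    model = FasterErnieForTokenClassification.from_pretrained(
        "ernie-1.0", num_classes=7)
    model.eval()

    with paddle.no_grad():
        # One score vector per token position, special tokens included.
        logits, predictions = model(text=["PaddleNLP is developed by Baidu."])

    print(logits.shape)       # [batch_size, sequence_length, num_classes]
    print(predictions.shape)  # [batch_size, sequence_length]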