modeling

class GPTModel(vocab_size, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=16, initializer_range=0.02, pad_token_id=0, eos_token_id=7, bos_token_id=0, eol_token_id=3, topo=None)[source]

Base class: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

The bare GPT Model transformer outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • vocab_size (int) -- Vocabulary size of inputs_ids in GPTModel. It is also the size of the token embedding matrix, and defines the number of different tokens that can be represented by the inputs_ids passed when calling GPTModel.

  • hidden_size (int, optional) -- Dimensionality of the embedding layer and decoder layer. Defaults to 768.

  • num_hidden_layers (int, optional) -- Number of hidden layers in the Transformer decoder. Defaults to 12.

  • num_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer decoder. Defaults to 12.

  • intermediate_size (int, optional) -- Dimensionality of the feed-forward (ff) layer in the decoder. Input tensors to ff layers are firstly projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.

  • hidden_act (str, optional) -- The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other paddle supported activation functions are supported. Defaults to "gelu".

  • hidden_dropout_prob (float, optional) -- The dropout probability for all fully connected layers in the embeddings and decoder. Defaults to 0.1.

  • attention_probs_dropout_prob (float, optional) -- The dropout probability used in MultiHeadAttention in all decoder layers to drop some attention targets. Defaults to 0.1.

  • max_position_embeddings (int, optional) -- The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 512.

  • type_vocab_size (int, optional) --

    The vocabulary size of the token_type_ids. Defaults to 16.

    Note

    Please do not use type_vocab_size, as it will be obsolete in the future.

  • initializer_range (float, optional) --

    The standard deviation of the normal initializer. Defaults to 0.02.

    Note

    A normal_initializer initializes weight matrices as normal distributions. See GPTPretrainedModel._init_weights() for how weights are initialized in GPTModel.

  • pad_token_id (int, optional) -- The index of padding token in the token vocabulary. Defaults to 0.
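The configuration arguments above can also be used to build a randomly initialized model directly, without pretrained weights. A minimal sketch (the hyperparameter values below are illustrative, not a released configuration):

import paddle
from paddlenlp.transformers import GPTModel

# Build a small, randomly initialized GPT model from explicit hyperparameters.
model = GPTModel(vocab_size=50304,
                 hidden_size=256,
                 num_hidden_layers=4,
                 num_attention_heads=4,
                 intermediate_size=1024)

# Dummy token ids of shape [batch_size, sequence_length].
input_ids = paddle.randint(low=0, high=50304, shape=[1, 16], dtype='int64')
encoder_output = model(input_ids)  # shape: [1, 16, 256]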

forward(input_ids, position_ids=None, attention_mask=None, use_cache=False, cache=None)[source]

The GPTModel forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • position_ids (Tensor, optional) -- Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, max_position_embeddings - 1]. Its shape is [batch_size, sequence_length] and its data type is int64. Defaults to None.

  • attention_mask (Tensor, optional) -- Mask used in self-attention to avoid attending to some unwanted positions, usually the subsequent positions. It is a tensor whose shape can be broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Its data type should be float32. The masked tokens have -1e9 values, and the unmasked tokens have 0 values. Defaults to None, which means no positions are masked.

  • use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.

  • cache (list, optional) -- It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.

Returns

Returns tensor encoder_output, which is the output at the last layer of the model. Its data type should be float32 and has a shape of [batch_size, sequence_length, hidden_size].

Return type

Tensor

Example

import paddle
from paddlenlp.transformers import GPTModel, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
model = GPTModel.from_pretrained('gpt2-medium-en')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs)
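When decoding step by step, use_cache and cache avoid recomputing past key/value states. A minimal sketch of that pattern, assuming that with use_cache=True the forward pass returns an (encoder_output, cached_kvs) tuple, as the use_cache description above suggests:

import paddle
from paddlenlp.transformers import GPTModel, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
model = GPTModel.from_pretrained('gpt2-medium-en')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
input_ids = paddle.to_tensor([inputs['input_ids']])

# First pass: request the key/value cache alongside the hidden states.
output, cached_kvs = model(input_ids, use_cache=True, cache=None)

# Later passes: feed only the new token together with the cache.
next_ids = paddle.to_tensor([[9]])  # an illustrative token id
output, cached_kvs = model(next_ids, use_cache=True, cache=cached_kvs)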
class GPTPretrainedModel(name_scope=None, dtype='float32')[source]

Base class: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained GPT models. It provides GPT-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

init_weights(layer)[source]

Initialization hook.

base_model_class

alias of paddlenlp.transformers.gpt.modeling.GPTModel

class GPTForPretraining(gpt)[source]

Base class: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

GPT Model with pretraining tasks on top.

Parameters

gpt (GPTModel) -- An instance of GPTModel.

forward(input_ids, position_ids=None, attention_mask=None, masked_positions=None, use_cache=False, cache=None)[source]
Parameters
  • input_ids (Tensor) -- See GPTModel.

  • position_ids (Tensor, optional) -- See GPTModel.

  • attention_mask (Tensor, optional) -- See GPTModel.

  • use_cache (bool, optional) -- See GPTModel.

  • cache (Tensor, optional) -- See GPTModel.

Returns

Returns tensor logits or tuple (logits, cached_kvs). If use_cache is True, the tuple (logits, cached_kvs) will be returned, where logits is the output of the GPT model and cached_kvs is its cache output. Otherwise, only the tensor logits will be returned.

Return type

Tensor or tuple

Example

import paddle
from paddlenlp.transformers import GPTForPretraining, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
model = GPTForPretraining.from_pretrained('gpt2-medium-en')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs, use_cache=True)

logits = output[0]
cached_kvs = output[1]
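The logits can then be turned into a next-token distribution, e.g. continuing the snippet above:

import paddle.nn.functional as F

# Probability distribution over the vocabulary at the last position.
next_token_probs = F.softmax(logits[:, -1, :], axis=-1)
next_token_id = paddle.argmax(next_token_probs, axis=-1)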
class GPTPretrainingCriterion(topo=None)[source]

Base class: paddle.fluid.dygraph.layers.Layer

Criterion for GPT. It calculates the final loss.

forward(prediction_scores, masked_lm_labels, loss_mask)[source]
Parameters
  • prediction_scores (Tensor) -- The logits of masked token prediction. Its data type should be float32 and its shape is [batch_size, sequence_length, vocab_size].

  • masked_lm_labels (Tensor) -- The labels of the masked language modeling; its dimensionality is equal to that of prediction_scores. Its data type should be int64 and its shape is [batch_size, sequence_length, 1].

  • loss_mask (Tensor) -- Mask used for calculating the loss of the masked language modeling to avoid calculating some unwanted tokens. Its data type should be float32 and its shape is [batch_size, sequence_length, 1].

Returns

The pretraining loss. Its data type should be float32 and its shape is [1].

Return type

Tensor
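Example

A minimal sketch of calling the criterion with random tensors of the documented shapes (the shape values below are illustrative, and it is assumed that GPTPretrainingCriterion is importable from paddlenlp.transformers like the other classes on this page):

import paddle
from paddlenlp.transformers import GPTPretrainingCriterion

batch_size, seq_len, vocab_size = 2, 8, 50304
criterion = GPTPretrainingCriterion()

prediction_scores = paddle.randn([batch_size, seq_len, vocab_size], dtype='float32')
masked_lm_labels = paddle.randint(0, vocab_size, [batch_size, seq_len, 1], dtype='int64')
loss_mask = paddle.ones([batch_size, seq_len, 1], dtype='float32')

loss = criterion(prediction_scores, masked_lm_labels, loss_mask)  # scalar float32 tensor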

class GPTForGreedyGeneration(gpt, max_predict_len, eol_token_id=3)[source]

Base class: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

The generation model for GPT-2. It uses the greedy strategy and generates the output sequence with the highest probability.

Parameters
  • gpt (GPTModel) -- An instance of paddlenlp.transformers.GPTModel.

  • max_predict_len (int) -- The maximum length of the prediction.

  • eol_token_id (int, optional) -- The token id of the end-of-line token. Defaults to 3.

model(input_ids, position_ids=None, attention_mask=None, masked_positions=None, use_cache=False, cache=None)[source]
Parameters
  • input_ids (Tensor) -- See GPTModel.

  • position_ids (Tensor, optional) -- See GPTModel.

  • attention_mask (Tensor, optional) -- See GPTModel.

  • use_cache (bool, optional) -- See GPTModel.

  • cache (Tensor, optional) -- See GPTModel.

Returns

Returns tensor logits or tuple (logits, cached_kvs). If use_cache is True, the tuple (logits, cached_kvs) will be returned, where logits is the output of the GPT model and cached_kvs is its cache output. Otherwise, only the tensor logits will be returned.

Return type

Tensor or tuple

forward(input_ids)[source]
Parameters

input_ids (Tensor) -- See GPTModel.

Returns

Returns tensor src_ids, the indices of the output sequence tokens in the vocabulary. They are numerical representations of tokens that build the output sequence.

Return type

Tensor
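Example

A minimal sketch of greedy generation, wrapping a pretrained GPTModel as the constructor above expects (decoding the returned ids with tokenizer.convert_ids_to_string is an assumption about the tokenizer's helpers):

import paddle
from paddlenlp.transformers import GPTForGreedyGeneration, GPTModel, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
gpt = GPTModel.from_pretrained('gpt2-medium-en')
model = GPTForGreedyGeneration(gpt, max_predict_len=32)

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
input_ids = paddle.to_tensor([inputs['input_ids']])

# src_ids holds the indices of the generated sequence in the vocabulary.
src_ids = model(input_ids)
print(tokenizer.convert_ids_to_string(src_ids[0].tolist()))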

class GPTLMHeadModel(gpt)[source]

Base class: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

The GPT Model with a language modeling head on top.

Parameters

gpt (GPTModel) -- An instance of GPTModel.

forward(input_ids, position_ids=None, attention_mask=None, use_cache=False, cache=None)[source]
Parameters
  • input_ids (Tensor) -- See GPTModel.

  • position_ids (Tensor, optional) -- See GPTModel.

  • attention_mask (Tensor, optional) -- See GPTModel.

  • use_cache (bool, optional) -- See GPTModel.

  • cache (Tensor, optional) -- See GPTModel.

Returns

Returns tensor logits or tuple (logits, cached_kvs). If use_cache is True, the tuple (logits, cached_kvs) will be returned, where logits is the output of the GPT model and cached_kvs is its cache output. Otherwise, only the tensor logits will be returned.

Return type

Tensor or tuple
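Example

A minimal sketch mirroring the other examples on this page:

import paddle
from paddlenlp.transformers import GPTLMHeadModel, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
model = GPTLMHeadModel.from_pretrained('gpt2-medium-en')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

logits = model(**inputs)  # [batch_size, sequence_length, vocab_size]
next_token_id = paddle.argmax(logits[0, -1, :]).item()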

class GPTForTokenClassification(gpt, num_classes=2, dropout=None)[source]

Base class: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

GPT Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

Parameters
  • gpt (GPTModel) -- An instance of GPTModel.

  • num_classes (int, optional) -- The number of classes. Defaults to 2.

  • dropout (float, optional) -- The dropout probability for output of GPT. If None, use the same value as hidden_dropout_prob of GPTModel instance gpt. Defaults to None.

forward(input_ids, position_ids=None, attention_mask=None)[source]

The GPTForTokenClassification forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- See GPTModel.

  • position_ids (Tensor, optional) -- See GPTModel.

  • attention_mask (Tensor, optional) -- See GPTModel.

Returns

Returns tensor logits, a tensor of the input token classification logits. Shape as [batch_size, sequence_length, num_classes] and dtype as float32.

Return type

Tensor

Example

import paddle
from paddlenlp.transformers import GPTForTokenClassification, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
model = GPTForTokenClassification.from_pretrained('gpt2-medium-en')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
logits = model(**inputs)
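The per-token class predictions can then be read off the logits, e.g.:

# Predicted class index for every token: shape [batch_size, sequence_length].
preds = paddle.argmax(logits, axis=-1)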
class GPTForSequenceClassification(gpt, num_classes=2)[source]

Base class: paddlenlp.transformers.gpt.modeling.GPTPretrainedModel

GPT Model with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

Parameters
  • gpt (GPTModel) -- An instance of GPTModel.

  • num_classes (int, optional) -- The number of classes. Defaults to 2.

forward(input_ids, position_ids=None, attention_mask=None)[source]

The GPTForSequenceClassification forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- See GPTModel.

  • position_ids (Tensor, optional) -- See GPTModel.

  • attention_mask (Tensor, optional) -- See GPTModel.

Returns

Returns tensor logits, a tensor of the input text classification logits. Shape as [batch_size, num_classes] and dtype as float32.

Return type

Tensor

Example

import paddle
from paddlenlp.transformers import GPTForSequenceClassification, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('gpt2-medium-en')
model = GPTForSequenceClassification.from_pretrained('gpt2-medium-en')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
logits = model(**inputs)
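The predicted label for each sequence can then be read off the logits, e.g.:

import paddle.nn.functional as F

# Class probabilities and the predicted label for each sequence in the batch.
probs = F.softmax(logits, axis=-1)
pred_label = paddle.argmax(logits, axis=-1)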