modeling

Modeling classes for the XLNet model.

class XLNetPretrainedModel(name_scope=None, dtype='float32')[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained XLNet models. It provides the XLNet-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class

alias of paddlenlp.transformers.xlnet.modeling.XLNetModel

class XLNetModel(vocab_size, mem_len=None, reuse_len=None, d_model=768, same_length=False, attn_type='bi', bi_data=False, clamp_len=-1, n_layer=12, dropout=0.1, classifier_dropout=0.1, n_head=12, d_head=64, layer_norm_eps=1e-12, d_inner=3072, ff_activation='gelu', initializer_range=0.02)[source]

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

The bare XLNet Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • vocab_size (int) -- Vocabulary size of input_ids in XLNetModel. It is also the vocabulary size of the token embedding matrix.

  • mem_len (int or None, optional) -- The number of tokens to cache. If not 0 or None, the last mem_len hidden states in each layer will be cached into memory. Defaults to None.

  • reuse_len (int or None, optional) --

    The number of tokens in the current batch to be cached. If positive, then at most reuse_len tokens can be cached in the current batch. Otherwise, there is no limit to the number of tokens. Defaults to None.

    Note

    The difference between mem_len and reuse_len is that mem_len defines the total number of tokens to cache while reuse_len defines the number of tokens in the current batch to be cached.

  • d_model (int, optional) -- Dimensionality of the embedding layers, encoder layers and pooler layer. Defaults to 768.

  • same_length (bool, optional) -- Whether or not to use the same attention length for each token. Defaults to False.

  • attn_type (str, optional) -- The attention type used in the attention layer. Set "bi" for XLNet, "uni" for Transformer-XL. Defaults to "bi".

  • bi_data (bool, optional) -- Whether or not to use bidirectional input pipeline. Set to True during pretraining and False during fine-tuning. Defaults to False.

  • clamp_len (int, optional) -- Maximum relative distance supported. All relative distances larger than clamp_len will be clamped. Setting this attribute to -1 means no clamping. Defaults to -1.

  • n_layer (int, optional) -- The number of hidden layers in the encoder. Defaults to 12.

  • dropout (float, optional) -- The dropout ratio for all fully connected layers in the embeddings and encoder. Defaults to 0.1.

  • classifier_dropout (float, optional) -- The dropout ratio for all fully connected layers in the pooler (classification head). Defaults to 0.1.

  • n_head (int, optional) -- Number of attention heads in each attention layer. Defaults to 12.

  • d_head (int, optional) --

    Dimensionality of each attention head. Defaults to 64.

    Note

    d_head should be equal to d_model divided by n_head.

  • layer_norm_eps (float, optional) -- The epsilon parameter used in paddle.nn.LayerNorm for initializing layer normalization layers. Defaults to 1e-12.

  • d_inner (int, optional) -- Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to ff layers are firstly projected from d_model to d_inner, and then projected back to d_model. Typically d_inner is larger than d_model. Defaults to 3072.

  • ff_activation (str, optional) -- The non-linear activation function in the feed-forward layers in the encoder. Choose from the following supported activation functions: ["relu", "gelu", "tanh", "sigmoid", "mish", "swish"]. Defaults to "gelu".

  • initializer_range (float, optional) --

    The standard deviation of the normal initializer. Defaults to 0.02.

    Note

    normal_initializer initializes weight matrices with samples drawn from a normal distribution. See XLNetPretrainedModel._init_weights() for how weights are initialized in XLNetModel.
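
The configuration arguments above can be passed directly to build a randomly initialized XLNetModel. The following is a minimal sketch with illustrative (not recommended) values; note that d_head is chosen as d_model divided by n_head, as the note above requires.

from paddlenlp.transformers.xlnet.modeling import XLNetModel

# Build a small, randomly initialized XLNet (illustrative configuration values only).
model = XLNetModel(
    vocab_size=32000,
    d_model=256,
    n_layer=4,
    n_head=4,
    d_head=64,           # d_head equals d_model / n_head
    d_inner=1024,
    ff_activation="gelu",
    dropout=0.1,
    classifier_dropout=0.1,
)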

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]

The XLNetModel forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • token_type_ids (Tensor, optional) --

    Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

    Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added to the token embeddings.

  • attention_mask (Tensor, optional) --

    Mask to indicate whether to perform attention on each input token or not. The values should be either 0 or 1. The attention scores will be set to -infinity for any positions in the mask that are 0, and will be unchanged for positions that are 1.

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.

  • mems (List[Tensor], optional) --

    A list of length n_layers with each Tensor being a pre-computed hidden-state for each layer. Each Tensor has a dtype float32 and a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, and we don't use mems.

    Note

    use_mems_train (during training) or use_mems_eval (during evaluation) has to be set to True in order to make use of mems.

  • perm_mask (Tensor, optional) --

    Mask to indicate the permutation pattern of the input sequence with values being either 0 or 1.

    • if perm_mask[k, i, j] = 0, token i attends to token j in batch k;

    • if perm_mask[k, i, j] = 1, token i does not attend to token j in batch k.

    Only used during pretraining (to define factorization order) or for sequential decoding (generation). Its data type should be float32 and it has a shape of [batch_size, sequence_length, sequence_length]. Defaults to None, in which case each token attends to all the other tokens (full bidirectional attention).

  • target_mapping (Tensor, optional) -- Mask to indicate the output tokens to use, with values being either 0 or 1. If target_mapping[k, i, j] = 1, the i-th prediction in batch k is on the j-th token. Its data type should be float32 and it has a shape of [batch_size, num_predict, sequence_length]. Only used during pretraining for partial prediction or for sequential decoding (generation). Defaults to None.

  • input_mask (Tensor, optional) --

    Mask to avoid performing attention on padding tokens, with values being either 0 or 1. Its data type should be float32 and it has a shape of [batch_size, sequence_length]. This mask is the inverse of attention_mask:

    • 1 for tokens that are masked,

    • 0 for tokens that are not masked.

    You should use only one of input_mask and attention_mask. Defaults to None.

  • head_mask (Tensor, optional) --

    Mask to nullify selected heads of the self-attention layers with values being either 0 or 1.

    • 1 indicates the head is not masked,

    • 0 indicates the head is masked.

    Its data type should be float32 and it has a shape of [num_heads] or [num_layers, num_heads]. Defaults to None, which means all heads are kept.

  • inputs_embeds (Tensor, optional) -- An embedded representation tensor which is an alternative to input_ids. Specify only one of them to avoid contradiction. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means only input_ids is specified.

  • use_mems_train (bool, optional) -- Whether or not to use the recurrent memory mechanism during training. Defaults to False, which means the recurrent memory mechanism is not used in training mode.

  • use_mems_eval (bool, optional) -- Whether or not to use the recurrent memory mechanism during evaluation. Defaults to False, which means the recurrent memory mechanism is not used in evaluation mode.

  • return_dict (bool, optional) -- Whether or not to return additional information other than the output tensor. If True, the output, new_mems, hidden_states and attentions are returned, formatted as a dict. Otherwise only the output tensor is returned. Defaults to False.

Returns

Returns tensor output or a dict with key-value pairs: {"last_hidden_state": output, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • output (Tensor):

    Output of the final layer of the model. It's a Tensor of dtype float32 and has a shape of [batch_size, num_predict, hidden_size].

    Note

    num_predict corresponds to target_mapping.shape[1]. If target_mapping is None, then num_predict equals sequence_length.

  • mems (List[Tensor]):

    A list of pre-computed hidden-states. The length of the list is n_layers. Each element in the list is a Tensor with dtype float32 and has a shape of [batch_size, sequence_length, hidden_size].

  • hidden_states (List[Tensor], optional):

    A list of Tensor containing hidden-states of the model at the output of each layer plus the initial embedding outputs. Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size]. Returned when output_hidden_states is set to True.

  • attentions (List[Tensor], optional):

    A list of Tensor containing the attention weights of each hidden layer. Each Tensor (one for each layer) has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length]. Returned when output_attentions is set to True.

Return type

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetModel
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetModel.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)

last_hidden_states = outputs[0]
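
A hedged follow-up sketch to the example above: with return_dict=True the forward call returns the dict described in the Returns section instead of a bare tensor; the cached mems are only populated when the recurrent memory mechanism is enabled (a non-zero mem_len in the configuration together with use_mems_eval=True).

model.eval()
dict_outputs = model(**inputs, return_dict=True)
last_hidden_state = dict_outputs["last_hidden_state"]
# dict_outputs["mems"] holds the per-layer cached hidden states when the
# recurrent memory mechanism is enabled (use_mems_eval=True and mem_len > 0).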
class XLNetForSequenceClassification(xlnet, num_classes=2)[source]

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a linear layer on top of the output layer, designed for sequence classification/regression tasks like GLUE tasks.

Parameters
  • xlnet (XLNetModel) -- An instance of XLNetModel.

  • num_classes (int, optional) -- The number of classes. Defaults to 2.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]

The XLNetForSequenceClassification forward method, overrides the __call__() special method.

Parameters
  See XLNetModel.forward().

Returns

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_classes].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForSequenceClassification
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)

logits = outputs[0]
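
A hedged follow-up sketch: because the logits above are pre-SoftMax classification scores, class probabilities and a predicted label can be read off with a softmax and an argmax over the class axis.

probs = paddle.nn.functional.softmax(logits, axis=-1)   # class probabilities
predicted_class = paddle.argmax(logits, axis=-1)        # index of the highest-scoring class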
class XLNetForTokenClassification(xlnet, num_classes=2)[source]

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a linear layer on top of the hidden-states output layer, designed for token classification tasks like NER tasks.

Parameters
  • xlnet (XLNetModel) -- An instance of XLNetModel.

  • num_classes (int, optional) -- The number of classes. Defaults to 2.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]

The XLNetForTokenClassification forward method, overrides the __call__() special method.

Parameters
  See XLNetModel.forward().

Returns

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, sequence_length, num_classes].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForTokenClassification
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForTokenClassification.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)

logits = outputs[0]
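
A hedged follow-up sketch: the logits above score every class at every token position, so a per-token label prediction is simply the argmax over the last (class) axis.

predicted_labels = paddle.argmax(logits, axis=-1)   # one predicted class id per token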
class XLNetLMHeadModel(xlnet)[source]

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a language modeling head on top (linear layer with weights tied to the input embeddings).

Parameters

xlnet (XLNetModel) -- An instance of XLNetModel.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]

The XLNetLMHeadModel forward method, overrides the __call__() special method.

Parameters
  See XLNetModel.forward().

Returns

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetLMHeadModel
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetLMHeadModel.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
logits = outputs
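
A hedged follow-up sketch: the logits above score every vocabulary token at each position, so the most likely token id per position is the argmax over the last (vocabulary) axis; decoding the ids back to text is left to the tokenizer.

predicted_token_ids = paddle.argmax(logits, axis=-1)   # one predicted token id per position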
class XLNetForMultipleChoice(xlnet)[source]

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RACE/SWAG tasks.

Parameters

xlnet (XLNetModel) -- An instance of XLNetModel.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]

The XLNetForMultipleChoice forward method, overrides the __call__() special method.

Parameters
  See XLNetModel.forward().

Returns

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits), reshaped so that each row scores the candidate answers for one example. Its data type should be float32 and it has a shape of [batch_size, num_choices].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type

Tensor or dict

Example

import paddle
from paddlenlp.transformers import XLNetForMultipleChoice, XLNetTokenizer
from paddlenlp.data import Pad, Dict
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForMultipleChoice.from_pretrained('xlnet-base-cased')
data = [
    {
        "question": "how do you turn on an ipad screen?",
        "answer1": "press the volume button.",
        "answer2": "press the lock button.",
        "label": 1,
    },
    {
        "question": "how do you indent something?",
        "answer1": "leave a space before starting the writing",
        "answer2": "press the spacebar",
        "label": 0,
    },
]
text = []
text_pair = []
for d in data:
    text.append(d["question"])
    text_pair.append(d["answer1"])
    text.append(d["question"])
    text_pair.append(d["answer2"])
inputs = tokenizer(text, text_pair)
batchify_fn = lambda samples, fn=Dict(
    {
        "input_ids": Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
        "token_type_ids": Pad(
            axis=0, pad_val=tokenizer.pad_token_type_id
        ),  # token_type_ids
    }
): fn(samples)
inputs = batchify_fn(inputs)
reshaped_logits = model(
    input_ids=paddle.to_tensor(inputs[0], dtype="int64"),
    token_type_ids=paddle.to_tensor(inputs[1], dtype="int64"),
)
print(reshaped_logits.shape)
# [2, 2]
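
A hedged follow-up sketch: each row of reshaped_logits scores the candidate answers for one question, so the predicted choice is the argmax over the last axis.

predicted_choice = paddle.argmax(reshaped_logits, axis=-1)   # index of the highest-scoring answer per question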
class XLNetForQuestionAnswering(xlnet)[source]

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

Parameters

xlnet (XLNetModel) -- An instance of XLNetModel.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]

The XLNetForQuestionAnswering forward method, overrides the __call__() special method.

Parameters
  See XLNetModel.forward().

Returns

Returns a tuple (start_logits, end_logits) or a dict with key-value pairs: {"start_logits": start_logits, "end_logits": end_logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • start_logits (Tensor):

    A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

  • end_logits (Tensor):

    A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type

tuple or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForQuestionAnswering
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForQuestionAnswering.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
start_logits = outputs[0]
end_logits = outputs[1]
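
A hedged follow-up sketch: the highest-scoring start and end positions give a token-level answer span; mapping these indices back to the answer text requires the tokenizer's offsets and is omitted here.

start_index = paddle.argmax(start_logits, axis=-1)   # predicted start token position per example
end_index = paddle.argmax(end_logits, axis=-1)       # predicted end token position per example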