modeling

Modeling classes for the XLNet model.
class XLNetPretrainedModel(*args, **kwargs)

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained XLNet models. It provides the XLNet-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

config_class
    Alias of paddlenlp.transformers.xlnet.configuration.XLNetConfig.

base_model_class
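A minimal sketch of how these class attributes come into play when loading and saving a checkpoint (the local directory name is illustrative):

    from paddlenlp.transformers import XLNetModel

    # from_pretrained() consults pretrained_init_configuration and
    # pretrained_resource_files_map to resolve the config and weight
    # files registered for a pretrained model name.
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    # save_pretrained() writes the config and weights to a directory,
    # which can later be reloaded with from_pretrained('./xlnet_ckpt').
    model.save_pretrained('./xlnet_ckpt')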
class XLNetModel(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

The bare XLNet Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.

Note
    A normal_initializer initializes weight matrices as normal distributions. See XLNetPretrainedModel._init_weights() for how weights are initialized in XLNetModel.
get_input_embeddings()

Get the input embeddings of the model.

Returns
    The input embedding layer of the model.

Return type
    nn.Embedding

set_input_embeddings(new_embeddings)

Set new input embeddings for the model.

Parameters
    new_embeddings (Embedding) – The new embedding layer for the model.

Raises
    NotImplementedError – If the model has not implemented the set_input_embeddings method.
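A minimal sketch of inspecting and swapping the input embedding layer (the vocabulary extension of 8 tokens is purely illustrative):

    import paddle.nn as nn
    from paddlenlp.transformers import XLNetModel

    model = XLNetModel.from_pretrained('xlnet-base-cased')

    # Inspect the current word embedding table.
    old_embeddings = model.get_input_embeddings()
    vocab_size, hidden_size = old_embeddings.weight.shape

    # Replace it, e.g. after extending the vocabulary by 8 tokens.
    model.set_input_embeddings(nn.Embedding(vocab_size + 8, hidden_size))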
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetModel forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
    token_type_ids (Tensor, optional) – Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:
        - 0 corresponds to a sentence A token,
        - 1 corresponds to a sentence B token.
        Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added to the token embeddings.
    attention_mask (Tensor, optional) – Mask to indicate whether to perform attention on each input token. The values should be either 0 or 1. The attention scores will be set to -infinity for any positions in the mask that are 0, and will be unchanged for positions that are 1:
        - 1 for tokens that are not masked,
        - 0 for tokens that are masked.
        Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.
    mems (List[Tensor], optional) – A list of length n_layers, with each Tensor being a pre-computed hidden-state for one layer. Each Tensor has a dtype of float32 and a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means mems are not used.
        Note: use_mems has to be set to True in order to make use of mems.
    perm_mask (Tensor, optional) – Mask to indicate the permutation pattern of the input sequence, with values being either 0 or 1:
        - if perm_mask[k, i, j] = 0, i attends to j in batch k;
        - if perm_mask[k, i, j] = 1, i does not attend to j in batch k.
        Only used during pretraining (to define the factorization order) or for sequential decoding (generation); see the sketch after the example below. Its data type should be float32 and it has a shape of [batch_size, sequence_length, sequence_length]. Defaults to None, in which case each token attends to all the other tokens (full bidirectional attention).
    target_mapping (Tensor, optional) – Mask to indicate the output tokens to use, with values being either 0 or 1. If target_mapping[k, i, j] = 1, the i-th prediction in batch k is on the j-th token. Only used during pretraining for partial prediction or for sequential decoding (generation). Its data type should be float32 and it has a shape of [batch_size, num_predict, sequence_length]. Defaults to None.
    input_mask (Tensor, optional) – Mask to avoid performing attention on padding tokens, with values being either 0 or 1. Its data type should be float32 and it has a shape of [batch_size, sequence_length]. This mask is the negative of attention_mask:
        - 1 for tokens that are masked,
        - 0 for tokens that are not masked.
        You should use only one of input_mask and attention_mask. Defaults to None.
    head_mask (Tensor, optional) – Mask to nullify selected heads of the self-attention layers, with values being either 0 or 1:
        - 1 indicates the head is not masked,
        - 0 indicates the head is masked.
        Its data type should be float32 and it has a shape of [num_heads] or [num_layers, num_heads]. Defaults to None, which means all heads are kept.
    inputs_embeds (Tensor, optional) – An embedded representation tensor which is an alternative to input_ids. You should specify only one of the two to avoid contradiction. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means only input_ids is specified.
    use_mems_train (bool, optional) – Whether or not to use the recurrent memory mechanism during training. Defaults to False, which means the recurrent memory mechanism is not used in training mode.
    use_mems_eval (bool, optional) – Whether or not to use the recurrent memory mechanism during evaluation. Defaults to False, which means the recurrent memory mechanism is not used in evaluation mode.
    return_dict (bool, optional) – Whether or not to return additional information beyond the output tensor. If True, returns output, new_mems, hidden_states and attentions, formatted as a dict. Otherwise only the output tensor is returned. Defaults to False.
Returns
    Returns tensor output or a dict with key-value pairs: {"last_hidden_state": output, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    output (Tensor):
        Output of the final layer of the model. It is a Tensor of dtype float32 with a shape of [batch_size, num_predict, hidden_size].
        Note: num_predict corresponds to target_mapping.shape[1]. If target_mapping is None, then num_predict equals sequence_length.

    mems (List[Tensor]):
        A list of pre-computed hidden-states. The length of the list is n_layers. Each element in the list is a Tensor with a dtype of float32 and a shape of [batch_size, sequence_length, hidden_size].

    hidden_states (List[Tensor], optional):
        A list of Tensors containing the hidden-states of the model at the output of each layer plus the initial embedding outputs. Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size]. Returned when output_hidden_states is set to True.

    attentions (List[Tensor], optional):
        A list of Tensors containing the attention weights of each hidden layer. Each Tensor (one for each layer) has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length]. Returned when output_attentions is set to True.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    last_hidden_states = outputs[0]
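The permutation-related arguments deserve a concrete illustration. Below is a hedged sketch of using perm_mask and target_mapping to predict a single position, as during pretraining; the chosen position is arbitrary and the snippet only demonstrates the expected tensor shapes:

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    input_ids = paddle.to_tensor([tokenizer("Hey, Paddle-paddle is awesome !")["input_ids"]])
    seq_len = input_ids.shape[1]
    target = 3  # illustrative position to predict

    # perm_mask[k, i, j] = 1 means i does not attend to j in batch k:
    # hide the target token from all queries so it must be predicted.
    perm_mask = paddle.zeros([1, seq_len, seq_len], dtype="float32")
    perm_mask[:, :, target] = 1.0

    # target_mapping[k, i, j] = 1 places the i-th prediction on token j.
    target_mapping = paddle.zeros([1, 1, seq_len], dtype="float32")
    target_mapping[:, 0, target] = 1.0

    outputs = model(input_ids=input_ids, perm_mask=perm_mask,
                    target_mapping=target_mapping)
    # outputs[0] has shape [batch_size, num_predict, hidden_size].
    print(outputs[0].shape)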
class XLNetForSequenceClassification(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a linear layer on top of the output layer, designed for sequence classification/regression tasks like GLUE tasks.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False, problem_type: str = 'single_label_classification')

The XLNetForSequenceClassification forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_classes].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForSequenceClassification
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    logits = outputs[0]
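Since the returned logits are pre-SoftMax scores, class probabilities and a predicted label can be derived from them; continuing the example above:

    import paddle.nn.functional as F

    # logits has shape [batch_size, num_classes].
    probs = F.softmax(logits, axis=-1)
    pred_label = paddle.argmax(logits, axis=-1)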
class XLNetForTokenClassification(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a linear layer on top of the hidden-states output layer, designed for token classification tasks like NER tasks.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetForTokenClassification forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, sequence_length, num_classes].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForTokenClassification
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForTokenClassification.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    logits = outputs[0]
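Continuing the example above, per-token predictions follow from an argmax over the class axis of the [batch_size, sequence_length, num_classes] logits:

    # One predicted class id per input token.
    preds = paddle.argmax(logits, axis=-1)  # shape: [batch_size, sequence_length]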
class XLNetLMHeadModel(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a language modeling head on top (linear layer with weights tied to the input embeddings).

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetLMHeadModel forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetLMHeadModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetLMHeadModel.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    logits = outputs
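Continuing the example above (assuming outputs is the logits tensor, as the example suggests), the most likely vocabulary id at each position can be read off with an argmax; decoding the ids back to tokens is left to the tokenizer:

    # logits has shape [batch_size, sequence_length, vocab_size].
    pred_token_ids = paddle.argmax(logits, axis=-1)
    print(tokenizer.convert_ids_to_tokens(pred_token_ids[0].tolist()))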
class XLNetForMultipleChoice(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RACE/SWAG tasks.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict=False)

The XLNetForMultipleChoice forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_choices].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers import XLNetForMultipleChoice, XLNetTokenizer
    from paddlenlp.data import Pad, Dict

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForMultipleChoice.from_pretrained('xlnet-base-cased')

    data = [
        {
            "question": "how do you turn on an ipad screen?",
            "answer1": "press the volume button.",
            "answer2": "press the lock button.",
            "label": 1,
        },
        {
            "question": "how do you indent something?",
            "answer1": "leave a space before starting the writing",
            "answer2": "press the spacebar",
            "label": 0,
        },
    ]

    text = []
    text_pair = []
    for d in data:
        text.append(d["question"])
        text_pair.append(d["answer1"])
        text.append(d["question"])
        text_pair.append(d["answer2"])

    inputs = tokenizer(text, text_pair)

    batchify_fn = lambda samples, fn=Dict(
        {
            "input_ids": Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
            "token_type_ids": Pad(
                axis=0, pad_val=tokenizer.pad_token_type_id
            ),  # token_type_ids
        }
    ): fn(samples)
    inputs = batchify_fn(inputs)

    reshaped_logits = model(
        input_ids=paddle.to_tensor(inputs[0], dtype="int64"),
        token_type_ids=paddle.to_tensor(inputs[1], dtype="int64"),
    )
    print(reshaped_logits.shape)  # [2, 2]
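Continuing the example above, the predicted answer for each question is the argmax over the choice axis of the [batch_size, num_choices] logits:

    # Index of the highest-scoring choice per question.
    pred_choice = paddle.argmax(reshaped_logits, axis=-1)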
class XLNetForQuestionAnswering(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, start_positions=None, end_positions=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)

The XLNetForQuestionAnswering forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tuple (start_logits, end_logits) or a dict with key-value pairs: {"start_logits": start_logits, "end_logits": end_logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    start_logits (Tensor):
        A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

    end_logits (Tensor):
        A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    tuple or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForQuestionAnswering
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForQuestionAnswering.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    start_logits = outputs[0]
    end_logits = outputs[1]
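Continuing the example above, a simple greedy read-out of the predicted span (illustration only; it ignores the constraint that the end should not precede the start):

    # Independently pick the most likely start and end positions.
    start_idx = paddle.argmax(start_logits, axis=-1).item()
    end_idx = paddle.argmax(end_logits, axis=-1).item()

    # Slice the answer tokens out of the input and decode them.
    answer_ids = inputs["input_ids"][0][start_idx:end_idx + 1]
    print(tokenizer.convert_ids_to_tokens(answer_ids.tolist()))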
XLNetForCausalLM

Alias of paddlenlp.transformers.xlnet.modeling.XLNetLMHeadModel.
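The alias is interchangeable with the class it points to:

    from paddlenlp.transformers.xlnet.modeling import XLNetForCausalLM, XLNetLMHeadModel

    # XLNetForCausalLM is simply another name for XLNetLMHeadModel.
    assert XLNetForCausalLM is XLNetLMHeadModel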