modeling#

Modeling classes for XLNet model.

class XLNetPretrainedModel(*args, **kwargs)[source]#

Bases: PretrainedModel

An abstract class for pretrained XLNet models. It provides XLNet related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

config_class#

alias of XLNetConfig

base_model_class#

alias of XLNetModel

class XLNetModel(config: XLNetConfig)[source]#

Bases: XLNetPretrainedModel

The bare XLNet Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters:

config (XLNetConfig) -- An instance of XLNetConfig.

Note

A normal_initializer initializes weight matrices as normal distributions. See XLNetPretrainedModel._init_weights() for how weights are initialized in XLNetModel.

get_input_embeddings()[source]#

Get the input embedding layer of the model.

Returns:

The input embedding layer of the model.

Return type:

nn.Embedding

set_input_embeddings(new_embeddings)[source]#

Set a new input embedding layer for the model.

Parameters:

new_embeddings (Embedding) -- The new embedding layer of the model.

Raises:

NotImplementedError -- The model has not implemented the set_input_embeddings method.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)[source]#

The XLNetModel forward method, overrides the __call__() special method.

Parameters:
  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of the tokens that build the input sequence. Its data type should be int64 and its shape is [batch_size, sequence_length].

  • token_type_ids (Tensor, optional) --

    Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:

    • 0 corresponds to a sentence A token,

    • 1 corresponds to a sentence B token.

    Its data type should be int64 and its shape is [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added to the token embeddings.

  • attention_mask (Tensor, optional) --

    Mask to indicate whether to perform attention on each input token or not. The values should be either 0 or 1. The attention scores will be set to -infinity for any positions in the mask that are 0, and will be unchanged for positions that are 1.

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    Its data type should be float32 and its shape is [batch_size, sequence_length]. Defaults to None.

  • mems (List[Tensor], optional) --

    A list of length n_layers, where each Tensor is a pre-computed hidden state for one layer. Each Tensor has dtype float32 and a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means mems are not used.

    Note

    use_mems_train (in training mode) or use_mems_eval (in evaluation mode) has to be set to True in order to make use of mems.

  • perm_mask (Tensor, optional) --

    Mask to indicate the permutation pattern of the input sequence with values being either 0 or 1.

    • if perm_mask[k, i, j] = 0, i attends to j in batch k;

    • if perm_mask[k, i, j] = 1, i does not attend to j in batch k.

    Only used during pretraining (to define the factorization order) or for sequential decoding (generation). Its data type should be float32 and its shape is [batch_size, sequence_length, sequence_length]. Defaults to None, in which case each token attends to all the other tokens (full bidirectional attention).

  • target_mapping (Tensor, optional) -- Mask to indicate the output tokens to use, with values being either 0 or 1. If target_mapping[k, i, j] = 1, the i-th prediction in batch k is on the j-th token. Its data type should be float32 and its shape is [batch_size, num_predict, sequence_length]. Only used during pretraining for partial prediction or for sequential decoding (generation). Defaults to None.

  • input_mask (Tensor, optional) --

    Mask to avoid performing attention on padding tokens, with values being either 0 or 1. Its data type should be float32 and its shape is [batch_size, sequence_length]. This mask is the inverse of attention_mask:

    • 1 for tokens that are masked,

    • 0 for tokens that are not masked.

    You should use only one of input_mask and attention_mask. Defaults to None.

  • head_mask (Tensor, optional) --

    Mask to nullify selected heads of the self-attention layers with values being either 0 or 1.

    • 1 indicates the head is not masked,

    • 0 indicates the head is masked.

    Its data type should be float32 and its shape is [num_heads] or [num_layers, num_heads]. Defaults to None, which means all heads are kept.

  • inputs_embeds (Tensor, optional) -- An embedded representation tensor which is an alternative to input_ids. Specify only one of the two to avoid contradiction. Its data type should be float32 and its shape is [batch_size, sequence_length, hidden_size]. Defaults to None, which means only input_ids is specified.

  • use_mems_train (bool, optional) -- Whether or not to use recurrent memory mechanism during training. Defaults to False and we don't use recurrent memory mechanism in training mode.

  • use_mems_eval (bool, optional) -- Whether or not to use recurrent memory mechanism during evaluation. Defaults to False and we don't use recurrent memory mechanism in evaluation mode.

  • return_dict (bool, optional) -- Whether or not to return additional information other than the output tensor. If True, then returns information about output, new_mems, hidden_states and attentions which will also be formatted as a dict. Else only returns the output tensor. Defaults to False.
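
The attention_mask and input_mask parameters above carry the same information with opposite polarity. A minimal sketch of that relationship in plain Python, where nested lists stand in for float32 tensors and the helper name to_input_mask is hypothetical (it is not part of PaddleNLP):

```python
def to_input_mask(attention_mask):
    """Invert a 0/1 attention_mask into the equivalent input_mask.

    attention_mask: 1.0 = token is not masked, 0.0 = token is masked.
    input_mask:     1.0 = token is masked,     0.0 = token is not masked.
    """
    return [[1.0 - value for value in row] for row in attention_mask]


# One sequence of length 5 whose last two positions are padding.
attention_mask = [[1.0, 1.0, 1.0, 0.0, 0.0]]
input_mask = to_input_mask(attention_mask)
print(input_mask)  # [[0.0, 0.0, 0.0, 1.0, 1.0]]
```

Either list could then be wrapped with paddle.to_tensor(..., dtype="float32") and passed as the matching argument. As noted above, only one of the two masks should be supplied.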

Returns:

Returns the output tensor, or a dict with key-value pairs: {"last_hidden_state": output, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • output (Tensor):

    Output of the final layer of the model. It's a Tensor of dtype float32 and has a shape of [batch_size, num_predict, hidden_size].

    Note

    num_predict corresponds to target_mapping.shape[1]. If target_mapping is None, then num_predict equals sequence_length.

  • mems (List[Tensor]):

    A list of pre-computed hidden-states. The length of the list is n_layers. Each element in the list is a Tensor with dtype float32 and has a shape of [batch_size, sequence_length, hidden_size].

  • hidden_states (List[Tensor], optional):

    A list of Tensors containing the hidden-states of the model at the output of each layer plus the initial embedding outputs. Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size]. Returned when output_hidden_states is set to True.

  • attentions (List[Tensor], optional):

    A list of Tensors containing the attention weights of each hidden layer. Each Tensor (one for each layer) has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length]. Returned when output_attentions is set to True.

Return type:

Tensor or dict
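
Because forward may return either a bare Tensor or a dict, calling code sometimes needs to normalize the result. The sketch below is one way to do that in plain Python. The helper name get_last_hidden_state is hypothetical, and the tuple branch is an assumption added only to cover sequence-style results such as the outputs[0] indexing used in the example on this page:

```python
def get_last_hidden_state(outputs):
    """Return the final-layer output from any of the supported result forms."""
    if isinstance(outputs, dict):
        # return_dict=True: the final-layer output is stored under this key.
        return outputs["last_hidden_state"]
    if isinstance(outputs, (tuple, list)):
        # Assumption: sequence-style results put the final-layer output first.
        return outputs[0]
    # Otherwise the result is already the bare output tensor.
    return outputs


# Plain strings stand in for tensors so the sketch runs without Paddle.
print(get_last_hidden_state({"last_hidden_state": "h", "mems": None}))  # h
print(get_last_hidden_state(("h", "mems")))  # h
print(get_last_hidden_state("h"))  # h
```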

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetModel
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetModel.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)

last_hidden_states = outputs[0]
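
The perm_mask and target_mapping conventions described in the parameter list can be illustrated with a small, framework-free sketch. It builds both masks for the case where no token may attend to the last position and a single prediction is made at that position. The helper names are hypothetical and nested lists stand in for float32 tensors; this only illustrates the index semantics and is not taken from the PaddleNLP implementation:

```python
def build_perm_mask(batch_size, seq_len, hidden_positions):
    """perm_mask[k][i][j] == 1.0 means token i does not attend to token j in batch k."""
    hidden = set(hidden_positions)
    return [
        [[1.0 if j in hidden else 0.0 for j in range(seq_len)] for _ in range(seq_len)]
        for _ in range(batch_size)
    ]


def build_target_mapping(batch_size, seq_len, predict_positions):
    """target_mapping[k][i][j] == 1.0 means the i-th prediction is on the j-th token."""
    return [
        [[1.0 if j == pos else 0.0 for j in range(seq_len)] for pos in predict_positions]
        for _ in range(batch_size)
    ]


seq_len = 4
# Hide the last token from every position and predict only that token.
perm_mask = build_perm_mask(1, seq_len, hidden_positions=[3])
target_mapping = build_target_mapping(1, seq_len, predict_positions=[3])

print(perm_mask[0][0])         # [0.0, 0.0, 0.0, 1.0]
print(target_mapping)          # [[[0.0, 0.0, 0.0, 1.0]]]
print(len(target_mapping[0]))  # 1, which is num_predict
```

The resulting shapes are [batch_size, sequence_length, sequence_length] and [batch_size, num_predict, sequence_length], matching the parameter descriptions above.
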
class XLNetForSequenceClassification(config: XLNetConfig)[source]#

Bases: XLNetPretrainedModel

XLNet Model with a linear layer on top of the output layer, designed for sequence classification/regression tasks like GLUE tasks.

Parameters:

config (XLNetConfig) -- An instance of XLNetConfig.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False, problem_type: str = 'single_label_classification')[source]#

The XLNetForSequenceClassification forward method, overrides the __call__() special method.

Parameters:
Returns:

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits). Its data type should be float32 and its shape is [batch_size, num_classes].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type:

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForSequenceClassification
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)

logits = outputs[0]
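
Since the returned logits are scores before SoftMax, turning them into class probabilities is left to the caller. A minimal sketch for the single-label case in plain Python, using made-up logit values in place of real model output (nested lists stand in for a [batch_size, num_classes] tensor):

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of logits."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for a batch of two sequences and two classes.
logits = [[2.0, 0.5], [-1.0, 1.0]]
probabilities = [softmax(row) for row in logits]
predicted_classes = [row.index(max(row)) for row in probabilities]
print(predicted_classes)  # [0, 1]
```

This applies to the default problem_type of 'single_label_classification'. Other problem types would need a different post-processing step.
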
class XLNetForTokenClassification(config: XLNetConfig)[source]#

Bases: XLNetPretrainedModel

XLNet Model with a linear layer on top of the hidden-states output layer, designed for token classification tasks like NER tasks.

Parameters:

config (XLNetConfig) -- An instance of XLNetConfig.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)[source]#

The XLNetForTokenClassification forward method, overrides the __call__() special method.

Parameters:
Returns:

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits). Its data type should be float32 and its shape is [batch_size, sequence_length, num_classes].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type:

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForTokenClassification
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForTokenClassification.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)

logits = outputs[0]
class XLNetLMHeadModel(config: XLNetConfig)[source]#

Bases: XLNetPretrainedModel

XLNet Model with a language modeling head on top (linear layer with weights tied to the input embeddings).

Parameters:

config (XLNetConfig) -- An instance of XLNetConfig.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)[source]#

The XLNetLMHeadModel forward method, overrides the __call__() special method.

Parameters:
Returns:

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits). Its data type should be float32 and its shape is [batch_size, sequence_length, num_classes].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type:

Tensor or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetLMHeadModel
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetLMHeadModel.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Hey, Paddle-paddle is awesome !")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
logits = outputs
class XLNetForMultipleChoice(config: XLNetConfig)[source]#

Bases: XLNetPretrainedModel

XLNet Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RACE/SWAG tasks.

Parameters:

config (XLNetConfig) -- An instance of XLNetConfig.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict=False)[source]#

The XLNetForMultipleChoice forward method, overrides the __call__() special method.

Parameters:
Returns:

Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • logits (Tensor):

    Classification scores before SoftMax (also called logits). Its data type should be float32 and its shape is [batch_size, num_choices].

Return type:

Tensor or dict

Example

import paddle
from paddlenlp.transformers import XLNetForMultipleChoice, XLNetTokenizer
from paddlenlp.data import Pad, Dict
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForMultipleChoice.from_pretrained('xlnet-base-cased')
data = [
    {
        "question": "how do you turn on an ipad screen?",
        "answer1": "press the volume button.",
        "answer2": "press the lock button.",
        "label": 1,
    },
    {
        "question": "how do you indent something?",
        "answer1": "leave a space before starting the writing",
        "answer2": "press the spacebar",
        "label": 0,
    },
]
text = []
text_pair = []
for d in data:
    text.append(d["question"])
    text_pair.append(d["answer1"])
    text.append(d["question"])
    text_pair.append(d["answer2"])
inputs = tokenizer(text, text_pair)
batchify_fn = lambda samples, fn=Dict(
    {
        "input_ids": Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
        "token_type_ids": Pad(
            axis=0, pad_val=tokenizer.pad_token_type_id
        ),  # token_type_ids
    }
): fn(samples)
inputs = batchify_fn(inputs)
reshaped_logits = model(
    input_ids=paddle.to_tensor(inputs[0], dtype="int64"),
    token_type_ids=paddle.to_tensor(inputs[1], dtype="int64"),
)
print(reshaped_logits.shape)
# [2, 2]
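
The example prints a shape of [2, 2] for four (question, answer) rows, which is consistent with one score per row regrouped by question. The sketch below shows that bookkeeping in plain Python with made-up scores, then compares the best-scoring answer with the "label" values from the example data. The helper name regroup_by_question is hypothetical, and the layout is an assumption for illustration rather than a description of the PaddleNLP implementation:

```python
def regroup_by_question(flat_scores, num_choices):
    """Split a flat list of per-row scores into one row per question."""
    if len(flat_scores) % num_choices != 0:
        raise ValueError("number of scores must be a multiple of num_choices")
    return [
        flat_scores[i:i + num_choices]
        for i in range(0, len(flat_scores), num_choices)
    ]


# Hypothetical scores for the four (question, answer) rows, in the order
# they were appended to `text` and `text_pair` in the example.
flat_scores = [0.2, 1.3, 0.9, -0.4]
labels = [1, 0]  # the "label" fields of the two example questions

scores = regroup_by_question(flat_scores, num_choices=2)
predictions = [row.index(max(row)) for row in scores]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print(scores)       # [[0.2, 1.3], [0.9, -0.4]]
print(predictions)  # [1, 0]
print(accuracy)     # 1.0
```
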
class XLNetForQuestionAnswering(config: XLNetConfig)[source]#

Bases: XLNetPretrainedModel

XLNet Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

Parameters:

config (XLNetConfig) -- An instance of XLNetConfig.

forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, start_positions=None, end_positions=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)[source]#

The XLNetForQuestionAnswering forward method, overrides the __call__() special method.

Parameters:
Returns:

Returns a tuple (start_logits, end_logits) or a dict with key-value pairs: {"start_logits": start_logits, "end_logits": end_logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

With the corresponding fields:

  • start_logits (Tensor):

    A tensor of the input token classification logits, indicates the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

  • end_logits (Tensor):

    A tensor of the input token classification logits, indicates the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

  • mems (List[Tensor]):

    See XLNetModel.

  • hidden_states (List[Tensor], optional):

    See XLNetModel.

  • attentions (List[Tensor], optional):

    See XLNetModel.

Return type:

tuple or dict

Example

import paddle
from paddlenlp.transformers.xlnet.modeling import XLNetForQuestionAnswering
from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForQuestionAnswering.from_pretrained('xlnet-base-cased')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k:paddle.to_tensor([v]) for (k, v) in inputs.items()}
outputs = model(**inputs)
start_logits = outputs[0]
end_logits = outputs[1]
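
The start and end logits still have to be turned into an answer span. One common post-processing approach, sketched here in plain Python with made-up logits for a single sequence, is to pick the (start, end) pair with the highest summed score subject to start <= end. The helper name best_span and the max_answer_len limit are assumptions for illustration and are not part of the PaddleNLP API:

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token indices with the highest combined score."""
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        # Only consider spans that begin at `start` and stay within the limit.
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


# Hypothetical logits for one sequence of four tokens.
start_logits = [0.1, 2.0, 0.3, -1.0]
end_logits = [0.0, 0.5, 1.5, 0.2]
print(best_span(start_logits, end_logits))  # (1, 2)
```

The token indices can then be mapped back to text with the tokenizer's offsets or token list.
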
XLNetForCausalLM#

alias of XLNetLMHeadModel