modeling

Modeling classes for the XLNet model.
class XLNetPretrainedModel(*args, **kwargs)

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained XLNet models. It provides the XLNet-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

config_class
    Alias of paddlenlp.transformers.xlnet.configuration.XLNetConfig.

base_model_class
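A minimal sketch of how these class attributes come into play when loading and saving a checkpoint (the local directory name is illustrative):

    from paddlenlp.transformers import XLNetModel

    # from_pretrained() consults pretrained_init_configuration and
    # pretrained_resource_files_map to resolve the config and weight
    # files registered for a pretrained model name.
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    # save_pretrained() writes the config and weights to a directory,
    # which can later be reloaded with from_pretrained('./xlnet_ckpt').
    model.save_pretrained('./xlnet_ckpt')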
class XLNetModel(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

The bare XLNet Model outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.

Note
    A normal_initializer initializes weight matrices as normal distributions. See XLNetPretrainedModel._init_weights() for how weights are initialized in XLNetModel.
get_input_embeddings()

Get the input embeddings of the model.

Returns
    The input embedding layer of the model.

Return type
    nn.Embedding

set_input_embeddings(new_embeddings)

Set new input embeddings for the model.

Parameters
    new_embeddings (Embedding) – The new embedding layer for the model.

Raises
    NotImplementedError – If the model has not implemented the set_input_embeddings method.
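A minimal sketch of inspecting and swapping the input embedding layer (the vocabulary extension of 8 tokens is purely illustrative):

    import paddle.nn as nn
    from paddlenlp.transformers import XLNetModel

    model = XLNetModel.from_pretrained('xlnet-base-cased')

    # Inspect the current word embedding table.
    old_embeddings = model.get_input_embeddings()
    vocab_size, hidden_size = old_embeddings.weight.shape

    # Replace it, e.g. after extending the vocabulary by 8 tokens.
    model.set_input_embeddings(nn.Embedding(vocab_size + 8, hidden_size))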
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetModel forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
    token_type_ids (Tensor, optional) – Segment token indices to indicate first and second portions of the inputs. Indices can be either 0 or 1:
        - 0 corresponds to a sentence A token,
        - 1 corresponds to a sentence B token.
        Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no segment embeddings are added to the token embeddings.
    attention_mask (Tensor, optional) – Mask to indicate whether to perform attention on each input token. The values should be either 0 or 1. The attention scores will be set to -infinity for any positions in the mask that are 0, and will be unchanged for positions that are 1:
        - 1 for tokens that are not masked,
        - 0 for tokens that are masked.
        Its data type should be float32 and it has a shape of [batch_size, sequence_length]. Defaults to None.
    mems (List[Tensor], optional) – A list of length n_layers, with each Tensor being a pre-computed hidden-state for one layer. Each Tensor has a dtype of float32 and a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means mems are not used.
        Note: use_mems has to be set to True in order to make use of mems.
    perm_mask (Tensor, optional) – Mask to indicate the permutation pattern of the input sequence, with values being either 0 or 1:
        - if perm_mask[k, i, j] = 0, i attends to j in batch k;
        - if perm_mask[k, i, j] = 1, i does not attend to j in batch k.
        Only used during pretraining (to define the factorization order) or for sequential decoding (generation); see the sketch after the example below. Its data type should be float32 and it has a shape of [batch_size, sequence_length, sequence_length]. Defaults to None, in which case each token attends to all the other tokens (full bidirectional attention).
    target_mapping (Tensor, optional) – Mask to indicate the output tokens to use, with values being either 0 or 1. If target_mapping[k, i, j] = 1, the i-th prediction in batch k is on the j-th token. Only used during pretraining for partial prediction or for sequential decoding (generation). Its data type should be float32 and it has a shape of [batch_size, num_predict, sequence_length]. Defaults to None.
    input_mask (Tensor, optional) – Mask to avoid performing attention on padding tokens, with values being either 0 or 1. Its data type should be float32 and it has a shape of [batch_size, sequence_length]. This mask is the negative of attention_mask:
        - 1 for tokens that are masked,
        - 0 for tokens that are not masked.
        You should use only one of input_mask and attention_mask. Defaults to None.
    head_mask (Tensor, optional) – Mask to nullify selected heads of the self-attention layers, with values being either 0 or 1:
        - 1 indicates the head is not masked,
        - 0 indicates the head is masked.
        Its data type should be float32 and it has a shape of [num_heads] or [num_layers, num_heads]. Defaults to None, which means all heads are kept.
    inputs_embeds (Tensor, optional) – An embedded representation tensor which is an alternative to input_ids. You should specify only one of the two to avoid contradiction. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size]. Defaults to None, which means only input_ids is specified.
    use_mems_train (bool, optional) – Whether or not to use the recurrent memory mechanism during training. Defaults to False, which means the recurrent memory mechanism is not used in training mode.
    use_mems_eval (bool, optional) – Whether or not to use the recurrent memory mechanism during evaluation. Defaults to False, which means the recurrent memory mechanism is not used in evaluation mode.
    return_dict (bool, optional) – Whether or not to return additional information beyond the output tensor. If True, returns output, new_mems, hidden_states and attentions, formatted as a dict. Otherwise only the output tensor is returned. Defaults to False.
Returns
    Returns tensor output or a dict with key-value pairs: {"last_hidden_state": output, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    output (Tensor):
        Output of the final layer of the model. It is a Tensor of dtype float32 with a shape of [batch_size, num_predict, hidden_size].
        Note: num_predict corresponds to target_mapping.shape[1]. If target_mapping is None, then num_predict equals sequence_length.

    mems (List[Tensor]):
        A list of pre-computed hidden-states. The length of the list is n_layers. Each element in the list is a Tensor with a dtype of float32 and a shape of [batch_size, sequence_length, hidden_size].

    hidden_states (List[Tensor], optional):
        A list of Tensors containing the hidden-states of the model at the output of each layer plus the initial embedding outputs. Each Tensor has a data type of float32 and a shape of [batch_size, sequence_length, hidden_size]. Returned when output_hidden_states is set to True.

    attentions (List[Tensor], optional):
        A list of Tensors containing the attention weights of each hidden layer. Each Tensor (one for each layer) has a data type of float32 and a shape of [batch_size, num_heads, sequence_length, sequence_length]. Returned when output_attentions is set to True.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    last_hidden_states = outputs[0]
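The permutation-related arguments deserve a concrete illustration. Below is a hedged sketch of using perm_mask and target_mapping to predict a single position, as during pretraining; the chosen position is arbitrary and the snippet only demonstrates the expected tensor shapes:

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetModel.from_pretrained('xlnet-base-cased')

    input_ids = paddle.to_tensor([tokenizer("Hey, Paddle-paddle is awesome !")["input_ids"]])
    seq_len = input_ids.shape[1]
    target = 3  # illustrative position to predict

    # perm_mask[k, i, j] = 1 means i does not attend to j in batch k:
    # hide the target token from all queries so it must be predicted.
    perm_mask = paddle.zeros([1, seq_len, seq_len], dtype="float32")
    perm_mask[:, :, target] = 1.0

    # target_mapping[k, i, j] = 1 places the i-th prediction on token j.
    target_mapping = paddle.zeros([1, 1, seq_len], dtype="float32")
    target_mapping[:, 0, target] = 1.0

    outputs = model(input_ids=input_ids, perm_mask=perm_mask,
                    target_mapping=target_mapping)
    # outputs[0] has shape [batch_size, num_predict, hidden_size].
    print(outputs[0].shape)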
class XLNetForSequenceClassification(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a linear layer on top of the output layer, designed for sequence classification/regression tasks like GLUE tasks.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False, problem_type: str = 'single_label_classification')

The XLNetForSequenceClassification forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_classes].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForSequenceClassification
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    logits = outputs[0]
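Since the returned logits are pre-SoftMax scores, class probabilities and a predicted label can be derived from them; continuing the example above:

    import paddle.nn.functional as F

    # logits has shape [batch_size, num_classes].
    probs = F.softmax(logits, axis=-1)
    pred_label = paddle.argmax(logits, axis=-1)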
class XLNetForTokenClassification(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a linear layer on top of the hidden-states output layer, designed for token classification tasks like NER tasks.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetForTokenClassification forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, sequence_length, num_classes].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForTokenClassification
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForTokenClassification.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    logits = outputs[0]
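Continuing the example above, per-token predictions follow from an argmax over the class axis of the [batch_size, sequence_length, num_classes] logits:

    # One predicted class id per input token.
    preds = paddle.argmax(logits, axis=-1)  # shape: [batch_size, sequence_length]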
class XLNetLMHeadModel(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a language modeling head on top (linear layer with weights tied to the input embeddings).

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions=False, output_hidden_states=False, return_dict=False)

The XLNetLMHeadModel forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetLMHeadModel
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetLMHeadModel.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Hey, Paddle-paddle is awesome !")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    logits = outputs
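Continuing the example above (assuming outputs is the logits tensor, as the example suggests), the most likely vocabulary id at each position can be read off with an argmax; decoding the ids back to tokens is left to the tokenizer:

    # logits has shape [batch_size, sequence_length, vocab_size].
    pred_token_ids = paddle.argmax(logits, axis=-1)
    print(tokenizer.convert_ids_to_tokens(pred_token_ids[0].tolist()))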
class XLNetForMultipleChoice(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RACE/SWAG tasks.

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_mems_train=False, use_mems_eval=False, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict=False)

The XLNetForMultipleChoice forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tensor logits or a dict with key-value pairs: {"logits": logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    logits (Tensor):
        Classification scores before SoftMax (also called logits). Its data type should be float32 and it has a shape of [batch_size, num_choices].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    Tensor or dict
Example

    import paddle
    from paddlenlp.transformers import XLNetForMultipleChoice, XLNetTokenizer
    from paddlenlp.data import Pad, Dict

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForMultipleChoice.from_pretrained('xlnet-base-cased')

    data = [
        {
            "question": "how do you turn on an ipad screen?",
            "answer1": "press the volume button.",
            "answer2": "press the lock button.",
            "label": 1,
        },
        {
            "question": "how do you indent something?",
            "answer1": "leave a space before starting the writing",
            "answer2": "press the spacebar",
            "label": 0,
        },
    ]

    text = []
    text_pair = []
    for d in data:
        text.append(d["question"])
        text_pair.append(d["answer1"])
        text.append(d["question"])
        text_pair.append(d["answer2"])

    inputs = tokenizer(text, text_pair)

    batchify_fn = lambda samples, fn=Dict(
        {
            "input_ids": Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
            "token_type_ids": Pad(
                axis=0, pad_val=tokenizer.pad_token_type_id
            ),  # token_type_ids
        }
    ): fn(samples)
    inputs = batchify_fn(inputs)

    reshaped_logits = model(
        input_ids=paddle.to_tensor(inputs[0], dtype="int64"),
        token_type_ids=paddle.to_tensor(inputs[1], dtype="int64"),
    )
    print(reshaped_logits.shape)  # [2, 2]
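Continuing the example above, the predicted answer for each question is the argmax over the choice axis of the [batch_size, num_choices] logits:

    # Index of the highest-scoring choice per question.
    pred_choice = paddle.argmax(reshaped_logits, axis=-1)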
class XLNetForQuestionAnswering(config: paddlenlp.transformers.xlnet.configuration.XLNetConfig)

Bases: paddlenlp.transformers.xlnet.modeling.XLNetPretrainedModel

XLNet Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

Parameters
    config (XLNetConfig) – An instance of XLNetConfig.
forward(input_ids, token_type_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, start_positions=None, end_positions=None, input_mask=None, head_mask=None, inputs_embeds=None, use_mems_train=False, use_mems_eval=False, return_dict=False)

The XLNetForQuestionAnswering forward method, overrides the __call__() special method.

Parameters
    input_ids (Tensor) – See XLNetModel.
    token_type_ids (Tensor, optional) – See XLNetModel.
    attention_mask (Tensor, optional) – See XLNetModel.
    mems (Tensor, optional) – See XLNetModel.
    perm_mask (Tensor, optional) – See XLNetModel.
    target_mapping (Tensor, optional) – See XLNetModel.
    input_mask (Tensor, optional) – See XLNetModel.
    head_mask (Tensor, optional) – See XLNetModel.
    inputs_embeds (Tensor, optional) – See XLNetModel.
    use_mems_train (bool, optional) – See XLNetModel.
    use_mems_eval (bool, optional) – See XLNetModel.
    return_dict (bool, optional) – See XLNetModel.
Returns
    Returns tuple (start_logits, end_logits) or a dict with key-value pairs: {"start_logits": start_logits, "end_logits": end_logits, "mems": mems, "hidden_states": hidden_states, "attentions": attentions}.

    With the corresponding fields:

    start_logits (Tensor):
        A tensor of the input token classification logits, indicating the start position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

    end_logits (Tensor):
        A tensor of the input token classification logits, indicating the end position of the labelled span. Its data type should be float32 and its shape is [batch_size, sequence_length].

    mems (List[Tensor]):
        See XLNetModel.

    hidden_states (List[Tensor], optional):
        See XLNetModel.

    attentions (List[Tensor], optional):
        See XLNetModel.

Return type
    tuple or dict
Example

    import paddle
    from paddlenlp.transformers.xlnet.modeling import XLNetForQuestionAnswering
    from paddlenlp.transformers.xlnet.tokenizer import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetForQuestionAnswering.from_pretrained('xlnet-base-cased')

    inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
    inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

    outputs = model(**inputs)
    start_logits = outputs[0]
    end_logits = outputs[1]
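Continuing the example above, a simple greedy read-out of the predicted span (illustration only; it ignores the constraint that the end should not precede the start):

    # Independently pick the most likely start and end positions.
    start_idx = paddle.argmax(start_logits, axis=-1).item()
    end_idx = paddle.argmax(end_logits, axis=-1).item()

    # Slice the answer tokens out of the input and decode them.
    answer_ids = inputs["input_ids"][0][start_idx:end_idx + 1]
    print(tokenizer.convert_ids_to_tokens(answer_ids.tolist()))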
XLNetForCausalLM

Alias of paddlenlp.transformers.xlnet.modeling.XLNetLMHeadModel.
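The alias is interchangeable with the class it points to:

    from paddlenlp.transformers.xlnet.modeling import XLNetForCausalLM, XLNetLMHeadModel

    # XLNetForCausalLM is simply another name for XLNetLMHeadModel.
    assert XLNetForCausalLM is XLNetLMHeadModel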