modeling
Modeling classes for UnifiedTransformer model.
- class UnifiedTransformerPretrainedModel(*args, **kwargs)
Bases: PretrainedModel
An abstract class for pretrained UnifiedTransformer models. It provides the UnifiedTransformer-related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration, and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
- config_class
alias of UnifiedTransformerConfig
- base_model_class
alias of UnifiedTransformerModel
- class UnifiedTransformerModel(config: UnifiedTransformerConfig)
Bases: UnifiedTransformerPretrainedModel
The bare UnifiedTransformer Model outputting raw hidden-states.
This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.
This model is also a paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- get_input_embeddings()
Get the input embedding layer of the model.
- Returns:
The input embedding layer of the model.
- Return type:
nn.Embedding
- set_input_embeddings(value)
Set a new input embedding layer for the model.
- Parameters:
value (Embedding) – The new embedding layer of the model.
- Raises:
NotImplementedError – If the model has not implemented the set_input_embeddings method.
- forward(input_ids: Tensor | None = None, token_type_ids: Tensor | None = None, position_ids: Tensor | None = None, attention_mask: Tensor | None = None, use_cache: bool | None = None, cache: Tuple[Tensor] | None = None, role_ids: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)
The UnifiedTransformerModel forward method, overrides the special __call__() method.
- Parameters:
input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. They are numerical representations of the tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
token_type_ids (Tensor) – Segment token indices to indicate the first and second portions of the inputs. Indices can be either 0 or 1:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
Its data type should be int64 and it has a shape of [batch_size, sequence_length].
position_ids (Tensor) – The position indices of input sequence tokens. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor) – A tensor used in multi-head attention to prevent attention to unwanted positions, usually the paddings or the subsequent positions. It is a tensor with a shape that is broadcast to [batch_size, n_head, sequence_length, sequence_length].
When the data type is bool, the unwanted positions have False values and the others have True values.
When the data type is int, the unwanted positions have 0 values and the others have 1 values.
When the data type is float, the unwanted positions have -INF values and the others have 0 values.
use_cache (bool, optional) – Whether or not to use the model cache to speed up decoding. Defaults to False.
cache (list, optional) – A list in which each element is an incremental_cache produced by the paddle.nn.TransformerEncoderLayer.gen_cache() method. See the paddle.nn.TransformerEncoder.gen_cache() method for more details. It is only used for inference and should be None for training. Defaults to None.
role_ids (Tensor, optional) – Indices of role ids that indicate different roles. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None.
inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides. Defaults to None.
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.
return_dict (bool, optional) – Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.
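The three attention_mask encodings (bool, int, float) carry the same information. The following is a rough sketch of how they relate, with plain Python lists standing in for tensors. The helper names, including causal_padding_mask, are hypothetical and not part of PaddleNLP, and the causal-plus-padding pattern is only one possible mask, chosen for illustration.

```python
# Sketch only: nested Python lists stand in for a [sequence_length, sequence_length] tensor.
NEG_INF = float("-inf")


def causal_padding_mask(seq_len, num_real_tokens):
    """Hypothetical helper: query position i may attend to key position j only if
    j is not a subsequent position (j <= i) and j is not padding."""
    return [
        [j <= i and j < num_real_tokens for j in range(seq_len)]
        for i in range(seq_len)
    ]


def bool_to_int_mask(mask):
    """bool form -> int form: True (attend) becomes 1, False (unwanted) becomes 0."""
    return [[1 if keep else 0 for keep in row] for row in mask]


def bool_to_float_mask(mask):
    """bool form -> float form: True becomes 0.0, False becomes -inf,
    so the mask can be added to the attention scores before softmax."""
    return [[0.0 if keep else NEG_INF for keep in row] for row in mask]


# Four positions, the last of which is padding.
bool_mask = causal_padding_mask(seq_len=4, num_real_tokens=3)
int_mask = bool_to_int_mask(bool_mask)
float_mask = bool_to_float_mask(bool_mask)

for row in int_mask:
    print(row)
```

In a real call the mask would be a Paddle tensor that can be broadcast to [batch_size, n_head, sequence_length, sequence_length]; the lists here only illustrate the value conventions.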
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields (depending on the input arguments) of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=None, it returns a tensor representing the output of UnifiedTransformerModel, with shape [batch_size, sequence_length, hidden_size]. The data type is float32 or float64.
Example
    from paddlenlp.transformers import UnifiedTransformerModel
    from paddlenlp.transformers import UnifiedTransformerTokenizer

    model = UnifiedTransformerModel.from_pretrained('plato-mini')
    tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

    history = '我爱祖国'  # "I love my motherland"
    inputs = tokenizer.dialogue_encode(
        history,
        return_tensors=True,
        is_split_into_words=False)
    outputs = model(**inputs)
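The return convention described under Returns (an output object when return_dict=True, otherwise a tuple of the non-None fields in order, or a bare tensor when only one field remains) can be illustrated with a small stand-in. This is a sketch, not PaddleNLP's implementation: SketchModelOutput and finalize are hypothetical names, the field names are assumed, and strings stand in for tensors.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SketchModelOutput:
    """Stand-in for BaseModelOutputWithPastAndCrossAttentions (field names assumed)."""
    last_hidden_state: Any = None
    past_key_values: Optional[Any] = None
    hidden_states: Optional[Any] = None
    attentions: Optional[Any] = None

    def to_tuple(self):
        # Keep the declared field order and drop every field that is None.
        values = (getattr(self, f.name) for f in fields(self))
        return tuple(v for v in values if v is not None)


def finalize(output, return_dict):
    """Hypothetical helper that mirrors the documented return convention."""
    if return_dict:
        return output
    values = output.to_tuple()
    # With a single remaining field, the bare value is returned instead of a 1-tuple.
    return values[0] if len(values) == 1 else values


plain = finalize(SketchModelOutput(last_hidden_state="H"), return_dict=False)
with_hidden = finalize(
    SketchModelOutput(last_hidden_state="H", hidden_states=("h0", "h1")),
    return_dict=False,
)
as_object = finalize(SketchModelOutput(last_hidden_state="H"), return_dict=True)
```

The practical consequence is that code indexing into the result should either pass return_dict=True and use attribute access, or account for the result being a single tensor when no optional outputs are requested.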
- class UnifiedTransformerLMHeadModel(config: UnifiedTransformerConfig)
Bases: UnifiedTransformerPretrainedModel
The UnifiedTransformer Model with a language modeling head on top for generation tasks.
- Parameters:
unified_transformer (UnifiedTransformerModel) – An instance of UnifiedTransformerModel.
- forward(input_ids: Tensor | None = None, token_type_ids: Tensor | None = None, position_ids: Tensor | None = None, attention_mask: Tensor | None = None, masked_positions: Tensor | None = None, use_cache: bool | None = None, cache: Tuple[Tensor] | None = None, role_ids: Tensor | None = None, labels: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None)
The UnifiedTransformerLMHeadModel forward method, overrides the special __call__() method.
- Parameters:
input_ids (Tensor, optional) – See UnifiedTransformerModel.
token_type_ids (Tensor) – See UnifiedTransformerModel.
position_ids (Tensor) – See UnifiedTransformerModel.
attention_mask (Tensor) – See UnifiedTransformerModel.
use_cache (bool, optional) – See UnifiedTransformerModel.
cache (list, optional) – See UnifiedTransformerModel.
role_ids (Tensor, optional) – See UnifiedTransformerModel.
labels (Tensor, optional) – Labels for computing the left-to-right language modeling loss. Indices should be in [-100, 0, ..., vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., vocab_size].
inputs_embeds (Tensor, optional) – See UnifiedTransformerModel.
output_attentions (bool, optional) – See UnifiedTransformerModel.
output_hidden_states (bool, optional) – See UnifiedTransformerModel.
return_dict (bool, optional) – See UnifiedTransformerModel.
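The effect of the -100 label value can be sketched in plain Python: positions labelled -100 contribute nothing to the loss, and the remaining positions contribute an ordinary cross-entropy term. This is an illustration under assumptions, not the model's own loss code: masked_lm_loss is a hypothetical helper, the mean reduction is assumed, and any label shifting the model may apply internally is not reproduced here.

```python
import math

IGNORE_INDEX = -100


def masked_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over the positions whose label is not ignore_index.

    logits: list of [vocab_size] score lists, one per position.
    labels: list of int token ids, one per position.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: skipped entirely
        # Numerically stable log-sum-exp for the softmax normalizer.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[label]
        count += 1
    return total / count if count else 0.0


logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
labels = [0, -100, 1]  # the middle position is ignored
loss = masked_lm_loss(logits, labels)
print(loss)
```

Because the middle position is labelled -100, changing its logits leaves the loss unchanged, which is how context tokens can be excluded from a response-only training objective.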
- Returns:
An instance of CausalLMOutputWithCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields (depending on the input arguments) of CausalLMOutputWithCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=labels=None, it returns a tensor representing the output of UnifiedTransformerLMHeadModel, with shape [batch_size, sequence_length, vocab_size]. The data type is float32 or float64.
Example
    from paddlenlp.transformers import UnifiedTransformerLMHeadModel
    from paddlenlp.transformers import UnifiedTransformerTokenizer

    model = UnifiedTransformerLMHeadModel.from_pretrained('plato-mini')
    tokenizer = UnifiedTransformerTokenizer.from_pretrained('plato-mini')

    history = '我爱祖国'  # "I love my motherland"
    inputs = tokenizer.dialogue_encode(
        history,
        return_tensors=True,
        is_split_into_words=False)
    logits = model(**inputs)
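Since the returned logits have shape [batch_size, sequence_length, vocab_size], one simple way to read them is to take the highest-scoring vocabulary entry at the last position of each sequence. The sketch below uses nested Python lists in place of a tensor; greedy_next_token_ids is a hypothetical helper, and it assumes the last position of every sequence is a real token rather than padding.

```python
def greedy_next_token_ids(logits):
    """Return, for each batch item, the vocabulary index with the highest
    score at the final sequence position.

    logits: nested lists shaped [batch_size][sequence_length][vocab_size].
    """
    next_ids = []
    for sequence in logits:
        last_step = sequence[-1]  # scores over the vocabulary at the last position
        best = max(range(len(last_step)), key=last_step.__getitem__)
        next_ids.append(best)
    return next_ids


# Toy logits: batch_size=2, sequence_length=2, vocab_size=3.
toy_logits = [
    [[0.1, 0.7, 0.2], [0.3, 0.1, 0.6]],
    [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3]],
]
next_ids = greedy_next_token_ids(toy_logits)
print(next_ids)
```

This shows only a single greedy step; full response generation repeats the step and typically uses sampling or beam search instead of a plain argmax.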
- UnifiedTransformerForMaskedLM
alias of UnifiedTransformerLMHeadModel