modeling#
- class DalleBartModel(config: DalleBartConfig)[source]#
Bases:
DalleBartPretrainedModel
- get_input_embeddings()[source]#
Get the input embedding of the model.
- Returns:
The input embedding of the model.
- Return type:
nn.Embedding
- set_input_embeddings(value)[source]#
Set a new input embedding for the model.
- Parameters:
value (Embedding) – The new input embedding for the model.
- Raises:
NotImplementedError – If the model has not implemented the set_input_embeddings method.
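A minimal usage sketch for these two methods, assuming the 'dalle-mini' checkpoint used in the generation example further below; the replacement embedding here is freshly initialized, purely for illustration:
import paddle.nn as nn
from paddlenlp.transformers import DalleBartModel

model = DalleBartModel.from_pretrained('dalle-mini')

# Read the current text-token embedding (an nn.Embedding instance).
old_embedding = model.get_input_embeddings()
vocab_size, hidden_size = old_embedding.weight.shape

# Swap in a freshly initialized embedding of the same shape.
new_embedding = nn.Embedding(vocab_size, hidden_size)
model.set_input_embeddings(new_embedding)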
- forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]#
The DalleBartModel forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention to some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]; for example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.
decoder_input_ids (Tensor, optional) – Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided and the model creates the tensor by shifting input_ids to the right.
decoder_attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention to some unwanted positions in decoder_input_ids. Its data type and shape are the same as attention_mask. Defaults to None.
encoder_output (tuple, optional) – The output of the encoder, a tuple of (last_hidden_state, hidden_states (optional), attentions (optional)). last_hidden_state has data type float32 and shape [batch_size, sequence_length, hidden_size]. hidden_states holds the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1 and each element is a float32 tensor of shape [batch_size, sequence_length, hidden_size]. attentions holds the attention weights of all layers in the Transformer encoder; its length is num_hidden_layers and each element is a float32 tensor of shape [batch_size, num_attention_heads, sequence_length, sequence_length].
use_cache (bool, optional) – Whether or not to use cache. Defaults to False. If set to True, key-value states will be returned and can be used to speed up decoding.
cache (list, optional) – A list in which each element is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
- Returns:
Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
- Return type:
Tensor
Example
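A minimal usage sketch for this forward method, assuming the 'dalle-mini' checkpoint and tokenizer from the generation example below; the decoder input ids here are toy values standing in for real image token ids:
import paddle
from paddlenlp.transformers import DalleBartModel, DalleBartTokenizer

model = DalleBartModel.from_pretrained('dalle-mini')
tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')

inputs = tokenizer(
    "graphite sketch of Elon Musk",
    return_tensors="pd",
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    max_length=64,
)
# Toy decoder inputs; in practice these are image token ids.
decoder_input_ids = paddle.zeros([1, 8], dtype="int64")

decoder_output = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    decoder_input_ids=decoder_input_ids,
)
print(decoder_output.shape)  # [1, 8, hidden_size]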
- class DalleBartPretrainedModel(*args, **kwargs)[source]#
Bases:
PretrainedModel
An abstract class for pretrained DalleBart models. It provides the DalleBart-related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.
- config_class#
alias of DalleBartConfig
- base_model_class#
alias of DalleBartModel
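As a sketch of how these aliases are used in practice, assuming the 'dalle-mini' checkpoint and that DalleBartConfig is importable from paddlenlp.transformers:
from paddlenlp.transformers import DalleBartConfig, DalleBartModel

# from_pretrained resolves the configuration via config_class
config = DalleBartConfig.from_pretrained('dalle-mini')
# and the underlying architecture via base_model_class
model = DalleBartModel.from_pretrained('dalle-mini')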
- class DalleBartEncoder(config: DalleBartConfig)[source]#
Bases:
DalleBartPretrainedModel
The Encoder of DalleBartModel. For the arguments of DalleBartEncoder, see DalleBartModel.
- forward(input_ids, attention_mask=None, **kwargs)[source]#
The DalleBartEncoder forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) – See DalleBartModel.
attention_mask (Tensor, optional) – See DalleBartModel.
- Returns:
Returns tensor encoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
- Return type:
Tensor
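A usage sketch that runs the encoder on its own, assuming the encoder submodule of a loaded DalleBartModel is exposed as model.encoder (attribute name assumed for illustration):
from paddlenlp.transformers import DalleBartModel, DalleBartTokenizer

model = DalleBartModel.from_pretrained('dalle-mini')
tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')

inputs = tokenizer(
    "graphite sketch of Elon Musk",
    return_tensors="pd",
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    max_length=64,
)
# model.encoder is assumed to be the DalleBartEncoder submodule.
encoder_output = model.encoder(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(encoder_output.shape)  # [1, 64, hidden_size]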
- class DalleBartDecoder(config: DalleBartConfig)[source]#
Bases:
DalleBartPretrainedModel
The Decoder of DalleBartModel. For the arguments of DalleBartDecoder, see DalleBartModel.
- forward(decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, memory_mask=None, cache=None)[source]#
The DalleBartDecoder forward method, overrides the __call__() special method.
- Parameters:
decoder_input_ids (Tensor, optional) – See DalleBartModel.
decoder_attention_mask (Tensor, optional) – See DalleBartModel.
encoder_output (Tensor, optional) – See DalleBartModel.
memory_mask (Tensor, optional) – See DalleBartModel.
cache (Tensor, optional) – See DalleBartModel.
- Returns:
Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
- Return type:
Tensor
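Continuing the encoder sketch above, a hedged example of feeding its output through the decoder (model.decoder is assumed to be the DalleBartDecoder submodule, and the decoder input ids are toy values):
import paddle

# Toy decoder inputs; in practice these are image token ids.
decoder_input_ids = paddle.zeros([1, 8], dtype="int64")

# encoder_output and model come from the encoder sketch above;
# model.decoder is an assumed attribute name.
decoder_output = model.decoder(
    decoder_input_ids=decoder_input_ids,
    encoder_output=encoder_output,
)
print(decoder_output.shape)  # [1, 8, hidden_size]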
- class DalleBartForConditionalGeneration(config: DalleBartConfig)[source]#
Bases:
DalleBartPretrainedModel
DalleBart Model with a language modeling head on top.
- Parameters:
config (DalleBartConfig) – An instance of DalleBartConfig used to construct DalleBartForConditionalGeneration.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]#
The DalleBartForConditionalGeneration forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor) – See DalleBartModel.
attention_mask (Tensor, optional) – See DalleBartModel.
decoder_input_ids (Tensor, optional) – See DalleBartModel.
decoder_attention_mask (Tensor, optional) – See DalleBartModel.
encoder_output (Tensor, optional) – See DalleBartModel.
use_cache (bool, optional) – See DalleBartModel.
cache (Tensor, optional) – See DalleBartModel.
- Returns:
Returns Tensor lm_logits if use_cache is False; otherwise, returns a tuple (lm_logits, cache).
With the fields:
- lm_logits (Tensor):
The output logits of the language modeling head. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].
- cache (Tensor):
See DalleBartModel.
- Return type:
Tensor or tuple
Example
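A minimal sketch of a direct forward pass, assuming the 'dalle-mini' checkpoint; use_cache is left at its default False, so a single Tensor of logits is returned, and the decoder input ids are toy values:
import paddle
from paddlenlp.transformers import (
    DalleBartForConditionalGeneration,
    DalleBartTokenizer,
)

model = DalleBartForConditionalGeneration.from_pretrained('dalle-mini')
tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')

inputs = tokenizer(
    "graphite sketch of Elon Musk",
    return_tensors="pd",
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    max_length=64,
)
# Toy decoder inputs; in practice these are image token ids.
decoder_input_ids = paddle.zeros([1, 8], dtype="int64")

lm_logits = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    decoder_input_ids=decoder_input_ids,
)
print(lm_logits.shape)  # [1, 8, vocab_size]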
- generate(input_ids=None, max_length=256, min_length=256, decode_strategy='sampling', temperature=1.0, top_k=0, top_p=1.0, repetition_penalty=1.0, num_beams=1, num_beam_groups=1, length_penalty=0.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, text_pad_token_id=1, decoder_start_token_id=None, forced_bos_token_id=None, forced_eos_token_id=None, num_return_sequences=1, diversity_rate=0.0, use_cache=True, use_fast=False, use_fp16_decoding=False, condition_scale=1.0, **model_kwargs)[source]#
The interface for generation task. This method can generate sequences by using decoding strategy. Currently, there are three decoding strategies supported: “greedy_search”, “sampling” and “beam_search”.
- Parameters:
input_ids (Tensor, optional) – The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Default to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.
max_length (int, optional) – The maximum length of the sequence to be generated. Default to 256.
min_length (int, optional) – The minimum length of the sequence to be generated. Default to 256.
decode_strategy (str, optional) – The decoding strategy in generation. Currently, there are three decoding strategies supported: “greedy_search”, “sampling” and “beam_search”. Default to “sampling”.
temperature (float, optional) – The value used to modulate the next token probabilities in the “sampling” strategy. Default to 1.0, which means no effect.
top_k (int, optional) – The number of highest probability tokens to keep for top-k-filtering in the “sampling” strategy. Default to 0, which means no effect.
top_p (float, optional) – The cumulative probability for top-p-filtering in the “sampling” strategy. The value should satisfy \(0 <= top\_p < 1\). Default to 1.0, which means no effect.
repetition_penalty (float, optional) – The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details. Defaults to 1.0.
num_beams (int, optional) – The number of beams in the “beam_search” strategy. Default to 1.
num_beam_groups (int, optional) – Number of groups to divide num_beams into in order to use DIVERSE BEAM SEARCH. See this paper for more details. Default to 1.
length_penalty (float, optional) – The exponential penalty to the sequence length in the “beam_search” strategy. The larger this param is, the more the model tends to generate shorter sequences. Default to 0.0, which means no penalty.
early_stopping (bool, optional) – Whether to stop searching in the “beam_search” strategy when at least num_beams sentences are finished per batch or not. Default to False.
bos_token_id (int, optional) – The id of the bos_token. Default to None.
eos_token_id (int, optional) – The id of the eos_token. Default to None.
pad_token_id (int, optional) – The id of the pad_token. Default to None.
decoder_start_token_id (int, optional) – The start token id for encoder-decoder models. Default to None.
forced_bos_token_id (int, optional) – The id of the token to force as the first generated token. Usually use for multilingual models. Default to None.
forced_eos_token_id (int, optional) – The id of the token to force as the last generated token. Default to None.
num_return_sequences (int, optional) – The number of returned sequences for each sequence in the batch. Default to 1.
diversity_rate (float, optional) – If num_beam_groups is 1, this is the diversity_rate for Diverse Siblings Search. See this paper (https://arxiv.org/abs/1611.08562) for more details. If not, this is the diversity_rate for DIVERSE BEAM SEARCH.
use_cache (bool, optional) – Whether to use the model cache to speed up decoding. Default to True.
use_fast (bool, optional) – Whether to use the fast entry of the model for FastGeneration. Default to False.
use_fp16_decoding (bool, optional) – Whether to use fp16 for decoding. Only works when the fast entry is available. Default to False.
condition_scale (float, optional) – The scale of super conditioning. See this Twitter thread for more details. Default to 1.0.
model_kwargs (dict) – It can be used to specify additional kwargs passed to the model.
- Returns:
It is a tuple that contains two elements: ids and scores. Each element is a Tensor.
With the fields:
- ids (Tensor):
The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as the input input_ids.
- scores (Tensor):
The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.
- Return type:
tuple[Tensor]
Example
import paddle
from paddlenlp.transformers import (
    DalleBartForConditionalGeneration,
    DalleBartTokenizer
)

# Initialize the model and tokenizer
model_name_or_path = 'dalle-mini'
model = DalleBartForConditionalGeneration.from_pretrained(model_name_or_path)
tokenizer = DalleBartTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
prompts = "graphite sketch of Elon Musk"
tokenized_inputs = tokenizer(
    prompts,
    return_tensors="pd",
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    max_length=64,
)

# Generate 4 sequences by using "sampling" strategy (top_k=64, condition_scale=10.0)
image_token_ids, scores = model.generate(
    input_ids=tokenized_inputs['input_ids'],
    attention_mask=tokenized_inputs['attention_mask'],
    decode_strategy="sampling",
    condition_scale=10.0,
    top_k=64,
    num_return_sequences=4)
print(image_token_ids.shape, scores.shape)
# [4, 256] [4, 1]