
class DalleBartModel(config: DalleBartConfig)



get_input_embeddings()

Get the input embedding of the model.


Returns the input embedding of the model.




set_input_embeddings(value)

Set a new input embedding for the model.


  • value (Embedding) -- The new embedding of the model.


Raises NotImplementedError if the model has not implemented the set_input_embeddings method.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)

The DalleBartModel forward method, overrides the __call__() special method.

  • input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid attending to unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor whose shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.

  • decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, in which case the model creates the tensor by shifting input_ids to the right.

  • decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid attending to unwanted positions in decoder_input_ids. Its data type and shape are the same as those of attention_mask. Defaults to None.

  • encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1. Each element of that tuple has data type float32 and shape [batch_size, sequence_length, hidden_size]. attentions contains the attentions of all layers in the Transformer encoder; its length is num_hidden_layers. Each element of that tuple has data type float32 and shape [batch_size, num_attention_heads, sequence_length, sequence_length].

  • use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.

  • cache (list, optional) -- It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Default to None.
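The three mask conventions described for attention_mask carry the same information. The sketch below is a minimal pure-Python illustration of converting the bool and int forms into the float form (0 for kept positions, -INF for masked ones). The helper name to_additive_mask is hypothetical and is not part of the library; the model's own conversion may differ.

```python
import math


def to_additive_mask(mask_rows):
    """Convert a [batch_size, sequence_length] mask given as nested lists
    into the float convention: 0.0 = attend, -inf = masked.

    Hypothetical helper for illustration only.
    """
    additive = []
    for row in mask_rows:
        converted = []
        for value in row:
            if isinstance(value, (bool, int)):
                # bool: True keeps the position; int: 1 keeps the position.
                converted.append(0.0 if value else -math.inf)
            else:
                # Already in the float convention, pass it through.
                converted.append(float(value))
        additive.append(converted)
    return additive


bool_mask = [[True, True, False]]
int_mask = [[1, 1, 0]]
float_mask = [[0.0, 0.0, -math.inf]]

# All three conventions describe the same masking pattern.
print(to_additive_mask(bool_mask))  # [[0.0, 0.0, -inf]]
print(to_additive_mask(bool_mask) == to_additive_mask(int_mask) == float_mask)  # True
```

The float form is convenient because it can be added directly to attention scores before the softmax, which drives the weight of masked positions to zero.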


Returns the tensor decoder_output, which is the output of the last layer of the model. Its data type is float32 and its shape is [batch_size, sequence_length, hidden_size].
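According to the decoder_input_ids description, a missing decoder_input_ids is built by shifting input_ids to the right. The following is a minimal sketch of such a shift on nested lists. The function name shift_tokens_right and the start token id 0 are assumptions made for illustration; the model's actual implementation and start token may differ.

```python
def shift_tokens_right(input_ids, decoder_start_token_id):
    """Shift every row one position to the right and put the start token
    in front. The last token of each row is dropped so the length is kept.

    Hypothetical helper for illustration only.
    """
    return [[decoder_start_token_id] + row[:-1] for row in input_ids]


input_ids = [[5, 6, 7, 2], [8, 9, 2, 1]]
decoder_input_ids = shift_tokens_right(input_ids, decoder_start_token_id=0)
print(decoder_input_ids)  # [[0, 5, 6, 7], [0, 8, 9, 2]]
```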




class DalleBartPretrainedModel(*args, **kwargs)


An abstract class for pretrained Bart models. It provides DalleBart related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map, base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.


alias of DalleBartConfig


alias of DalleBartModel

class DalleBartEncoder(config: DalleBartConfig)


The Encoder of DalleBartModel. For the arguments of DalleBartEncoder, see DalleBartModel.

forward(input_ids, attention_mask=None, **kwargs)

The DalleBartEncoder forward method, overrides the __call__() special method.

  • input_ids (Tensor, optional) -- See DalleBartModel.

  • attention_mask (Tensor, optional) -- See DalleBartModel.


Returns the tensor encoder_output, which is the output of the last layer of the model. Its data type is float32 and its shape is [batch_size, sequence_length, hidden_size].



class DalleBartDecoder(config: DalleBartConfig)


The Decoder of DalleBartModel. For the arguments of DalleBartDecoder, see DalleBartModel.

forward(decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, memory_mask=None, cache=None)

The DalleBartDecoder forward method, overrides the __call__() special method.

  • decoder_input_ids (Tensor, optional) -- See DalleBartModel.

  • decoder_attention_mask (Tensor, optional) -- See DalleBartModel.

  • encoder_output (Tensor, optional) -- See DalleBartModel.

  • memory_mask (Tensor, optional) -- See DalleBartModel.

  • cache (Tensor, optional) -- See DalleBartModel.


Returns the tensor decoder_output, which is the output of the last layer of the model. Its data type is float32 and its shape is [batch_size, sequence_length, hidden_size].



class DalleBartForConditionalGeneration(config: DalleBartConfig)


DalleBart Model with a language modeling head on top.

  • config (DalleBartConfig) -- An instance of DalleBartConfig used to construct DalleBartForConditionalGeneration.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)

The DalleBartForConditionalGeneration forward method, overrides the __call__() special method.

  • input_ids (Tensor) -- See DalleBartModel.

  • attention_mask (Tensor, optional) -- See DalleBartModel.

  • decoder_input_ids (Tensor, optional) -- See DalleBartModel.

  • decoder_attention_mask (Tensor, optional) -- See DalleBartModel.

  • encoder_output (Tensor, optional) -- See DalleBartModel.

  • use_cache (bool, optional) -- See DalleBartModel.

  • cache (Tensor, optional) -- See DalleBartModel.


Returns the Tensor lm_logits if use_cache is False; otherwise returns the tuple (lm_logits, cache).

With the fields:

  • lm_logits (Tensor):

    The prediction scores (logits) of the language modeling head over the vocabulary at each position. Its data type is float32 and its shape is [batch_size, sequence_length, vocab_size].


Return type: Tensor or tuple
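Because the return value changes with use_cache, calling code usually normalizes it before reading the scores. The sketch below uses nested lists in place of tensors to show that unpacking and how a next token could be read from a [batch_size, sequence_length, vocab_size] result. Both helper names are hypothetical and exist only for this illustration.

```python
def split_forward_output(output, use_cache):
    """Return (lm_logits, cache) regardless of how forward() was called.

    With use_cache=False the output is lm_logits alone, so cache is None.
    """
    if use_cache:
        lm_logits, cache = output
        return lm_logits, cache
    return output, None


def greedy_next_tokens(lm_logits):
    """For each batch item, pick the vocabulary id with the highest score
    at the last sequence position."""
    next_tokens = []
    for sequence in lm_logits:
        last_step = sequence[-1]
        next_tokens.append(max(range(len(last_step)), key=last_step.__getitem__))
    return next_tokens


# Stand-in for a [batch_size=2, sequence_length=2, vocab_size=3] result.
fake_logits = [
    [[0.1, 2.0, -1.0], [0.3, 0.2, 1.5]],
    [[1.0, 0.0, 0.0], [2.5, -0.5, 0.1]],
]

lm_logits, cache = split_forward_output(fake_logits, use_cache=False)
print(greedy_next_tokens(lm_logits))  # [2, 0]
```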


generate(input_ids=None, max_length=256, min_length=256, decode_strategy='sampling', temperature=1.0, top_k=0, top_p=1.0, repetition_penalty=1.0, num_beams=1, num_beam_groups=1, length_penalty=0.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, text_pad_token_id=1, decoder_start_token_id=None, forced_bos_token_id=None, forced_eos_token_id=None, num_return_sequences=1, diversity_rate=0.0, use_cache=True, use_fast=False, use_fp16_decoding=False, condition_scale=1.0, **model_kwargs)

The interface for generation task. This method can generate sequences by using decoding strategy. Currently, there are three decoding strategies supported: "greedy_search", "sampling" and "beam_search".

  • input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.

  • max_length (int, optional) -- The maximum length of the sequence to be generated. Default to 256.

  • min_length (int, optional) -- The minimum length of the sequence to be generated. Default to 256.

  • decode_strategy (str, optional) -- The decoding strategy in generation. Currently, there are three decoding strategies supported: "greedy_search", "sampling" and "beam_search". Default to "sampling".

  • temperature (float, optional) -- The value used to modulate the next token probabilities in the "sampling" strategy. Defaults to 1.0, which means no effect.

  • top_k (int, optional) -- The number of highest probability tokens to keep for top-k-filtering in the "sampling" strategy. Default to 0, which means no effect.

  • top_p (float, optional) -- The cumulative probability for top-p-filtering in the "sampling" strategy. The value should satisfy \(0 <= top\_p <= 1\). Defaults to 1.0, which means no effect.

  • repetition_penalty (float, optional) -- The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details. Defaults to 1.0.

  • num_beams (int, optional) -- The number of beams in the "beam_search" strategy. Default to 1.

  • num_beam_groups (int, optional) -- Number of groups to divide num_beams into in order to use DIVERSE BEAM SEARCH. See this paper for more details. Default to 1.

  • length_penalty (float, optional) -- The exponential penalty to the sequence length in the "beam_search" strategy. The larger this parameter is, the more the model tends to generate shorter sequences. Defaults to 0.0, which means no penalty.

  • early_stopping (bool, optional) -- Whether to stop searching in the "beam_search" strategy when at least num_beams sentences are finished per batch or not. Default to False.

  • bos_token_id (int, optional) -- The id of the bos_token. Default to None.

  • eos_token_id (int, optional) -- The id of the eos_token. Default to None.

  • pad_token_id (int, optional) -- The id of the pad_token. Default to None.

  • decoder_start_token_id (int, optional) -- The start token id for encoder-decoder models. Default to None.

  • forced_bos_token_id (int, optional) -- The id of the token to force as the first generated token. Usually used for multilingual models. Defaults to None.

  • forced_eos_token_id (int, optional) -- The id of the token to force as the last generated token. Default to None.

  • num_return_sequences (int, optional) -- The number of returned sequences for each sequence in the batch. Default to 1.

  • diversity_rate (float, optional) -- If num_beam_groups is 1, this is the diversity_rate for Diverse Siblings Search. See this paper for more details. If not, this is the diversity_rate for DIVERSE BEAM SEARCH.

  • use_cache (bool, optional) -- Whether to use the model cache to speed up decoding. Defaults to True.

  • use_fast (bool, optional) -- Whether to use the fast entry of the model for FastGeneration. Defaults to False.

  • use_fp16_decoding (bool, optional) -- Whether to use fp16 for decoding. Only works when the fast entry is available. Defaults to False.

  • condition_scale (float, optional) -- The scale of super conditioning. See this twitter post for more details. Defaults to 1.0.

  • model_kwargs (dict) -- It can be used to specify additional kwargs passed to the model.
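The temperature, top_k, top_p and repetition_penalty parameters above all reshape the next-token distribution before a token is drawn. The sketch below shows one common way these filters are defined, written in pure Python over a single list of logits. It is not the library's implementation: every function name is hypothetical, and the repetition penalty follows the widely used rule of dividing positive scores and multiplying negative ones, which is an assumption about the behavior here.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the score of tokens that were already generated (1.0 = no effect)."""
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


def top_k_filter(logits, top_k):
    """Keep the top_k highest logits, mask the rest (0 = no effect).
    Ties at the threshold are all kept."""
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize (1.0 = no effect)."""
    if top_p >= 1.0:
        return list(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Temperature scaling, then top-k, then top-p, then one random draw."""
    scaled = [x / temperature for x in logits]
    probs = top_p_filter(softmax(top_k_filter(scaled, top_k)), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_filter(logits, top_k=2))                     # [2.0, 1.0, -inf, -inf]
print(apply_repetition_penalty(logits, [0, 3], 2.0))     # [1.0, 1.0, 0.5, -2.0]
print(sample_next_token(logits, top_k=1))                # 0
```

With the default values (temperature=1.0, top_k=0, top_p=1.0) none of the filters changes the distribution, which matches the "no effect" wording in the parameter descriptions.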


Returns a tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

  • ids (Tensor):

    The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. The data type is the same as that of input_ids.

  • scores (Tensor):

    The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. The data type is float32 or float64, which is the same as the parameters in the model.
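The condition_scale parameter controls super conditioning. A common formulation of this idea combines the logits predicted with the text prompt and the logits predicted without it, pushing the result away from the unconditioned prediction. The sketch below shows that formulation on plain lists; whether generate() computes it in exactly this way is an assumption, and the function name super_condition is hypothetical.

```python
def super_condition(logits_cond, logits_uncond, condition_scale):
    """Blend conditioned and unconditioned logits.

    condition_scale = 1.0 returns the conditioned logits unchanged;
    larger values amplify the difference the prompt makes.
    """
    return [
        uncond + condition_scale * (cond - uncond)
        for cond, uncond in zip(logits_cond, logits_uncond)
    ]


cond = [2.0, 0.0, -1.0]    # scores given the text prompt
uncond = [1.0, 0.0, 0.0]   # scores given an empty prompt

print(super_condition(cond, uncond, 1.0))   # [2.0, 0.0, -1.0]
print(super_condition(cond, uncond, 10.0))  # [11.0, 0.0, -10.0]
```

Under this formulation the default of 1.0 leaves the model's prediction untouched, which is consistent with the documented default.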




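import paddle
from paddlenlp.transformers import DalleBartForConditionalGeneration, DalleBartTokenizer

# Initialize the model and tokenizer
model_name_or_path = 'dalle-mini'
model = DalleBartForConditionalGeneration.from_pretrained(model_name_or_path)
tokenizer = DalleBartTokenizer.from_pretrained(model_name_or_path)

# Prepare the model inputs.
# NOTE: the tokenizer arguments were cut off in the source; the ones below are assumptions.
prompts = "graphite sketch of Elon Musk"
tokenized_inputs = tokenizer(
    prompts, return_tensors="pd", padding="max_length", truncation=True,
    return_attention_mask=True, max_length=64,
)

# Generate 4 sequences by using "sampling" strategy (top_k=64, condition_scale=10.0)
# NOTE: the argument list was cut off in the source; it is reconstructed from the comment above.
image_token_ids, scores = model.generate(
    input_ids=tokenized_inputs["input_ids"],
    attention_mask=tokenized_inputs["attention_mask"],
    decode_strategy="sampling",
    top_k=64,
    condition_scale=10.0,
    num_return_sequences=4,
)
print(image_token_ids.shape, scores.shape)
# [4, 256] [4, 1]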