modeling

class DalleBartModel(text_vocab_size=50264, image_vocab_size=16384, bos_token_id=16384, pad_token_id=16384, eos_token_id=16384, max_text_length=64, max_image_length=256, decoder_start_token_id=16384, d_model=1024, num_encoder_layers=12, num_decoder_layers=12, encoder_attention_heads=16, decoder_attention_heads=16, encoder_ffn_dim=2730, decoder_ffn_dim=2730, dropout=0.0, activation_function='gelu', attention_dropout=0.0, activation_dropout=0.0, use_bias=False, init_std=0.02)[source]

Bases: paddlenlp.transformers.dallebart.modeling.DalleBartPretrainedModel

The bare DalleBart Model outputting raw hidden-states. This model inherits from PretrainedModel; refer to the superclass documentation for the generic methods. This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters

- text_vocab_size (int) -- Vocabulary size of inputs_ids in DalleBartModel; also the vocab size of the text token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling DalleBartModel.
- image_vocab_size (int) -- Vocabulary size of decoder_inputs_ids in DalleBartModel; also the vocab size of the image token embedding matrix. Defines the number of different tokens that can be represented by the decoder_inputs_ids passed when calling DalleBartModel.
- bos_token_id (int, optional) -- The beginning-of-image-sequence token that was used during pretraining. Defaults to 16384.
- pad_token_id (int, optional) -- The index of the padding token in the image token vocabulary. Defaults to 16384.
- eos_token_id (int, optional) -- A special token representing the end of an image sequence. Defaults to 16384.
- max_text_length (int, optional) -- The maximum dimensionality of the text position encoding, which dictates the maximum supported length of the text input sequence. Defaults to 64.
- max_image_length (int, optional) -- The maximum dimensionality of the image position encoding, which dictates the maximum supported length of the image input sequence. Defaults to 256.
- decoder_start_token_id (int, optional) -- The id indicating the start of the decoded image sequence. Defaults to 16384.
- d_model (int, optional) -- Dimensionality of the embedding layer, encoder layers and decoder layers. Defaults to 1024.
- num_encoder_layers (int, optional) -- Number of hidden layers in the DalleBartEncoder. Defaults to 12.
- num_decoder_layers (int, optional) -- Number of hidden layers in the DalleBartDecoder. Defaults to 12.
- encoder_attention_heads (int, optional) -- Number of attention heads for each attention layer in the DalleBartEncoder. Defaults to 16.
- decoder_attention_heads (int, optional) -- Number of attention heads for each attention layer in the DalleBartDecoder. Defaults to 16.
- encoder_ffn_dim (int, optional) -- Dimensionality of the Gated Linear Units (GLU) layer in the encoder. Input tensors to GLU layers are first projected from d_model to encoder_ffn_dim, and then projected back to d_model. Typically encoder_ffn_dim is larger than d_model. Defaults to 2730.
- decoder_ffn_dim (int, optional) -- Dimensionality of the Gated Linear Units (GLU) layer in the decoder. Input tensors to GLU layers are first projected from d_model to decoder_ffn_dim, and then projected back to d_model. Typically decoder_ffn_dim is larger than d_model. Defaults to 2730.
- dropout (float, optional) -- The dropout probability used in all fully connected layers (pre-process and post-process of MHA and FFN sub-layers) in the encoder and decoder. Defaults to 0.
- activation_function (str, optional) -- The non-linear activation function in the GLU layer. "gelu", "relu" and any other Paddle-supported activation functions are supported. Defaults to "gelu".
- attention_dropout (float, optional) -- The dropout probability used in MultiHeadAttention in all encoder and decoder layers to drop some attention targets. Defaults to 0.
- activation_dropout (float, optional) -- The dropout probability used after the GLU activation in all encoder and decoder layers. Defaults to 0.
- use_bias (bool, optional) -- Whether or not to use bias in all linear layers. Defaults to False.
- init_std (float, optional) -- The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Defaults to 0.02.
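Example

Since every argument above has a default, the model can be constructed directly from this signature. A minimal sketch (randomly initialized weights, not a pretrained checkpoint):

    from paddlenlp.transformers import DalleBartModel

    # Build a randomly initialized model with the documented defaults.
    model = DalleBartModel()

    # Or override individual hyperparameters from the signature above.
    small_model = DalleBartModel(num_encoder_layers=6, num_decoder_layers=6)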
forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The DalleBartModel forward method, overrides the __call__() special method.

Parameters

- input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
- attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor with shape broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means nothing needs to be prevented from being attended to.
- decoder_input_ids (Tensor, optional) -- Indices of decoder input sequence tokens in the vocabulary. Its data type should be int64 and it has a shape of [batch_size, sequence_length]. Defaults to None, which means no decoder_input_ids is provided; the model will create the tensor by shifting the input_ids to the right.
- decoder_attention_mask (Tensor, optional) -- Mask used in multi-head attention to avoid performing attention on some unwanted positions in decoder_input_ids. Its data type and shape are the same as attention_mask. Defaults to None.
- encoder_output (tuple, optional) -- The output of the encoder, a tuple consisting of last_hidden_state, hidden_states (optional) and attentions (optional). The data type of last_hidden_state is float32 and its shape is [batch_size, sequence_length, hidden_size]. hidden_states contains the hidden states of all layers in the Transformer encoder; its length is num_hidden_layers + 1, and each element in the tuple is a float32 tensor of shape [batch_size, sequence_length, hidden_size]. attentions contains the attentions of all layers in the Transformer encoder; its length is num_hidden_layers, and each element in the tuple is a float32 tensor of shape [batch_size, num_attention_heads, sequence_length, sequence_length].
- use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key-value states will be returned and can be used to speed up decoding.
- cache (list, optional) -- A list in which each element is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
Returns

Returns tensor decoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Return type

Tensor

Example
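A minimal usage sketch (not from the original docs; it assumes the 'dalle-mini' checkpoint used in the generate() example further below is available):

    from paddlenlp.transformers import DalleBartModel, DalleBartTokenizer

    model = DalleBartModel.from_pretrained('dalle-mini')
    tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')
    model.eval()

    inputs = tokenizer(
        "graphite sketch of Elon Musk",
        return_tensors="pd",
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
        max_length=64,
    )
    # decoder_input_ids is omitted, so the model creates it by
    # shifting input_ids to the right (see the parameter docs above).
    decoder_output = model(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
    )
    print(decoder_output.shape)  # [batch_size, sequence_length, hidden_size]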
class DalleBartPretrainedModel(*args, **kwargs)[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained DalleBart models. It provides DalleBart-related model_config_file, pretrained_init_configuration, resource_files_names, pretrained_resource_files_map and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class

alias of paddlenlp.transformers.dallebart.modeling.DalleBartModel
class DalleBartEncoder(d_model=1024, nhead=16, dim_feedforward=2730, max_text_length=64, text_vocab_size=50264, text_pad_token_id=1, encoder_layers=12, dropout=0.0, activation='gelu', attn_dropout=None, act_dropout=None, bias_attr=False, init_std=0.02)[source]

Bases: paddlenlp.transformers.dallebart.modeling.DalleBartPretrainedModel

The encoder of DalleBartModel. For the arguments of DalleBartEncoder, see DalleBartModel.

forward(input_ids, attention_mask=None, **kwargs)[source]

The DalleBartEncoder forward method, overrides the __call__() special method.

Parameters

- input_ids (Tensor, optional) -- See DalleBartModel.
- attention_mask (Tensor, optional) -- See DalleBartModel.
Returns

Returns tensor encoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Return type

Tensor
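Example

A minimal sketch of running the encoder on its own. It assumes DalleBartModel exposes its encoder as the attribute model.encoder (an assumed attribute name, common for PaddleNLP seq2seq models):

    from paddlenlp.transformers import DalleBartModel, DalleBartTokenizer

    model = DalleBartModel.from_pretrained('dalle-mini')
    tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')

    inputs = tokenizer("graphite sketch of Elon Musk", return_tensors="pd")
    # model.encoder is assumed to be the DalleBartEncoder instance.
    encoder_output = model.encoder(input_ids=inputs['input_ids'])
    print(encoder_output.shape)  # [batch_size, sequence_length, d_model]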
class DalleBartDecoder(d_model=1024, nhead=16, dim_feedforward=2730, image_vocab_size=16384, max_image_length=256, decoder_layers=12, dropout=0.0, activation='gelu', attn_dropout=None, act_dropout=None, bias_attr=False, init_std=0.02)[source]

Bases: paddlenlp.transformers.dallebart.modeling.DalleBartPretrainedModel

The decoder of DalleBartModel. For the arguments of DalleBartDecoder, see DalleBartModel.

forward(decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, memory_mask=None, cache=None)[source]

The DalleBartDecoder forward method, overrides the __call__() special method.

Parameters

- decoder_input_ids (Tensor, optional) -- See DalleBartModel.
- decoder_attention_mask (Tensor, optional) -- See DalleBartModel.
- encoder_output (Tensor, optional) -- See DalleBartModel.
- memory_mask (Tensor, optional) -- See DalleBartModel.
- cache (Tensor, optional) -- See DalleBartModel.
. :type cache: Tensor, optional- 返回
Returns tensor
decoder_output
, which is the output at the last layer of the model. Its data type should be float32 and has a shape of [batch_size, sequence_length, hidden_size].- 返回类型
Tensor
-
-
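Example

A minimal sketch of driving the decoder directly with an encoder output. The model.encoder and model.decoder attribute names are assumptions; the start token value 16384 is the documented decoder_start_token_id default:

    import paddle
    from paddlenlp.transformers import DalleBartModel, DalleBartTokenizer

    model = DalleBartModel.from_pretrained('dalle-mini')
    tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')

    inputs = tokenizer("graphite sketch of Elon Musk", return_tensors="pd")
    encoder_output = model.encoder(input_ids=inputs['input_ids'])  # assumed attribute

    # Start decoding from decoder_start_token_id (16384 by default).
    decoder_input_ids = paddle.full([1, 1], 16384, dtype='int64')
    decoder_output = model.decoder(  # assumed attribute
        decoder_input_ids=decoder_input_ids,
        encoder_output=encoder_output,
    )
    print(decoder_output.shape)  # [batch_size, 1, d_model]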
class DalleBartForConditionalGeneration(dallebart)[source]

Bases: paddlenlp.transformers.dallebart.modeling.DalleBartPretrainedModel

DalleBart Model with a language modeling head on top.

Parameters

- dallebart (DalleBartModel) -- An instance of DalleBartModel.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_output=None, use_cache=False, cache=None)[source]

The DalleBartForConditionalGeneration forward method, overrides the __call__() special method.

Parameters

- input_ids (Tensor) -- See DalleBartModel.
- attention_mask (Tensor, optional) -- See DalleBartModel.
- decoder_input_ids (Tensor, optional) -- See DalleBartModel.
- decoder_attention_mask (Tensor, optional) -- See DalleBartModel.
- encoder_output (Tensor, optional) -- See DalleBartModel.
- use_cache (bool, optional) -- See DalleBartModel.
- cache (Tensor, optional) -- See DalleBartModel.

Returns
Returns Tensor lm_logits if use_cache is False; otherwise, returns a tuple (lm_logits, cache). With the fields:

- lm_logits (Tensor): The generated sentence of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, vocab_size].
- cache (Tensor): See DalleBartModel.

Return type

Tensor or tuple

Example
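A minimal sketch (not the original example; it reuses the 'dalle-mini' checkpoint from the generate() example below):

    from paddlenlp.transformers import (
        DalleBartForConditionalGeneration,
        DalleBartTokenizer,
    )

    model = DalleBartForConditionalGeneration.from_pretrained('dalle-mini')
    tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')
    model.eval()

    inputs = tokenizer(
        "graphite sketch of Elon Musk",
        return_tensors="pd",
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
        max_length=64,
    )
    # use_cache is False by default, so a single logits tensor is returned.
    lm_logits = model(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
    )
    print(lm_logits.shape)  # [batch_size, sequence_length, vocab_size]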
generate(input_ids=None, max_length=256, min_length=256, decode_strategy='sampling', temperature=1.0, top_k=0, top_p=1.0, repetition_penalty=1.0, num_beams=1, num_beam_groups=1, length_penalty=0.0, early_stopping=False, bos_token_id=None, eos_token_id=None, pad_token_id=None, text_pad_token_id=1, decoder_start_token_id=None, forced_bos_token_id=None, forced_eos_token_id=None, num_return_sequences=1, diversity_rate=0.0, use_cache=True, use_fast=False, use_fp16_decoding=False, condition_scale=1.0, **model_kwargs)[source]

The interface for the generation task. This method can generate sequences by using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".

Parameters
- input_ids (Tensor, optional) -- The input sequence ids for the generation. It is a Tensor with shape [batch_size, sequence_length]. The data type should be int32 or int64. Defaults to None, in which case it is initialized as a Tensor with shape [1, 1], filled with the value bos_token_id.
- max_length (int, optional) -- The maximum length of the sequence to be generated. Defaults to 256.
- min_length (int, optional) -- The minimum length of the sequence to be generated. Defaults to 256.
- decode_strategy (str, optional) -- The decoding strategy in generation. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search". Defaults to "sampling".
- temperature (float, optional) -- The value used to modulate the next-token probabilities in the "sampling" strategy. Defaults to 1.0, which means no effect.
- top_k (int, optional) -- The number of highest-probability tokens to keep for top-k filtering in the "sampling" strategy. Defaults to 0, which means no effect.
- top_p (float, optional) -- The cumulative probability for top-p filtering in the "sampling" strategy. The value should satisfy 0 <= top_p < 1. Defaults to 1.0, which means no effect.
- repetition_penalty (float, optional) -- The parameter for repetition penalty; 1.0 means no penalty. See this paper for more details. Defaults to 1.0.
- num_beams (int, optional) -- The number of beams in the "beam_search" strategy. Defaults to 1.
- num_beam_groups (int, optional) -- Number of groups to divide num_beams into in order to use diverse beam search. See this paper for more details. Defaults to 1.
- length_penalty (float, optional) -- The exponential penalty applied to the sequence length in the "beam_search" strategy. The larger this parameter is, the shorter the sequences the model tends to generate. Defaults to 0.0, which means no penalty.
- early_stopping (bool, optional) -- Whether to stop searching in the "beam_search" strategy when at least num_beams sentences are finished per batch or not. Defaults to False.
- bos_token_id (int, optional) -- The id of the bos_token. Defaults to None.
- eos_token_id (int, optional) -- The id of the eos_token. Defaults to None.
- pad_token_id (int, optional) -- The id of the pad_token. Defaults to None.
- decoder_start_token_id (int, optional) -- The start token id for encoder-decoder models. Defaults to None.
- forced_bos_token_id (int, optional) -- The id of the token to force as the first generated token. Usually used for multilingual models. Defaults to None.
- forced_eos_token_id (int, optional) -- The id of the token to force as the last generated token. Defaults to None.
- num_return_sequences (int, optional) -- The number of returned sequences for each sequence in the batch. Defaults to 1.
- diversity_rate (float, optional) -- If num_beam_groups is 1, this is the diversity_rate for Diverse Siblings Search; see this paper (https://arxiv.org/abs/1611.08562) for more details. Otherwise, this is the diversity_rate for diverse beam search.
- use_cache (bool, optional) -- Whether to use the model cache to speed up decoding. Defaults to True.
- use_fast (bool, optional) -- Whether to use the fast entry of the model for FastGeneration. Defaults to False.
- use_fp16_decoding (bool, optional) -- Whether to use fp16 for decoding. Only works when the fast entry is available. Defaults to False.
- condition_scale (float, optional) -- The scale of super conditioning; see the Twitter thread referenced in the source docstring for more details. Defaults to 1.0.
- model_kwargs (dict) -- It can be used to specify additional kwargs passed to the model.
Returns

A tuple containing two elements: ids and scores. Each element is a Tensor.

With the fields:

- ids (Tensor): The ids of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, sequence_length]. Its data type is the same as the input input_ids.
- scores (Tensor): The scores of the generated sequences. It is a Tensor with shape [batch_size * num_return_sequences, 1]. Its data type is float32 or float64, the same as the model parameters.

Return type

tuple[Tensor]

Example
    import paddle
    from paddlenlp.transformers import (
        DalleBartForConditionalGeneration,
        DalleBartTokenizer,
    )

    # Initialize the model and tokenizer
    model_name_or_path = 'dalle-mini'
    model = DalleBartForConditionalGeneration.from_pretrained(model_name_or_path)
    tokenizer = DalleBartTokenizer.from_pretrained(model_name_or_path)

    # Prepare the model inputs.
    prompts = "graphite sketch of Elon Musk"
    tokenized_inputs = tokenizer(
        prompts,
        return_tensors="pd",
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
        max_length=64,
    )

    # Generate 4 sequences by using "sampling" strategy (top_k=64, condition_scale=10.0)
    image_token_ids, scores = model.generate(
        input_ids=tokenized_inputs['input_ids'],
        attention_mask=tokenized_inputs['attention_mask'],
        decode_strategy="sampling",
        condition_scale=10.0,
        top_k=64,
        num_return_sequences=4)
    print(image_token_ids.shape, scores.shape)
    # [4, 256] [4, 1]
class DalleBartForImageGeneration(dallebart)[source]

Bases: paddlenlp.transformers.dallebart.modeling.DalleBartForConditionalGeneration

DalleBart Model with a language modeling head and VQGanTokenizer on top.

Parameters

- dallebart (DalleBartModel) -- An instance of DalleBartModel.
generate(input_ids, attention_mask=None, top_k=0, top_p=1.0, temperature=1.0, condition_scale=1.0, num_return_sequences=1, **kwargs)[source]

The DalleBartForImageGeneration generate method.

Parameters

- input_ids (Tensor) -- See DalleBartForConditionalGeneration.
- attention_mask (Tensor, optional) -- See DalleBartForConditionalGeneration.
- top_k (int, optional) -- The number of highest-probability tokens to keep for top-k filtering in the "sampling" strategy. Defaults to 0, which means no effect.
- top_p (float, optional) -- The cumulative probability for top-p filtering in the "sampling" strategy. The value should satisfy 0 <= top_p < 1. Defaults to 1.0, which means no effect.
- temperature (float, optional) -- The value used to modulate the next-token probabilities in the "sampling" strategy. Defaults to 1.0, which means no effect.
- condition_scale (float, optional) -- The scale of super conditioning; see the Twitter thread referenced in the source docstring for more details. Defaults to 1.0.
- num_return_sequences (int, optional) -- The number of returned sequences for each sequence in the batch. Defaults to 1.
Returns

Returns tensor images, which is the output of VQGanDetokenizer. Its data type should be uint8 and it has a shape of [batch_size, num_return_sequences, 256, 256, 3].

Return type

Tensor

Example
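A minimal sketch (not the original example; it mirrors the generate() example above and assumes the 'dalle-mini' checkpoint also provides the VQGAN weights):

    from paddlenlp.transformers import (
        DalleBartForImageGeneration,
        DalleBartTokenizer,
    )

    model = DalleBartForImageGeneration.from_pretrained('dalle-mini')
    tokenizer = DalleBartTokenizer.from_pretrained('dalle-mini')
    model.eval()

    tokenized_inputs = tokenizer(
        "graphite sketch of Elon Musk",
        return_tensors="pd",
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
        max_length=64,
    )
    # Returns decoded images rather than image token ids.
    images = model.generate(
        input_ids=tokenized_inputs['input_ids'],
        attention_mask=tokenized_inputs['attention_mask'],
        top_k=64,
        condition_scale=10.0,
        num_return_sequences=2,
    )
    print(images.shape)  # [1, 2, 256, 256, 3], dtype uint8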