modeling

class CodeGenAttention(embed_dim, rotary_dim, num_attention_heads, max_positions, attn_pdrop, resid_pdrop)[source]

Bases: paddle.fluid.dygraph.layers.Layer

forward(hidden_states, attention_mask=None, use_cache=False, cache=None, output_attentions=False)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class CodeGenMLP(embed_dim, inner_dim, activation_function, resid_pdrop)[source]

Bases: paddle.fluid.dygraph.layers.Layer

forward(hidden_states)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class CodeGenBlock(embed_dim, rotary_dim, n_head, n_ctx, attn_pdrop, resid_pdrop, activation_function, layer_norm_epsilon)[source]

Bases: paddle.fluid.dygraph.layers.Layer

forward(hidden_states, attention_mask=None, use_cache=False, cache=None, output_attentions=False)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class CodeGenPreTrainedModel(*args, **kwargs)[source]

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

init_weights(layer)[source]

Initialize the weights.

base_model_class

alias of paddlenlp.transformers.codegen.modeling.CodeGenModel

class CodeGenModel(vocab_size, bos_token_id=0, pad_token_id=50256, eos_token_id=2, n_embd=1024, n_layer=20, n_head=16, n_ctx=2048, n_positions=2048, attn_pdrop=0.0, resid_pdrop=0.0, embd_pdrop=0.0, rotary_dim=32, activation_function='gelu_new', layer_norm_epsilon=1e-05, initializer_range=0.02)[source]

Bases: paddlenlp.transformers.codegen.modeling.CodeGenPreTrainedModel

The bare CodeGen Model outputting raw hidden-states. This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods. This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.

Parameters
  • vocab_size (int) – Vocabulary size of inputs_ids in CodeGenModel; also the size of the token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling CodeGenModel.

  • bos_token_id (int, optional) – The beginning-of-sequence token id used during pretraining. Can be used as a sequence classifier token. Defaults to 0.

  • pad_token_id (int, optional) – The index of padding token in the token vocabulary. Defaults to 50256.

  • eos_token_id (int, optional) – A special token representing the end of a sequence that was used during pretraining. Defaults to 2.

  • n_embd (int, optional) – Dimensionality of the embedding layer and the decoder layers. Defaults to 1024.

  • n_layer (int, optional) – Number of hidden layers. Defaults to 20.

  • n_head (int, optional) – Number of attention heads for each attention layer in the Transformer decoder. Defaults to 16.

  • n_ctx (int, optional) – Dimensionality of the causal mask (usually same as n_positions). Defaults to 2048.

  • n_positions (int, optional) – The maximum sequence length that this model might ever be used with. Defaults to 2048.

  • attn_pdrop (float, optional) – The dropout probability used in MultiHeadAttention in all decoder layers to drop some attention targets. Defaults to 0.0.

  • resid_pdrop (float, optional) – The dropout probability for all residual layers in the decoder. Defaults to 0.0.

  • embd_pdrop (float, optional) – The dropout probability used in embedding layers. Defaults to 0.0.

  • rotary_dim (int, optional) – Dimensionality of rotary position embeddings. Defaults to 32.

  • activation_function (str, optional) – The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other activation function supported by Paddle are allowed. Defaults to "gelu_new".

  • layer_norm_epsilon (float, optional) – The epsilon to use in the layer normalization layers. Defaults to 1e-05.

  • initializer_range (float, optional) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Defaults to 0.02.
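For instance, a randomly initialized model with the default configuration above can be created directly. This is a minimal sketch based on the constructor signature shown above; the vocab_size value of 50400 is only illustrative.

    from paddlenlp.transformers import CodeGenModel

    # vocab_size is illustrative; every other argument falls back to the
    # defaults documented above (n_embd=1024, n_layer=20, n_head=16, ...).
    model = CodeGenModel(vocab_size=50400)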

get_input_embeddings()[source]

Get the input embedding layer of the model.

Returns

The input embedding layer of the model.

Return type

nn.Embedding

set_input_embeddings(new_embeddings)[source]

Set a new input embedding layer for the model.

Parameters

new_embeddings (Embedding) – The new input embedding layer for the model.

Raises

NotImplementedError – If the model does not implement the set_input_embeddings method.
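As an illustration, the two accessors above can be combined, for example to install a resized embedding table. This is a sketch only; the vocabulary sizes are illustrative and nn refers to paddle.nn.

    import paddle.nn as nn
    from paddlenlp.transformers import CodeGenModel

    model = CodeGenModel(vocab_size=50400)  # illustrative vocab size

    old_embeddings = model.get_input_embeddings()       # nn.Embedding
    vocab_size, hidden_size = old_embeddings.weight.shape

    # Build a larger embedding table and install it on the model.
    model.set_input_embeddings(nn.Embedding(vocab_size + 100, hidden_size))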

forward(input_ids=None, attention_mask=None, token_type_ids=None, use_cache=False, cache=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]

The CodeGenModel forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].

  • attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention to some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. Its shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]; for example, it can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.

  • use_cache (bool, optional) – Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.

  • cache (list, optional) – It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.

  • return_dict (bool, optional) – Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.

Returns

An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=None, it returns a single tensor representing the output of CodeGenModel, with data type float32 and shape [batch_size, sequence_length, hidden_size].

Example
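The original example code is not reproduced in this reference; the sketch below shows one plausible usage, assuming the Salesforce/codegen-350M-mono weights are available via from_pretrained and that CodeGenTokenizer follows the usual PaddleNLP tokenizer API.

    import paddle
    from paddlenlp.transformers import CodeGenModel, CodeGenTokenizer

    # "Salesforce/codegen-350M-mono" is an assumed checkpoint name.
    tokenizer = CodeGenTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
    model = CodeGenModel.from_pretrained("Salesforce/codegen-350M-mono")

    encoded = tokenizer("def hello_world():")
    input_ids = paddle.to_tensor([encoded["input_ids"]])

    # With return_dict=False and no optional outputs requested, forward returns
    # a float32 tensor of shape [batch_size, sequence_length, hidden_size].
    last_hidden_state = model(input_ids)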

class CodeGenForCausalLM(transformer)[source]

Bases: paddlenlp.transformers.codegen.modeling.CodeGenPreTrainedModel

CodeGen Model with a language modeling head on top.

Parameters
  • transformer (CodeGenModel) – An instance of CodeGenModel.

get_output_embeddings()[source]

To be overridden by models with output embeddings.

Returns

The output embedding layer of the model.

Return type

Optional[Embedding]

forward(input_ids=None, attention_mask=None, token_type_ids=None, use_cache=False, cache=None, labels=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]

The CodeGenForCausalLM forward method, overrides the __call__() special method.

Parameters
  • input_ids (Tensor) – See CodeGenModel.

  • attention_mask (Tensor, optional) – See CodeGenModel.

  • use_cache (bool, optional) – See CodeGenModel.

  • cache (Tensor, optional) – See CodeGenModel.

  • labels (Tensor, optional) – Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size].

  • output_attentions (bool, optional) – See CodeGenModel.

  • output_hidden_states (bool, optional) – See CodeGenModel.

  • return_dict (bool, optional) – See CodeGenModel.

Returns

An instance of CausalLMOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of CausalLMOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=labels=None, it returns a tensor lm_logits of shape [batch_size, sequence_length, vocab_size].

Example
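The original example code is not reproduced in this reference; the sketch below shows one plausible usage, under the same assumptions as the CodeGenModel example above.

    import paddle
    from paddlenlp.transformers import CodeGenForCausalLM, CodeGenTokenizer

    # "Salesforce/codegen-350M-mono" is an assumed checkpoint name.
    tokenizer = CodeGenTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
    model = CodeGenForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

    encoded = tokenizer("def hello_world():")
    input_ids = paddle.to_tensor([encoded["input_ids"]])

    # With cache=labels=None and return_dict=False, forward returns lm_logits
    # of shape [batch_size, sequence_length, vocab_size].
    lm_logits = model(input_ids)

    # Reusing the inputs as labels gives a language-modeling loss, since the
    # labels are shifted inside the model (assumption: the loss is returned
    # alongside the logits when labels are provided).
    outputs = model(input_ids, labels=input_ids)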