modeling#
- class CodeGenAttention(config: CodeGenConfig)[source]#
Bases:
Layer
- forward(hidden_states: Tensor, attention_mask: Tensor | None = None, use_cache: bool | None = False, cache: Tuple[Tensor] | None = None, output_attentions: bool | None = False) → Tuple[source]#
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters:
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
- class CodeGenBlock(config: CodeGenConfig)[source]#
Bases:
Layer
- forward(hidden_states: Tensor, attention_mask: Tensor | None = None, use_cache: bool | None = False, cache: Tuple[Tensor] | None = None, output_attentions: bool | None = False) → Tuple[source]#
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters:
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
- class CodeGenPreTrainedModel(*args, **kwargs)[source]#
Bases:
PretrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
- config_class#
alias of
CodeGenConfig
- base_model_class#
alias of
CodeGenModel
- class CodeGenModel(config: CodeGenConfig)[source]#
Bases:
CodeGenPreTrainedModel
The bare CodeGen Model outputting raw hidden-states. This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods. This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
config (CodeGenConfig) – An instance of CodeGenConfig used to construct CodeGenModel.
- get_input_embeddings()[source]#
Get the input embedding of the model.
- Returns:
The input embedding of the model.
- Return type:
nn.Embedding
- set_input_embeddings(new_embeddings)[source]#
Set a new input embedding for the model.
- Parameters:
new_embeddings (Embedding) – The new input embedding of the model.
- Raises:
NotImplementedError – The model has not implemented the set_input_embeddings method.
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, token_type_ids: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Tensor]] | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple | BaseModelOutputWithPastAndCrossAttentions[source]#
The CodeGenModel forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid attending to unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float, or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. It is a tensor with a shape broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length], or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means no positions are masked.
use_cache (bool, optional) – Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.
cache (list, optional) – It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.
return_dict (bool, optional) – Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields (depending on the input arguments) of BaseModelOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=None, it returns a tensor representing the output of CodeGenModel. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
Example
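As a minimal sketch of the three attention_mask conventions described above (bool, int, and float), the following pure-Python snippet converts each form to the additive float form and applies it to one row of attention scores before a softmax. The helper names to_additive_mask and masked_softmax are hypothetical and are not part of PaddleNLP; this illustrates the convention only and is not how CodeGenModel processes the mask internally.

```python
import math


def to_additive_mask(mask):
    """Convert a 1-D mask (bool, int, or float entries) to additive float form.

    bool:  False marks a masked position, True a kept one.
    int:   0 marks a masked position, 1 a kept one.
    float: already additive (-inf for masked, 0.0 for kept).
    """
    additive = []
    for m in mask:
        if isinstance(m, float):
            additive.append(m)
        else:
            # bool and int entries share the same truthiness rule
            additive.append(0.0 if m else -math.inf)
    return additive


def masked_softmax(scores, additive_mask):
    """Softmax over one row of scores after adding the mask.

    Assumes at least one position is kept (not masked).
    """
    shifted = [s + m for s, m in zip(scores, additive_mask)]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


bool_mask = [True, True, False]
int_mask = [1, 1, 0]
float_mask = [0.0, 0.0, -math.inf]

# The last position is masked, so it receives zero attention weight
# even though its raw score is the largest.
weights = masked_softmax([1.0, 1.0, 5.0], to_additive_mask(bool_mask))
print(weights)  # [0.5, 0.5, 0.0]
```

All three mask forms in the snippet describe the same masking pattern, so they yield the same weights.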
- class CodeGenForCausalLM(config: CodeGenConfig)[source]#
Bases:
CodeGenPreTrainedModel
CodeGen Model with a language modeling head on top.
- Parameters:
config (CodeGenConfig) – An instance of CodeGenConfig used to construct CodeGenForCausalLM.
- get_output_embeddings()[source]#
To be overwritten by models with output embeddings.
- Returns:
The output embedding of the model.
- Return type:
Optional[Embedding]
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, token_type_ids: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Tensor]] | None = None, labels: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple | CausalLMOutputWithCrossAttentions[source]#
The CodeGenForCausalLM forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) – See CodeGenModel.
attention_mask (Tensor, optional) – See CodeGenModel.
use_cache (bool, optional) – See CodeGenModel.
cache (list, optional) – See CodeGenModel.
labels (Tensor, optional) – Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size].
inputs_embeds (Tensor, optional) – See CodeGenModel.
output_attentions (bool, optional) – See CodeGenModel.
output_hidden_states (bool, optional) – See CodeGenModel.
return_dict (bool, optional) – See CodeGenModel.
- Returns:
An instance of CausalLMOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered, non-None fields (depending on the input arguments) of CausalLMOutputWithPastAndCrossAttentions. In particular, when return_dict=output_hidden_states=output_attentions=False and cache=labels=None, it returns the tensor lm_logits of shape [batch_size, sequence_length, vocab_size].
Example
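The labels description above says that labels are shifted inside the model and that positions labeled -100 are ignored. A minimal pure-Python sketch of that behavior for a single sequence follows. The function name shifted_lm_loss is hypothetical, and averaging over the kept positions is an assumption; the exact reduction used by CodeGenForCausalLM is not stated in this reference.

```python
import math

IGNORE_INDEX = -100  # labels with this value do not contribute to the loss


def shifted_lm_loss(logits, labels):
    """Cross-entropy for next-token prediction on one sequence.

    logits: list of length seq_len, each entry a list of vocab_size floats.
    labels: list of seq_len ints; passing the input ids themselves works
            because the shift happens here.
    """
    # Position t predicts token t + 1: drop the last logit row and the first label.
    shift_logits = logits[:-1]
    shift_labels = labels[1:]

    losses = []
    for row, target in zip(shift_logits, shift_labels):
        if target == IGNORE_INDEX:
            continue  # masked position, skipped entirely
        peak = max(row)  # log-sum-exp with max subtraction for stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_norm - row[target])

    # Assumption: mean over the kept positions; 0.0 if every label is ignored.
    return sum(losses) / len(losses) if losses else 0.0


# Uniform logits over a 4-token vocabulary give a loss of log(4) per kept position.
uniform = [[0.0] * 4 for _ in range(3)]
loss_all = shifted_lm_loss(uniform, [0, 1, 2])
loss_masked = shifted_lm_loss(uniform, [0, -100, 2])
```

Because the first label is dropped by the shift, setting it to -100 or to a real token id makes no difference to the result.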