modeling#
- class CodeGenAttention(config: CodeGenConfig)[source]#
Bases:
Layer
- forward(hidden_states: Tensor, attention_mask: Tensor | None = None, use_cache: bool | None = False, cache: Tuple[Tensor] | None = None, output_attentions: bool | None = False) Tuple [source]#
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters:
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
- class CodeGenBlock(config: CodeGenConfig)[source]#
Bases:
Layer
- forward(hidden_states: Tensor, attention_mask: Tensor | None = None, use_cache: bool | None = False, cache: Tuple[Tensor] | None = None, output_attentions: bool | None = False) Tuple [source]#
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters:
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
- class CodeGenPreTrainedModel(*args, **kwargs)[source]#
Bases:
PretrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
- config_class#
alias of
CodeGenConfig
- base_model_class#
alias of
CodeGenModel
- class CodeGenModel(config: CodeGenConfig)[source]#
Bases:
CodeGenPreTrainedModel
The bare CodeGen Model outputting raw hidden-states. This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods. This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
- Parameters:
config (CodeGenConfig) – An instance of CodeGenConfig used to construct CodeGenModel.
- get_input_embeddings()[source]#
Get the input embedding of the model.
- Returns:
the input embedding of the model
- Return type:
nn.Embedding
- set_input_embeddings(new_embeddings)[source]#
Set a new input embedding for the model, as sketched below.
- Parameters:
new_embeddings (Embedding) – the new input embedding of the model
- Raises:
NotImplementedError – If the model has not implemented the set_input_embeddings method
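A minimal sketch of reading and swapping the input embedding. The checkpoint name and the vocabulary extension by 8 tokens are assumptions for illustration, not part of this API:

```python
import paddle.nn as nn
from paddlenlp.transformers import CodeGenModel

# "Salesforce/codegen-350M-mono" is an assumed checkpoint name for illustration.
model = CodeGenModel.from_pretrained("Salesforce/codegen-350M-mono")

old_embeddings = model.get_input_embeddings()          # an nn.Embedding layer
vocab_size, hidden_size = old_embeddings.weight.shape

# Build a larger lookup table (e.g. after adding 8 new tokens) and install it.
new_embeddings = nn.Embedding(vocab_size + 8, hidden_size)
model.set_input_embeddings(new_embeddings)
```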
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, token_type_ids: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Tensor]] | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple | BaseModelOutputWithPastAndCrossAttentions [source]#
The CodeGenModel forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) – Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
attention_mask (Tensor, optional) – Mask used in multi-head attention to avoid performing attention to some unwanted positions, usually the paddings or the subsequent positions. Its data type can be int, float or bool. When the data type is bool, the masked tokens have False values and the others have True values. When the data type is int, the masked tokens have 0 values and the others have 1 values. When the data type is float, the masked tokens have -INF values and the others have 0 values. Its shape is broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]; for example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Defaults to None, which means nothing is prevented from being attended to.
use_cache (bool, optional) – Whether or not to use cache. Defaults to False. If set to True, key/value states will be returned and can be used to speed up decoding.
cache (list, optional) – A list in which each element is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
inputs_embeds (Tensor, optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation of shape (batch_size, sequence_length, hidden_size). This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix. Defaults to None.
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail. Defaults to False.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. Defaults to False.
return_dict (bool, optional) – Whether to return a BaseModelOutputWithPastAndCrossAttentions object. If False, the output will be a tuple of tensors. Defaults to False.
- Returns:
An instance of BaseModelOutputWithPastAndCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of BaseModelOutputWithPastAndCrossAttentions. Especially, when return_dict=output_hidden_states=output_attentions=False and cache=None, it returns a tensor representing the output of CodeGenModel. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].
Example
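A minimal usage sketch in the usual PaddleNLP style; the Salesforce/codegen-350M-mono checkpoint is an assumption for illustration:

```python
import paddle
from paddlenlp.transformers import CodeGenModel, CodeGenTokenizer

tokenizer = CodeGenTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = CodeGenModel.from_pretrained("Salesforce/codegen-350M-mono")

# Tokenize a code prompt and add a batch dimension.
inputs = tokenizer("def hello_world():")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

# With the default flags the call returns the last hidden states,
# a float32 tensor of shape [batch_size, sequence_length, hidden_size].
last_hidden_states = model(**inputs)
```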
- class CodeGenForCausalLM(config: CodeGenConfig)[source]#
Bases:
CodeGenPreTrainedModel
CodeGen Model with a language modeling head on top.
- Parameters:
config (CodeGenConfig) – An instance of CodeGenConfig used to construct CodeGenForCausalLM.
- get_output_embeddings()[source]#
To be overwritten by models with output embeddings
- Returns:
the output embedding of the model
- Return type:
Optional[Embedding]
- forward(input_ids: Tensor | None = None, attention_mask: Tensor | None = None, token_type_ids: Tensor | None = None, use_cache: bool | None = None, cache: List[Tuple[Tensor]] | None = None, labels: Tensor | None = None, inputs_embeds: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple | CausalLMOutputWithCrossAttentions [source]#
The CodeGenForCausalLM forward method, overrides the __call__() special method.
- Parameters:
input_ids (Tensor, optional) – See CodeGenModel.
attention_mask (Tensor, optional) – See CodeGenModel.
use_cache (bool, optional) – See CodeGenModel.
cache (Tensor, optional) – See CodeGenModel.
labels (Tensor, optional) – Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., vocab_size].
inputs_embeds (Tensor, optional) – See CodeGenModel.
output_attentions (bool, optional) – See CodeGenModel.
output_hidden_states (bool, optional) – See CodeGenModel.
return_dict (bool, optional) – See CodeGenModel.
- Returns:
An instance of CausalLMOutputWithCrossAttentions if return_dict=True. Otherwise it returns a tuple of tensors corresponding to the ordered and not-None (depending on the input arguments) fields of CausalLMOutputWithCrossAttentions. Especially, when return_dict=output_hidden_states=output_attentions=False and cache=labels=None, it returns a tensor lm_logits of shape [batch_size, sequence_length, vocab_size].
Example
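A minimal sketch of computing the language-modeling loss and logits; the checkpoint name is again an assumption for illustration:

```python
import paddle
from paddlenlp.transformers import CodeGenForCausalLM, CodeGenTokenizer

tokenizer = CodeGenTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = CodeGenForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

inputs = tokenizer("def hello_world():")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

# Reuse the input ids as labels; the model shifts them internally.
outputs = model(**inputs, labels=inputs["input_ids"], return_dict=True)
loss = outputs.loss          # scalar language-modeling loss
lm_logits = outputs.logits   # [batch_size, sequence_length, vocab_size]
```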