model_utils

class PretrainedModel(*args, **kwargs)

The base class for all pretrained models. It provides common methods for constructing, loading, and saving pretrained models. Loading and saving rely on the following class attributes, which derived classes should override as needed:

  • model_config_file (str): The file name used to save and load the model configuration in the local file system. The value is model_config.json.

  • resource_files_names (dict): The names of the local files used to save and load model resources. Currently the only resource is the model state, so the dict has a single key, 'model_state', whose value, 'model_state.pdparams', is the file used to save and load the model weights.

  • pretrained_init_configuration (dict): The model configurations of the built-in pretrained models (as opposed to models in the local file system). Keys are pretrained model names (such as bert-base-uncased), and values are dicts holding the corresponding configuration for model initialization.

  • pretrained_resource_files_map (dict): The resource URLs of the built-in pretrained models (as opposed to models in the local file system). It has the same keys as resource_files_names (that is, "model_state"), and each value is a dict mapping a model name to its weights URL (such as "bert-base-uncased" -> "https://bj.bcebos.com/paddlenlp/models/transformers/bert-base-uncased.pdparams").

  • base_model_prefix (str): The name of the attribute that holds the base model in derived classes of the same architecture that add layers on top of the base model. Note: a base model class is a pretrained model class decorated by register_base_model, such as BertModel; a derived model class is a pretrained model class that adds layers on top of the base model and holds a base model as an attribute, such as BertForSequenceClassification.
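To make the relationship between these attributes concrete, here is a minimal sketch that uses plain dictionaries. The helper resolve_builtin and the configuration values are hypothetical and are not part of PaddleNLP; only the key layout and the URL are taken from the description above.

```python
# Illustrative stand-ins for the class attributes described above.
resource_files_names = {"model_state": "model_state.pdparams"}

pretrained_init_configuration = {
    # Hypothetical configuration values, for illustration only.
    "bert-base-uncased": {"vocab_size": 30522, "hidden_size": 768},
}

pretrained_resource_files_map = {
    "model_state": {
        "bert-base-uncased": "https://bj.bcebos.com/paddlenlp/models/transformers/bert-base-uncased.pdparams",
    }
}


def resolve_builtin(name):
    """Hypothetical helper: return (init config, {resource key: URL}) for a built-in name."""
    if name not in pretrained_init_configuration:
        raise KeyError(f"{name!r} is not a built-in pretrained model")
    config = pretrained_init_configuration[name]
    # Both maps share the resource keys, so one loop covers every resource.
    urls = {key: pretrained_resource_files_map[key][name] for key in resource_files_names}
    return config, urls


config, urls = resolve_builtin("bert-base-uncased")
```

The point of the layout is that resource_files_names supplies the resource keys, while the two pretrained_* maps are indexed by model name.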

Methods common to models for text generation are defined in GenerationMixin and also inherited here.

In addition, PretrainedModel is created with the metaclass InitTrackerMeta, which lets subclasses track their initialization arguments automatically.
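The idea of tracking initialization arguments through a metaclass can be sketched in plain Python. This is a toy illustration only: the names InitTracker and init_record are hypothetical, and the actual InitTrackerMeta may work differently.

```python
import functools


class InitTracker(type):
    """Toy metaclass (hypothetical) that records the arguments passed to __init__."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        original_init = cls.__init__

        @functools.wraps(original_init)
        def tracked_init(self, *args, **kwargs):
            # Record before delegating, so the outermost constructor call wins
            # and nested super().__init__() calls do not overwrite the record.
            if "init_record" not in self.__dict__:
                self.init_record = {"args": args, "kwargs": kwargs}
            original_init(self, *args, **kwargs)

        cls.__init__ = tracked_init


class Base(metaclass=InitTracker):
    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size


class Classifier(Base):
    def __init__(self, num_labels, hidden_size=8):
        super().__init__(hidden_size=hidden_size)
        self.num_labels = num_labels


model = Classifier(3, hidden_size=16)
```

Because the metaclass is inherited, Classifier gets the tracking behavior without any extra code in its own definition.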

init_weights()

Prunes weights if needed and optionally initializes them. For a custom PretrainedModel, implement any initialization logic in _init_weights.

classmethod from_config(config, **kwargs)

All context managers that the model should be initialized under go here.

Parameters:

dtype (paddle.dtype, optional) -- Override the default paddle.dtype and load the model under this dtype.

property base_model#

The body of the same model architecture. It is the base model itself for base model or the base model attribute for derived model.

Type:

PretrainedModel
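A minimal sketch of this lookup, assuming the property resolves the attribute named by base_model_prefix and falls back to the model itself. The Toy* classes are hypothetical stand-ins, not PaddleNLP classes.

```python
class ToyPretrained:
    base_model_prefix = ""  # each architecture overrides this

    @property
    def base_model(self):
        # Fall back to the model itself when no attribute with that name exists.
        return getattr(self, self.base_model_prefix, self)


class ToyBertModel(ToyPretrained):
    base_model_prefix = "bert"


class ToyBertForSequenceClassification(ToyPretrained):
    base_model_prefix = "bert"

    def __init__(self):
        self.bert = ToyBertModel()  # the base model held as an attribute


body = ToyBertModel()
head = ToyBertForSequenceClassification()
```

Here body.base_model is body itself, while head.base_model is the ToyBertModel stored in head.bert.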

property model_name_list#

Contains all supported built-in pretrained model names of the current PretrainedModel class.

Type:

list

can_generate() -> bool

Returns whether this model can generate sequences with generate().

Returns:

Whether this model can generate sequences with generate().

Return type:

bool

recompute_enable()

Enables recompute. Every layer that has an enable_recompute attribute gets that attribute set to True.

recompute_disable()

Disables recompute. Every layer that has an enable_recompute attribute gets that attribute set to False.
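The toggling described above can be pictured as a walk over the layer tree. The sketch below uses a hypothetical ToyLayer class rather than real Paddle layers, and is only meant to show that layers without the attribute are left untouched.

```python
class ToyLayer:
    """Hypothetical layer: only some layers expose enable_recompute."""

    def __init__(self, *children, supports_recompute=False):
        self.children = list(children)
        if supports_recompute:
            self.enable_recompute = False

    def sublayers(self):
        """Yield this layer and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.sublayers()


def set_recompute(root, flag):
    """Set enable_recompute on every layer that already has the attribute."""
    for layer in root.sublayers():
        if hasattr(layer, "enable_recompute"):
            layer.enable_recompute = flag


block = ToyLayer(supports_recompute=True)
norm = ToyLayer()  # no enable_recompute attribute
model = ToyLayer(block, norm)

set_recompute(model, True)
```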

get_memory_footprint(return_buffers=True)

Get the memory footprint of a model. This will return the memory footprint of the current model in bytes. Useful to benchmark the memory footprint of the current model and design some tests.

Parameters:

return_buffers (bool, optional, defaults to True) -- Whether to include the size of the buffer tensors in the computation of the memory footprint. Buffers are tensors that do not require gradients and are not registered as parameters.
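As a rough illustration of what such a footprint measures, the sketch below sums element count times element size over parameters and, optionally, buffers. The function footprint_bytes and the (shape, itemsize) representation are assumptions made for this example; the library method operates on real tensors.

```python
from math import prod


def footprint_bytes(parameters, buffers, return_buffers=True):
    """Sum numel * itemsize over (shape, itemsize) pairs."""
    tensors = list(parameters)
    if return_buffers:
        tensors += list(buffers)
    return sum(prod(shape) * itemsize for shape, itemsize in tensors)


# Two float32 parameters (4 bytes per element) and one int64 buffer (8 bytes).
params = [((10, 4), 4), ((4,), 4)]
bufs = [((1, 8), 8)]

with_buffers = footprint_bytes(params, bufs)
without_buffers = footprint_bytes(params, bufs, return_buffers=False)
```

In this toy case the parameters account for 176 bytes and the buffer adds another 64.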

get_input_embeddings() -> Embedding

Gets the input embedding of the model.

Returns:

The input embedding of the model.

Return type:

nn.Embedding

set_input_embeddings(value: Embedding)

Sets a new input embedding for the model.

Parameters:

value (Embedding) -- The new input embedding of the model.

Raises:

NotImplementedError -- Raised when the model does not implement the set_input_embeddings method.

get_output_embeddings() -> Embedding | None

To be overridden by models that have output embeddings.

Returns:

The output embedding of the model.

Return type:

Optional[Embedding]

tie_weights()

Tie the weights between the input embeddings and the output embeddings.
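Tying can be thought of as making both modules refer to one weight object, so an update through either is visible through the other. The sketch below uses hypothetical toy classes with nested lists in place of tensors; it does not reproduce the library's actual tying logic.

```python
class ToyEmbedding:
    def __init__(self, num_embeddings, embedding_dim):
        self.weight = [[0.0] * embedding_dim for _ in range(num_embeddings)]


class ToyLMHead:
    def __init__(self, num_embeddings, embedding_dim):
        self.weight = [[1.0] * embedding_dim for _ in range(num_embeddings)]


def tie(input_embeddings, output_embeddings):
    """Make the output module share the input module's weight object."""
    output_embeddings.weight = input_embeddings.weight


emb = ToyEmbedding(4, 2)
head = ToyLMHead(4, 2)
tie(emb, head)

# A change made through the input embedding is now seen by the output head.
emb.weight[0][0] = 5.0
```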

resize_position_embeddings(new_num_position_embeddings: int)

Resizes the position embedding. This method should be overridden by downstream models.

Parameters:

new_num_position_embeddings (int) -- The new position size.

Raises:

NotImplementedError -- Raised when the method is called on a model that does not implement it.

classmethod constructed_from_pretrained_config(init_func=None) -> bool

Checks whether the model is constructed from PretrainedConfig.

Returns:

Whether the model is constructed from PretrainedConfig.

Return type:

bool

save_model_config(save_dir: str)

Deprecated, please use config.save_pretrained() instead. Saves model configuration to a file named "config.json" under save_dir.

Parameters:

save_dir (str) -- Directory to save model_config file into.

save_to_hf_hub(repo_id: str, private: bool | None = None, subfolder: str | None = None, commit_message: str | None = None, revision: str | None = None, create_pr: bool = False)

Uploads all elements of this model to a new Hugging Face Hub repository.

Parameters:
  • repo_id (str) -- Repository name for your model/tokenizer in the Hub.

  • private (bool, optional) -- Whether the model/tokenizer is set to private.

  • subfolder (str, optional) -- Push to a subfolder of the repo instead of the root.

  • commit_message (str, optional) -- The commit message. Defaults to f"Upload {path_in_repo} with huggingface_hub".

  • revision (str, optional)

  • create_pr (bool) -- If revision is not set, the PR is opened against the "main" branch. If revision is set and is a branch, the PR is opened against that branch. If revision is set and is not a branch name (for example, a commit oid), a RevisionNotFoundError is returned by the server.

Returns:

The URL of the commit of your model in the given repository.

save_to_aistudio(repo_id, private=True, license='Apache License 2.0', exist_ok=True, safe_serialization=True, subfolder=None, merge_tensor_parallel=False, **kwargs)

Uploads all elements of this model to a new AiStudio Hub repository.

Parameters:
  • repo_id (str) -- Repository name for your model/tokenizer in the Hub.

  • token (str) -- Your token for the Hub.

  • private (bool, optional) -- Whether the model/tokenizer is set to private. Defaults to True.

  • license (str) -- The license of your model/tokenizer. Defaults to "Apache License 2.0".

  • exist_ok (bool, optional) -- Whether to override an existing repository. Defaults to True.

  • safe_serialization (bool, optional) -- Whether to save the model using safe serialization. Defaults to True.

  • subfolder (str, optional) -- Push to a subfolder of the repo instead of the root.

  • merge_tensor_parallel (bool) -- Whether to merge the tensor parallel weights. Defaults to False.

resize_token_embeddings(new_num_tokens: int | None = None) -> Embedding

Resizes the input token embedding matrix of the model according to new_num_tokens.

Parameters:

new_num_tokens (Optional[int]) -- The new number of tokens in the embedding matrix. Increasing the size adds newly initialized vectors at the end. Reducing the size removes vectors from the end. If not provided or None, the method just returns the input token embedding module of the model without doing anything.

Returns:

The input token embedding module of the model.

Return type:

paddle.nn.Embedding
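The grow and shrink semantics described above can be sketched on a plain list of rows. The function resize_rows, the fixed seed, and the 0.02 standard deviation for new rows are assumptions for illustration; the library method works on a paddle.nn.Embedding and chooses its own initialization.

```python
import random


def resize_rows(weight, new_num_tokens=None, seed=0):
    """Toy resize of a non-empty embedding matrix stored as a list of rows."""
    if new_num_tokens is None:
        return weight  # nothing to do: hand back the existing matrix
    rng = random.Random(seed)
    dim = len(weight[0])
    # Existing rows are kept from the front; shrinking drops rows at the end.
    kept = [row[:] for row in weight[:new_num_tokens]]
    # Growing appends freshly initialized rows at the end.
    extra = [
        [rng.gauss(0.0, 0.02) for _ in range(dim)]
        for _ in range(new_num_tokens - len(kept))
    ]
    return kept + extra


w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grown = resize_rows(w, 5)
shrunk = resize_rows(w, 2)
unchanged = resize_rows(w)
```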

classmethod from_pretrained(pretrained_model_name_or_path, *args, **kwargs)

Creates an instance of PretrainedModel. Model weights are loaded by specifying the name of a built-in pretrained model, a pretrained model from HF Hub, a community-contributed model, or a local directory path.

Parameters:
  • pretrained_model_name_or_path (str) --

    Name of the pretrained model or directory path to load from. The string can be:

    • Name of a built-in pretrained model.

    • Name of a pretrained model from HF Hub.

    • Name of a community-contributed pretrained model.

    • A local directory path that contains the model weights file ("model_state.pdparams") and the model config file ("model_config.json").

  • from_hf_hub (bool) -- Whether to load the model from the Hugging Face Hub. Defaults to False.

  • subfolder (str, optional) -- Only works when loading from the Hugging Face Hub.

  • *args (tuple) -- Positional arguments for model __init__. If provided, these are used as positional argument values for model initialization.

  • **kwargs (dict) -- Keyword arguments for model __init__. If provided, these update the pre-defined keyword argument values for model initialization. If a keyword is among the __init__ argument names of the base model, it updates the argument values of the base model; otherwise it updates the argument values of the derived model.

  • load_state_as_np (bool, optional) -- Controls where the loaded weights are placed, independent of the device the model is on. If True, the model weights are loaded as numpy.ndarray on CPU. Otherwise, the weights are loaded as tensors on the default device. Note that on GPU the latter creates extra temporary tensors in addition to the model weights, which doubles the memory usage, so True is suggested for big models on GPU. Defaults to False.

Returns:

An instance of PretrainedModel.

Return type:

PretrainedModel
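The routing rule for **kwargs (keywords named in the base model's __init__ go to the base model, the rest go to the derived model) can be sketched with inspect. The helper split_init_kwargs and the ToyBase class are hypothetical; this is not the library's actual implementation.

```python
import inspect


class ToyBase:
    """Hypothetical base model with two init arguments."""

    def __init__(self, hidden_size=8, dropout=0.1):
        self.hidden_size = hidden_size
        self.dropout = dropout


def split_init_kwargs(base_cls, kwargs):
    """Separate the kwargs that the base model's __init__ accepts from the rest."""
    base_names = set(inspect.signature(base_cls.__init__).parameters) - {"self"}
    base_kwargs = {k: v for k, v in kwargs.items() if k in base_names}
    derived_kwargs = {k: v for k, v in kwargs.items() if k not in base_names}
    return base_kwargs, derived_kwargs


base_kwargs, derived_kwargs = split_init_kwargs(
    ToyBase, {"dropout": 0.0, "num_labels": 3}
)
```

In this toy case dropout is routed to the base model and num_labels to the derived model.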

Example

from paddlenlp.transformers import BertForSequenceClassification

# Name of built-in pretrained model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Name of pretrained model from PaddleHub
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Name of community-contributed pretrained model
model = BertForSequenceClassification.from_pretrained('yingyibiao/bert-base-uncased-sst-2-finetuned', num_labels=3)

# Load from local directory path
model = BertForSequenceClassification.from_pretrained('./my_bert/')

save_pretrained(save_dir: str | os.PathLike, is_main_process: bool = True, state_dict: dict | None = None, save_function: Callable = <function save>, max_shard_size: int | str = '10GB', safe_serialization: bool = False, variant: str | None = None, *args, **kwargs)

Saves the model configuration and related resources (the model state) as files under save_dir. The model configuration is saved into a file named "model_config.json", and the model state is saved into a file named "model_state.pdparams".

The save_dir can be used in from_pretrained as argument value of pretrained_model_name_or_path to re-load the trained model.

Parameters:

save_dir (str) -- Directory to save files into.

Example

from paddlenlp.transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.save_pretrained('./trained_model/')
# reload from save_directory
model = BertForSequenceClassification.from_pretrained('./trained_model/')

register_base_model(cls)

A decorator for PretrainedModel class. It first retrieves the parent class of the class being decorated, then sets the base_model_class attribute of that parent class to be the class being decorated. In summary, the decorator registers the decorated class as the base model class in all derived classes under the same architecture.

Parameters:

cls (PretrainedModel) -- The class (inherited from PretrainedModel) to be decorated.

Returns:

The input class cls after decorating.

Return type:

PretrainedModel

Example

from paddlenlp.transformers import BertModel, register_base_model

BertModel = register_base_model(BertModel)
assert BertModel.base_model_class == BertModel
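The mechanism described for this decorator can be mimicked in a few lines of plain Python. The names toy_register_base_model and the Toy* classes are hypothetical, and the real decorator may perform additional checks; this sketch only shows why every derived class under the same parent sees the registered base model class.

```python
def toy_register_base_model(cls):
    """Toy decorator: register cls as base_model_class on its direct parent."""
    parent = cls.__bases__[0]
    parent.base_model_class = cls
    return cls


class ToyBertPretrainedModel:
    """Hypothetical per-architecture parent class."""

    base_model_class = None


@toy_register_base_model
class ToyBertModel(ToyBertPretrainedModel):
    pass


class ToyBertForTokenClassification(ToyBertPretrainedModel):
    pass
```

Because the attribute is set on the shared parent, ToyBertForTokenClassification.base_model_class resolves to ToyBertModel through normal attribute inheritance.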