utils
- convert_ndarray_dtype(np_array: ndarray, target_dtype: str) -> ndarray
Convert a NumPy ndarray to the target dtype.
- Parameters:
np_array (np.ndarray) – numpy ndarray instance
target_dtype (str) – the target dtype
- Returns:
converted numpy ndarray instance
- Return type:
np.ndarray
- get_scale_by_dtype(dtype: str | None = None, return_positive: bool = True) -> float
Get the scale value for a dtype.
- Parameters:
dtype (str) – the string dtype value
- Returns:
the scale value
- Return type:
float
- fn_args_to_dict(func, *args, **kwargs)
Inspect function func and its arguments for running, and extract a dict mapping between argument names and keys.
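The mapping described above can be sketched with the standard library's inspect module. This is a minimal illustration, not the library's implementation: the helper name args_to_dict and the toy function build are hypothetical, and filling in default values is an assumption about the desired behavior.

```python
import inspect


def args_to_dict(func, *args, **kwargs):
    """Hypothetical sketch: map each parameter name of func to the value it would receive."""
    signature = inspect.signature(func)
    bound = signature.bind(*args, **kwargs)
    bound.apply_defaults()  # assumption: also report parameters left at their defaults
    return dict(bound.arguments)


def build(hidden_size, num_layers=2, dropout=0.1):
    """Toy function used only to demonstrate the mapping."""
    return None


print(args_to_dict(build, 768, dropout=0.0))
# {'hidden_size': 768, 'num_layers': 2, 'dropout': 0.0}
```

Binding through the signature rather than zipping names by hand means positional and keyword arguments are resolved the same way Python itself resolves them at call time.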
- adapt_stale_fwd_patch(self, name, value)
Since there are some monkey patches for forward of PretrainedModel, such as model compression, we make these patches compatible with the latest forward method.
- class InitTrackerMeta(name, bases, attrs)
Bases: type
This metaclass wraps the __init__ method of a class to add an init_config attribute to instances of that class; init_config uses a dict to track the initial configuration. If the class has a _pre_init or _post_init method, it is hooked before or after __init__ and called as _pre_init(self, init_fn, init_args) or _post_init(self, init_fn, init_args). InitTrackerMeta is used as the metaclass for pretrained model classes, which are always Layer subclasses, and type(Layer) is not type. It therefore uses type(Layer) rather than type as its base class to avoid metaclass inheritance conflicts.
- static init_and_track_conf(init_func, pre_init_func=None, post_init_func=None)
Wraps init_func, which is the __init__ method of a class, to add an init_config attribute to instances of that class.
Warning: self is always the class type of the downstream model, e.g. BertForTokenClassification.
- Parameters:
init_func (callable) – It should be the __init__ method of a class.
pre_init_func (callable, optional) – If provided, it is hooked before init_func and called as pre_init_func(self, init_func, *init_args, **init_args). Default None.
post_init_func (callable, optional) – If provided, it is hooked after init_func and called as post_init_func(self, init_func, *init_args, **init_args). Default None.
- Returns:
the wrapped function
- Return type:
function
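The wrapping behavior described for InitTrackerMeta can be illustrated with a small, framework-free sketch. Everything here is hypothetical: TrackInitMeta, wrap_init, and TinyModel are illustrative names, the metaclass derives from plain type rather than type(Layer) so that it runs without Paddle, and the layout of the recorded init_config dict is an assumption.

```python
import functools


class TrackInitMeta(type):
    """Hypothetical metaclass that records constructor arguments on each instance."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if "__init__" in attrs:
            # Replace the class's own __init__ with a tracking wrapper.
            cls.__init__ = TrackInitMeta.wrap_init(
                attrs["__init__"],
                getattr(cls, "_pre_init", None),
                getattr(cls, "_post_init", None),
            )

    @staticmethod
    def wrap_init(init_func, pre_init_func=None, post_init_func=None):
        @functools.wraps(init_func)
        def wrapped(self, *args, **kwargs):
            if pre_init_func is not None:
                pre_init_func(self, init_func, *args, **kwargs)
            init_func(self, *args, **kwargs)
            if post_init_func is not None:
                post_init_func(self, init_func, *args, **kwargs)
            # Assumed layout: positional args under one key, keyword args flattened.
            self.init_config = {"init_args": args, **kwargs}

        return wrapped


class TinyModel(metaclass=TrackInitMeta):
    def __init__(self, hidden_size, dropout=0.1):
        self.hidden_size = hidden_size

    def _post_init(self, init_func, *args, **kwargs):
        self.post_init_called = True


model = TinyModel(128, dropout=0.5)
print(model.init_config)
```

Because the wrapper assigns init_config after the original __init__ returns, a subclass that calls super().__init__() ends up with the configuration of the outermost constructor call.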
- param_in_func(func, param_field: str) -> bool
Check whether param_field is a parameter of the func method, e.g. whether the bert param is in the __init__ method.
- Parameters:
func (callable) – the function or method to inspect, e.g. the __init__ method of a PretrainedModel class
param_field (str) – the name of field
- Returns:
the result of existence
- Return type:
bool
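A check of this kind can be sketched with inspect.signature. The helper has_param and the class ToyClassifier are hypothetical names used for illustration only, and the library's own implementation may differ.

```python
import inspect


def has_param(func, param_field: str) -> bool:
    """Hypothetical sketch: report whether func declares a parameter named param_field."""
    return param_field in inspect.signature(func).parameters


class ToyClassifier:
    def __init__(self, bert, num_labels=2):
        self.bert = bert
        self.num_labels = num_labels


print(has_param(ToyClassifier.__init__, "bert"))     # True
print(has_param(ToyClassifier.__init__, "roberta"))  # False
```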
- resolve_cache_dir(from_hf_hub: bool, from_aistudio: bool, cache_dir: str | None = None) -> str
Resolve the cache directory for PretrainedModel and PretrainedConfig.
- Parameters:
from_hf_hub (bool) – whether to load from the Hugging Face Hub
from_aistudio (bool) – whether to load from AI Studio
cache_dir (str) – cache directory for models
- find_transformer_model_type(model_class: Type) -> str
Get the model type from the module name, e.g. BertModel -> bert, RobertaForTokenClassification -> roberta.
- Parameters:
model_class (Type) – the class of model
- Returns:
the type string
- Return type:
str
- find_transformer_model_class_by_name(model_name: str) -> Type[PretrainedModel] | None
Find the transformer model class by name.
- Parameters:
model_name (str) – the string of class name
- Returns:
optional pretrained-model class
- Return type:
Optional[Type[PretrainedModel]]
- convert_file_size_to_int(size: int | str)
Converts a size expressed as a string with digits and a unit (like "5MB") to an integer (in bytes).
- Parameters:
size (int or str) – The size to convert. Will be directly returned if an int.
Example:
```py
>>> convert_file_size_to_int("1MiB")
1048576
```
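The parsing step can be sketched as a suffix lookup. This is an illustration under stated assumptions rather than the library's code: size_to_bytes is a hypothetical name, and the convention that binary units (KiB, MiB, GiB) use powers of 1024 while decimal units (KB, MB, GB) use powers of 1000 is assumed. It is consistent with the "1MiB" example above.

```python
def size_to_bytes(size):
    """Hypothetical sketch: parse sizes such as "5MB" or "1MiB" into a byte count."""
    if isinstance(size, int):
        return size  # integers are returned unchanged
    # Assumed convention: binary units are powers of 1024, decimal units powers of 1000.
    units = {
        "GIB": 2**30, "MIB": 2**20, "KIB": 2**10,
        "GB": 10**9, "MB": 10**6, "KB": 10**3,
    }
    text = size.strip().upper()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    raise ValueError(f"size {size!r} is not in a recognized format")


print(size_to_bytes("1MiB"))  # 1048576
print(size_to_bytes("5MB"))   # 5000000
```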
- cached_file(path_or_repo_id: str | PathLike, filename: str, cache_dir: str | PathLike | None = None, subfolder: str = '', from_aistudio: bool = False, _raise_exceptions_for_missing_entries: bool = True, _raise_exceptions_for_connection_errors: bool = True, pretrained_model_name_or_path=None) -> str
Tries to locate a file in a local folder or a repo, downloading and caching it if necessary.
- Parameters:
path_or_repo_id (str or os.PathLike) – This can be either a string, the model id of a model repo on huggingface.co, or a path to a directory potentially containing the file.
filename (str) – The name of the file to locate in path_or_repo_id.
cache_dir (str or os.PathLike, optional) – Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
subfolder (str, optional, defaults to "") – In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
- Returns:
Returns the resolved file (to the cache folder if downloaded from a repo).
- Return type:
Optional[str]
Examples:
```python
# Download a model weight from the Hub and cache it.
model_weights_file = cached_file("bert-base-uncased", "pytorch_model.bin")
```
- get_checkpoint_shard_files(pretrained_model_name_or_path, index_filename, cache_dir=None, subfolder='', from_aistudio=False, from_hf_hub=False)
For a given model, downloads and caches all the shards of a sharded checkpoint if pretrained_model_name_or_path is a model ID on the Hub, then returns the list of paths to all the shards, as well as some metadata.
For the description of each arg, see PretrainedModel.from_pretrained. index_filename is the full path to the index (downloaded and cached if pretrained_model_name_or_path is a model ID on the Hub).
- class ContextManagers(context_managers: List[AbstractContextManager])
Bases: object
Wrapper for contextlib.ExitStack which enters a collection of context managers. Adaptation of ContextManagers in the fastcore library.
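The ExitStack pattern this class wraps can be sketched as follows. EnterAll and tracked are hypothetical names introduced for the example; the sketch shows the general technique and is not the class's actual source.

```python
from contextlib import ExitStack, contextmanager


class EnterAll:
    """Hypothetical sketch: enter a list of context managers as one unit."""

    def __init__(self, context_managers):
        self.context_managers = context_managers
        self.stack = ExitStack()

    def __enter__(self):
        for manager in self.context_managers:
            self.stack.enter_context(manager)
        return self

    def __exit__(self, *exc_info):
        # ExitStack unwinds the managers in reverse order of entry.
        return self.stack.__exit__(*exc_info)


events = []


@contextmanager
def tracked(name):
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


with EnterAll([tracked("a"), tracked("b")]):
    events.append("body")

print(events)
# ['enter a', 'enter b', 'body', 'exit b', 'exit a']
```

Delegating to ExitStack keeps the exit order and exception propagation identical to nested with statements, without hard-coding how many managers there are.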
- dtype_byte_size(dtype)
Returns the size (in bytes) occupied by one parameter of type dtype.
Example:
```py
>>> dtype_byte_size(paddle.float32)
4
```
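One way to derive such a size is to read the bit width from the dtype's name. The sketch below works on plain strings so that it runs without Paddle. The helper byte_size_from_name is hypothetical, and whether the library uses this approach is an assumption.

```python
import re


def byte_size_from_name(dtype_name: str) -> int:
    """Hypothetical sketch: infer bytes per element from a name such as "float32"."""
    match = re.search(r"(\d+)$", dtype_name)
    if match is None:
        raise ValueError(f"{dtype_name!r} does not end with a bit width")
    return int(match.group(1)) // 8  # bits to bytes


print(byte_size_from_name("float32"))  # 4
print(byte_size_from_name("float16"))  # 2
```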
- class CaptureStd(out=True, err=True, replay=True)
Bases: object
Context manager to capture:
stdout: replay it, clean it up and make it available via obj.out
stderr: replay it and make it available via obj.err
- Parameters:
out (bool, optional, defaults to True) – Whether to capture stdout or not.
err (bool, optional, defaults to True) – Whether to capture stderr or not.
replay (bool, optional, defaults to True) – Whether to replay or not. By default each captured stream gets replayed back on context’s exit, so that one can see what the test was doing. If this behavior is not wanted and the captured data shouldn’t be replayed, pass replay=False to disable this feature.
Examples:
```python
# to capture stdout only with auto-replay
with CaptureStdout() as cs:
    print("Secret message")
assert "message" in cs.out

# to capture stderr only with auto-replay
import sys

with CaptureStderr() as cs:
    print("Warning: ", file=sys.stderr)
assert "Warning" in cs.err

# to capture both streams with auto-replay
with CaptureStd() as cs:
    print("Secret message")
    print("Warning: ", file=sys.stderr)
assert "message" in cs.out
assert "Warning" in cs.err

# to capture just one of the streams, and not the other, with auto-replay
with CaptureStd(err=False) as cs:
    print("Secret message")
assert "message" in cs.out
# but best use the stream-specific subclasses

# to capture without auto-replay
with CaptureStd(replay=False) as cs:
    print("Secret message")