tokenizer
class AutoTokenizer(*args, **kwargs)
Bases: object
AutoTokenizer automatically retrieves the tokenizer that matches the provided pretrained weights/vocabulary. It is a generic tokenizer class that is instantiated as one of the base tokenizer classes when created with the AutoTokenizer.from_pretrained() classmethod.
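The dispatch idea behind a generic "auto" class can be illustrated with a small, self-contained sketch. This is not PaddleNLP's implementation: the registry, the `DemoAutoTokenizer` name, and the placeholder tokenizer classes are hypothetical, and matching on a model-name prefix is a simplifying assumption.

```python
from typing import Dict, Type


class BaseTokenizer:
    """Hypothetical stand-in for a base tokenizer class."""

    def __init__(self, name: str):
        self.name = name


class BertLikeTokenizer(BaseTokenizer):
    pass


class GPTLikeTokenizer(BaseTokenizer):
    pass


# Hypothetical registry: maps a model-name prefix to a concrete tokenizer class.
_REGISTRY: Dict[str, Type[BaseTokenizer]] = {
    "bert": BertLikeTokenizer,
    "gpt": GPTLikeTokenizer,
}


class DemoAutoTokenizer:
    """Generic entry point; it is never instantiated directly."""

    def __init__(self, *args, **kwargs):
        raise EnvironmentError("Use DemoAutoTokenizer.from_pretrained() instead.")

    @classmethod
    def from_pretrained(cls, name: str) -> BaseTokenizer:
        # Return an instance of the first registered class whose prefix matches.
        for prefix, tokenizer_cls in _REGISTRY.items():
            if name.lower().startswith(prefix):
                return tokenizer_cls(name)
        raise ValueError(f"No tokenizer registered for {name!r}")


tok = DemoAutoTokenizer.from_pretrained("bert-base-uncased")
print(type(tok).__name__)  # BertLikeTokenizer
```

The caller only ever talks to the generic class; the object it gets back is an instance of a concrete tokenizer class, which mirrors why the example further down prints a BertTokenizer type.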
classmethod from_pretrained(pretrained_model_name_or_path, from_hf_hub=False, subfolder=None, *model_args, **kwargs)

Creates a tokenizer instance through AutoTokenizer. The related resources are loaded by specifying the name of a built-in pretrained model, the name of a community-contributed pretrained model, or a local directory path.

Parameters
pretrained_model_name_or_path (str) – Name of a pretrained model or a directory path to load from. The string can be:
- the name of a built-in pretrained model;
- the name of a community-contributed pretrained model;
- a local directory path that contains the tokenizer-related resources and the tokenizer config file ("tokenizer_config.json").
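The three accepted forms of pretrained_model_name_or_path can be told apart with a few simple checks. The helper below is hypothetical and not part of PaddleNLP; passing the set of built-in names explicitly, and treating any other name that contains "/" as community-contributed, are assumptions made for illustration.

```python
import os
import tempfile
from typing import Iterable

CONFIG_FILE = "tokenizer_config.json"


def resolve_source(name_or_path: str, builtin_names: Iterable[str]) -> str:
    """Hypothetical helper: classify where tokenizer resources would come from."""
    # 1. An existing local directory must hold the tokenizer config file.
    if os.path.isdir(name_or_path):
        if os.path.isfile(os.path.join(name_or_path, CONFIG_FILE)):
            return "local"
        raise FileNotFoundError(f"{CONFIG_FILE} not found in {name_or_path}")
    # 2. A built-in pretrained model name.
    if name_or_path in set(builtin_names):
        return "builtin"
    # 3. Otherwise assume a community-contributed name such as "user/model".
    #    (Note: a non-existent local path like "./x/" would also land here.)
    if "/" in name_or_path:
        return "community"
    raise ValueError(f"Cannot resolve {name_or_path!r}")


with tempfile.TemporaryDirectory() as d:
    # Create an empty config file so the directory counts as a local source.
    open(os.path.join(d, CONFIG_FILE), "w").close()
    print(resolve_source(d, []))  # local
print(resolve_source("bert-base-uncased", ["bert-base-uncased"]))  # builtin
print(resolve_source("yingyibiao/bert-base-uncased-sst-2-finetuned", []))  # community
```

Checking the local directory first keeps a path from being mistaken for a hub-style "user/model" name whenever the directory actually exists.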
from_hf_hub (bool, optional) – Whether to load the tokenizer from the HuggingFace Hub. Defaults to False.
subfolder (str, optional) – Only works when loading from HuggingFace Hub.
*model_args (tuple) – Positional arguments for the tokenizer's __init__. If provided, they are used as positional argument values for tokenizer initialization.
**kwargs (dict) – Keyword arguments for the tokenizer's __init__. If provided, they update the predefined keyword argument values for tokenizer initialization.
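How positional arguments and keyword overrides could combine with saved settings is sketched below. The assumption (not taken from PaddleNLP's source) is that the predefined keyword arguments come from a loaded config dict and that caller-supplied keyword arguments win on conflicts; `DemoTokenizer`, `build_tokenizer`, and their parameters are hypothetical.

```python
from typing import Any, Dict


class DemoTokenizer:
    """Hypothetical tokenizer with one positional and two keyword parameters."""

    def __init__(self, vocab_file: str, do_lower_case: bool = True, max_len: int = 512):
        self.vocab_file = vocab_file
        self.do_lower_case = do_lower_case
        self.max_len = max_len


def build_tokenizer(saved_init_kwargs: Dict[str, Any], *model_args, **kwargs) -> DemoTokenizer:
    # Copy the predefined keyword arguments (e.g. read from a config file)
    # so the caller's dict is not mutated.
    init_kwargs = dict(saved_init_kwargs)
    # Caller-supplied keyword arguments override the predefined ones.
    init_kwargs.update(kwargs)
    # Positional arguments are passed through unchanged.
    return DemoTokenizer(*model_args, **init_kwargs)


saved = {"do_lower_case": True, "max_len": 512}
tok = build_tokenizer(saved, "vocab.txt", do_lower_case=False)
print(tok.vocab_file, tok.do_lower_case, tok.max_len)  # vocab.txt False 512
```

Only the overridden key changes; every other saved setting is kept as loaded.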
Returns
An instance of PretrainedTokenizer.
Return type
PretrainedTokenizer
Example
    from paddlenlp.transformers import AutoTokenizer

    # Name of built-in pretrained model
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    print(type(tokenizer))
    # <class 'paddlenlp.transformers.bert.tokenizer.BertTokenizer'>

    # Name of community-contributed pretrained model
    tokenizer = AutoTokenizer.from_pretrained('yingyibiao/bert-base-uncased-sst-2-finetuned')
    print(type(tokenizer))
    # <class 'paddlenlp.transformers.bert.tokenizer.BertTokenizer'>

    # Load from local directory path
    tokenizer = AutoTokenizer.from_pretrained('./my_bert/')
    print(type(tokenizer))
    # <class 'paddlenlp.transformers.bert.tokenizer.BertTokenizer'>
classmethod