tokenizer
- class AutoTokenizer(*args, **kwargs)[source]
Bases: object
AutoClass helps you automatically retrieve the relevant model class given the provided pretrained weights/vocabulary. AutoTokenizer is a generic tokenizer class that will be instantiated as one of the base tokenizer classes when created with the AutoTokenizer.from_pretrained() classmethod.
- classmethod from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)[source]
Creates an instance of AutoTokenizer. Related resources are loaded by specifying the name of a built-in pretrained model, the name of a community-contributed pretrained model, or a local directory path.
- Parameters:
  - pretrained_model_name_or_path (str) – Name of the pretrained model or directory path to load from. The string can be:
    - the name of a built-in pretrained model;
    - the name of a community-contributed pretrained model;
    - a local directory path containing tokenizer resources and a tokenizer config file ("tokenizer_config.json").
  - *model_args (tuple) – positional arguments for the tokenizer's __init__. If provided, they are used as positional argument values for tokenizer initialization.
  - **kwargs (dict) – keyword arguments for the tokenizer's __init__. If provided, they update the pre-defined keyword argument values for tokenizer initialization.
- Returns:
An instance of PretrainedTokenizer.
- Return type:
PretrainedTokenizer
Example
```python
from paddlenlp.transformers import AutoTokenizer

# Name of built-in pretrained model
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
print(type(tokenizer))
# <class 'paddlenlp.transformers.bert.tokenizer.BertTokenizer'>

# Name of community-contributed pretrained model
tokenizer = AutoTokenizer.from_pretrained('yingyibiao/bert-base-uncased-sst-2-finetuned')
print(type(tokenizer))
# <class 'paddlenlp.transformers.bert.tokenizer.BertTokenizer'>

# Load from local directory path
tokenizer = AutoTokenizer.from_pretrained('./my_bert/')
print(type(tokenizer))
# <class 'paddlenlp.transformers.bert.tokenizer.BertTokenizer'>
```
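To illustrate the dispatch described above — picking a concrete tokenizer class based on the resources found at the given path — here is a minimal, self-contained sketch. It uses a hypothetical `TOKENIZER_REGISTRY` and a `"tokenizer_class"` key in `tokenizer_config.json`; PaddleNLP's actual config keys and resolution logic may differ, and this is not its implementation.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a concrete tokenizer class. The real
# BertTokenizer in PaddleNLP does far more; here we only record kwargs.
class BertTokenizer:
    def __init__(self, **kwargs):
        self.init_kwargs = kwargs

# Hypothetical registry mapping config names to tokenizer classes.
TOKENIZER_REGISTRY = {"BertTokenizer": BertTokenizer}

def from_pretrained(pretrained_model_name_or_path, **kwargs):
    """Resolve a concrete tokenizer class from tokenizer_config.json
    in a local directory, then instantiate it (sketch only)."""
    config_path = os.path.join(
        pretrained_model_name_or_path, "tokenizer_config.json")
    with open(config_path) as f:
        config = json.load(f)
    cls = TOKENIZER_REGISTRY[config.pop("tokenizer_class")]
    # Caller-supplied kwargs update the pre-defined config values,
    # mirroring the **kwargs behavior documented above.
    config.update(kwargs)
    return cls(**config)

# Demo: a temporary local directory holding a tokenizer config file.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "tokenizer_config.json"), "w") as f:
        json.dump({"tokenizer_class": "BertTokenizer",
                   "do_lower_case": True}, f)
    tok = from_pretrained(d, do_lower_case=False)
    print(type(tok).__name__)                 # BertTokenizer
    print(tok.init_kwargs["do_lower_case"])   # False
```

The key design point mirrored here is that the generic entry point never hard-codes a tokenizer: the saved config names the class, and keyword arguments passed at call time override the saved values.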