tokenizer
- class FunnelTokenizer(vocab_file, do_lower_case=True, unk_token='<unk>', sep_token='<sep>', pad_token='<pad>', cls_token='<cls>', mask_token='<mask>', bos_token='<s>', eos_token='</s>', do_basic_tokenize=True, never_split=None, tokenize_chinese_chars=True, strip_accents=None, **kwargs)
Bases: BertTokenizer
- create_token_type_ids_from_sequences(token_ids_0: List[int], token_ids_1: List[int] | None = None) -> List[int]
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel Transformer sequence pair mask has the following format:
    2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
    | first sequence    | second sequence |
If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).
- Parameters:
token_ids_0 (List[int]): List of IDs.
token_ids_1 (List[int], optional): Optional second list of IDs for sequence pairs.
- Returns:
List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
- Return type:
List[int]
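The mask layout described above can be sketched in plain Python. This is an illustrative stand-in, not the library's code: the function name `sketch_token_type_ids` and the constant `CLS_TOKEN_TYPE_ID` are hypothetical, and the sketch assumes one `<cls>` token at the start (type ID 2, as the leading `2` in the format suggests) and one `<sep>` token closing each sequence.

```python
from typing import List, Optional

# Assumption: the leading <cls> token gets type ID 2, matching the "2" that
# opens the mask format shown in the method description.
CLS_TOKEN_TYPE_ID = 2


def sketch_token_type_ids(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Hypothetical re-creation of the documented mask layout."""
    # One slot for <cls>, then the first sequence plus its closing <sep>.
    type_ids = [CLS_TOKEN_TYPE_ID] + [0] * (len(token_ids_0) + 1)
    if token_ids_1 is not None:
        # The second sequence plus its closing <sep> is marked with 1s.
        type_ids += [1] * (len(token_ids_1) + 1)
    return type_ids


single = sketch_token_type_ids([10, 11, 12])
pair = sketch_token_type_ids([10, 11, 12], [20, 21])
print(single)  # [2, 0, 0, 0, 0]
print(pair)    # [2, 0, 0, 0, 0, 1, 1, 1]
```

Only the lengths of the inputs matter here; the ID values themselves never affect the mask. With a single sequence the result stops after the 0s, which corresponds to the "first portion" mentioned in the description.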