utils

download_file(save_dir, filename, url, md5=None)[source]

Download the file from url to the specified directory. If the file already exists, check its MD5 value: if it matches md5, reuse the existing file; otherwise, download the file again from url.

Parameters
  • save_dir (string) -- The directory in which to save the file.

  • filename (string) -- The filename under which to save the file.

  • url (string) -- The URL to download the file from.

  • md5 (string, optional) -- The expected MD5 value, used to verify the version of the downloaded file.
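The check-then-download behavior described above can be sketched with the standard library alone. The names file_md5, is_cached and download_file_sketch are hypothetical, and reusing an existing file when no md5 is given is an assumption; this is not PaddleNLP's implementation.

```python
import hashlib
import os
import urllib.request


def file_md5(path, chunk_size=8192):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_cached(path, md5=None):
    """True if the file exists and, when md5 is given, its digest matches."""
    if not os.path.exists(path):
        return False
    # Assumption: with no md5 to compare against, an existing file is reused.
    return md5 is None or file_md5(path) == md5


def download_file_sketch(save_dir, filename, url, md5=None):
    """Hypothetical stand-in for download_file: reuse a valid file, else fetch it."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, filename)
    if is_cached(path, md5):
        return path  # existing file matches: skip the download
    urllib.request.urlretrieve(url, path)  # missing or stale: (re)download
    return path
```

Hashing in fixed-size chunks keeps memory use constant even for large model files.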

download_check(task)[source]

Check the resource status of the specified task.

Parameters

task (string) -- The name of the specified task.

add_docstrings(*docstr)[source]

Add the given docstrings to the docstring of a class.
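A minimal sketch of how such a helper can work as a decorator factory. add_docstrings_sketch is a hypothetical name, and appending the strings to the end of the existing docstring is an assumption about the library's behavior.

```python
def add_docstrings_sketch(*docstr):
    """Return a decorator that appends docstr to the decorated object's docstring."""

    def decorator(obj):
        # Treat a missing docstring as empty so the concatenation never fails.
        obj.__doc__ = (obj.__doc__ or "") + "".join(docstr)
        return obj

    return decorator


@add_docstrings_sketch(" Extra details.")
class Example:
    """Base description."""
```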

cut_chinese_sent(para)[source]

Split Chinese text into sentences more precisely. Reference: "https://blog.csdn.net/blmoistawinde/article/details/82379256".
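The referenced post splits text on sentence-final punctuation with regular expressions. Assuming that approach, a simplified sketch looks like the following; cut_chinese_sent_sketch is a hypothetical name, and the library's exact rules may differ.

```python
import re
from typing import List


def cut_chinese_sent_sketch(para: str) -> List[str]:
    # Break after sentence-final punctuation unless a closing quote follows it.
    para = re.sub(r"([。！？\?])([^”’])", r"\1\n\2", para)
    # Break after an English-style ellipsis of six dots.
    para = re.sub(r"(\.{6})([^”’])", r"\1\n\2", para)
    # Break after a Chinese ellipsis (two "…" characters).
    para = re.sub(r"(…{2})([^”’])", r"\1\n\2", para)
    # Break after punctuation plus a closing quote, unless more punctuation follows.
    para = re.sub(r"([。！？\?][”’])([^，。！？\?])", r"\1\n\2", para)
    return para.rstrip().split("\n")
```

The closing-quote rule keeps a quoted sentence such as “你好。” together with its quotation mark instead of splitting between the period and the quote.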

class TermTreeNode(sid: str, term: str, base: str, node_type: str = 'term', term_type: Optional[str] = None, hyper: Optional[str] = None, level: Optional[int] = None, alias: Optional[List[str]] = None, alias_ext: Optional[List[str]] = None, sub_type: Optional[List[str]] = None, sub_term: Optional[List[str]] = None, data: Optional[Dict[str, Any]] = None)[source]

Bases: object

Definition of a term node. All members are protected to keep the data structure rigorous.

Parameters
  • sid (str) -- Term ID of the node.

  • term (str) -- The term, i.e. the common name of this term.

  • base (str) -- "cb" indicates the concept base; "eb" indicates the entity base.

  • term_type (Optional[str], optional) -- Type of this term; constructs the hierarchy of term nodes. Defaults to None.

  • hyper (Optional[str], optional) -- Parent type of a type node. Defaults to None.

  • node_type (str, optional) -- Type statement of the node: "type" or "term". Defaults to "term".

  • alias (Optional[List[str]], optional) -- Aliases of this term. Defaults to None.

  • alias_ext (Optional[List[str]], optional) -- Extended aliases of this term; these CANNOT be used in matching. Defaults to None.

  • sub_type (Optional[List[str]], optional) -- Grouped by some term. Defaults to None.

  • sub_term (Optional[List[str]], optional) -- Lower-level terms. Defaults to None.

  • data (Optional[Dict[str, Any]], optional) -- Stores the full information of a term. Defaults to None.

classmethod from_dict(data: Dict[str, Any])[source]

Build a node from dictionary data.

Parameters

data (Dict[str, Any]) -- Dictionary containing all key-value data of the node.

Returns

TermTree node object.

Return type

TermTreeNode

classmethod from_json(json_str: str)[source]

Build a node from a JSON string.

Parameters

json_str (str) -- JSON string formatted as TermTree data.

Returns

TermTree node object.

Return type

TermTreeNode
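The two constructors above can be illustrated with a small dataclass that mirrors the TermTreeNode constructor. TermNodeSketch is hypothetical, and the assumptions that dictionary keys match the constructor's argument names and that unknown keys are ignored are not taken from the library.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class TermNodeSketch:
    sid: str
    term: str
    base: str
    node_type: str = "term"
    term_type: Optional[str] = None
    hyper: Optional[str] = None
    level: Optional[int] = None
    alias: Optional[List[str]] = None
    alias_ext: Optional[List[str]] = None
    sub_type: Optional[List[str]] = None
    sub_term: Optional[List[str]] = None
    data: Optional[Dict[str, Any]] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TermNodeSketch":
        # Assumed key layout: keep only keys that name a constructor field.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})

    @classmethod
    def from_json(cls, json_str: str) -> "TermNodeSketch":
        # JSON is parsed into a dict and delegated to from_dict.
        return cls.from_dict(json.loads(json_str))
```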

class TermTree[source]

Bases: object

TermTree class.

add_term(term: Optional[str] = None, base: Optional[str] = None, term_type: Optional[str] = None, sub_type: Optional[List[str]] = None, sub_term: Optional[List[str]] = None, alias: Optional[List[str]] = None, alias_ext: Optional[List[str]] = None, data: Optional[Dict[str, Any]] = None)[source]

Add a term into the TermTree.

Parameters
  • term (str) -- Common name of the term.

  • base (str) -- Whether the term is a concept or an entity.

  • term_type (str) -- Term type of this term.

  • sub_type (Optional[List[str]], optional) -- Sub-types of this term; each must exist in the TermTree. Defaults to None.

  • sub_term (Optional[List[str]], optional) -- Sub-terms of this term. Defaults to None.

  • alias (Optional[List[str]], optional) -- Aliases of this term. Defaults to None.

  • alias_ext (Optional[List[str]], optional) -- Extended aliases of this term. Defaults to None.

  • data (Optional[Dict[str, Any]], optional) -- Full information of the term. Defaults to None.

find_term(term: str, term_type: Optional[str] = None) → Tuple[bool, Optional[List[str]]][source]

Find a term in the TermTree. If the term does not exist, return None. If term_type is not None, only terms of that type are searched.

Parameters
  • term (str) -- Term to look up.

  • term_type (Optional[str], optional) -- Only find terms of this term_type. Defaults to None.

Returns

[description]

Return type

Union[None, List[str]]
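Alias-based lookup with an optional type filter can be sketched as follows. MiniTermTree is a hypothetical, much-simplified store, not PaddleNLP's TermTree: it takes an explicit sid, indexes only the common name and aliases, and follows the (found, term_ids) shape of the signature above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class MiniTermTree:
    """Hypothetical, simplified term store; not PaddleNLP's TermTree."""

    def __init__(self) -> None:
        self._id2type: Dict[str, str] = {}  # term id -> term type
        self._alias2ids: Dict[str, List[str]] = defaultdict(list)  # name/alias -> term ids

    def add_term(self, sid: str, term: str, term_type: str,
                 alias: Optional[List[str]] = None) -> None:
        self._id2type[sid] = term_type
        # Index the common name and every alias so all of them can be looked up.
        for name in [term, *(alias or [])]:
            self._alias2ids[name].append(sid)

    def find_term(self, term: str, term_type: Optional[str] = None
                  ) -> Tuple[bool, Optional[List[str]]]:
        ids = self._alias2ids.get(term, [])
        if term_type is not None:
            # Restrict matches to terms of the requested type.
            ids = [i for i in ids if self._id2type[i] == term_type]
        return (True, list(ids)) if ids else (False, None)
```

Because one surface form can name several terms (for example a fruit and a company), the lookup returns a list of term IDs rather than a single one.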

build_from_dir(term_schema_path, term_data_path, linking=True)[source]

Build the TermTree from a directory that should contain the type schema and the term data.

Parameters

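  • term_schema_path ([type]) -- Path to the type schema.

  • term_data_path ([type]) -- Path to the term data.

  • linking ([type]) -- [description]. Defaults to True.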

classmethod from_dir(term_schema_path, term_data_path, linking) → paddlenlp.taskflow.utils.TermTree[source]

Build the TermTree from a directory that should contain the type schema and the term data.

Parameters

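  • term_schema_path ([type]) -- Path to the type schema.

  • term_data_path ([type]) -- Path to the term data.

  • linking ([type]) -- [description]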

Returns

[description]

Return type

TermTree

save(save_dir)[source]

Save the term tree to the directory save_dir.

Parameters

save_dir ([type]) -- Directory in which to save the term tree.

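levenstein_distance(s1: str, s2: str) → int[source]

Calculate the Levenshtein (edit) distance between s1 and s2, i.e. the minimal number of single-character insertions, deletions, and substitutions needed to turn one string into the other.

Parameters
  • s1 (str) -- First string.

  • s2 (str) -- Second string.

Returns

The minimal edit distance.

Return type

int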
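The classic dynamic-programming formulation keeps only two rows of the distance table. This is a generic sketch of the algorithm under the hypothetical name levenshtein_distance_sketch; the library's own levenstein_distance may be implemented differently.

```python
def levenshtein_distance_sketch(s1: str, s2: str) -> int:
    # Iterate over the longer string so each row has len(shorter) + 1 cells.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    # prev[j] holds the distance between the previous prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```

Usage: levenshtein_distance_sketch("kitten", "sitting") gives 3 (two substitutions and one insertion).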

class BurkhardKellerNode(word: str)[source]

Bases: object

Node implementation for a BK-tree. A BK-tree node stores the current word and its approximate words, organized by Levenshtein distance.

Parameters

word (str) -- Word of the current node.

class BurkhardKellerTree[source]

Bases: object

Implementation of a BK-tree.

add(word: str)[source]

Insert a word into the current tree. If the tree is empty, set this word as the root.

Parameters

word (str) -- Word to be inserted.

search_similar_word(word: str) → List[str][source]

Search for the words most similar to word, i.e. those with the minimal Levenshtein distance.

Parameters

word (str) -- Target word.

Returns

Similar words.

Return type

List[str]
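A compact BK-tree illustrating the node, add, and search behavior described above. BKNode, BKTree and the max_dist search radius are hypothetical; in particular, limiting the search to a fixed radius and then keeping only the closest words is an assumption, not necessarily how BurkhardKellerTree chooses its candidates.

```python
from typing import Dict, List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BKNode:
    def __init__(self, word: str) -> None:
        self.word = word
        self.children: Dict[int, "BKNode"] = {}  # edge label = distance to child


class BKTree:
    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = BKNode(word)  # empty tree: the word becomes the root
            return
        node = self.root
        while True:
            d = edit_distance(word, node.word)
            if d == 0:
                return  # word already present
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(word)
                return
            node = child

    def search_similar_word(self, word: str, max_dist: int = 2) -> List[str]:
        """Return the closest words to word among those within max_dist."""
        found = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = edit_distance(word, node.word)
            if d <= max_dist:
                found.append((d, node.word))
            # Triangle inequality: only children with |k - d| <= max_dist can qualify.
            for k, child in node.children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        if not found:
            return []
        best = min(d for d, _ in found)
        return [w for d, w in found if d == best]
```

The triangle-inequality pruning is what makes a BK-tree cheaper than comparing the query against every stored word.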

class TriedTree[source]

Bases: object

Implementation of TriedTree, a trie (prefix tree) of words.

add_word(word)[source]

Add a single word into the TriedTree.

search(content)[source]

Search content using backward maximum matching.

Parameters

content (str) -- String to be searched.

Returns

List of maximum-matching words; each element represents the start and end positions of the matched string.

Return type

List[Tuple]
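Backward maximum matching scans from the end of the text and, at each position, takes the longest dictionary word that ends there. The sketch below pairs that scan with a dict-of-dicts trie. TrieSketch is hypothetical, and reporting positions as half-open (start, end) pairs is an assumption about the library's convention.

```python
from typing import List, Tuple


class TrieSketch:
    def __init__(self) -> None:
        self.root: dict = {}
        self.max_len = 0  # length of the longest word added

    def add_word(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#end"] = True  # multi-character key, so it never clashes with a char
        self.max_len = max(self.max_len, len(word))

    def _contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return node.get("#end", False)

    def search(self, content: str) -> List[Tuple[int, int]]:
        result = []
        end = len(content)
        while end > 0:
            # Try the longest candidate ending at `end` first.
            for start in range(max(0, end - self.max_len), end):
                if self._contains(content[start:end]):
                    result.append((start, end))
                    end = start
                    break
            else:
                end -= 1  # no word ends here; skip one character
        return result[::-1]  # restore left-to-right order
```

For example, with the words 北京, 北京大学, 大学 and 学生, searching 北京大学 yields the single span (0, 4) rather than two shorter words.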

class Customization[source]

Bases: object

User intervention based on an Aho-Corasick automaton.

load_customization(filename, sep=None)[source]

Load the custom vocabulary.

parse_customization(query, lac_tags, prefix=False)[source]

Use the custom vocabulary to modify the LAC results.

class SchemaTree(name='root', children=None)[source]

Bases: object

Implementation of SchemaTree.

get_bool_ids_greater_than(probs, limit=0.5, return_prob=False)[source]

Get the indices along the last dimension of the probability arrays whose values are greater than limit.

Input: probs = [[0.1, 0.1, 0.2, 0.5, 0.1, 0.3], [0.7, 0.6, 0.1, 0.1, 0.1, 0.1]], limit = 0.4

Output: [[3], [0, 1]]
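A pure-Python sketch that reproduces the example above by recursing down to the last dimension. bool_ids_greater_than is a hypothetical name, and returning (index, probability) pairs when return_prob is True is an assumption.

```python
from typing import Any, List, Sequence


def bool_ids_greater_than(probs: Sequence[Any], limit: float = 0.5,
                          return_prob: bool = False) -> List[Any]:
    # Recurse until we reach the last dimension (a flat list of numbers).
    if len(probs) > 0 and isinstance(probs[0], (list, tuple)):
        return [bool_ids_greater_than(p, limit, return_prob) for p in probs]
    # Keep indices whose probability exceeds limit, optionally with the value.
    return [(i, p) if return_prob else i for i, p in enumerate(probs) if p > limit]
```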

get_span(start_ids, end_ids, with_prob=False)[source]

Get the set of spans from the lists of start and end positions. Every ID can be used only once.

Input: start_ids = [1, 2, 10], end_ids = [4, 12]

Output: {(2, 4), (10, 12)}
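One way to realize the documented behavior is a two-pointer scan over the sorted lists, in which each end position keeps the closest start position at or before it. get_span_sketch is a hypothetical name and ignores with_prob; it reproduces the example above but is not guaranteed to match the library in every edge case.

```python
from typing import List, Set, Tuple


def get_span_sketch(start_ids: List[int], end_ids: List[int]) -> Set[Tuple[int, int]]:
    """Pair each end position with the closest preceding (or equal) start position."""
    start_ids, end_ids = sorted(start_ids), sorted(end_ids)
    pairs = {}  # end id -> start id
    i = j = 0
    while i < len(start_ids) and j < len(end_ids):
        s, e = start_ids[i], end_ids[j]
        if s <= e:
            pairs[e] = s  # a later start overwrites an earlier one for the same end
            i += 1
            if s == e:
                j += 1
        else:
            j += 1  # no remaining start precedes this end; move to the next end
    return {(s, e) for e, s in pairs.items()}
```

In the example, start 1 is superseded by start 2 for end 4, which is why (1, 4) does not appear in the output.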

get_id_and_prob(span_set, offset_mapping)[source]

Return the text IDs and probabilities of the predicted spans.

Parameters
  • span_set (set) -- Set of predicted spans.

  • offset_mapping (list[int]) -- List of pairs holding, for each token, the indices of its start and end characters in the original text pair (prompt + text).

Returns
  • sentence_id (list[tuple]) -- Indices of the start and end characters in the original text.

  • prob (list[float]) -- Probabilities of the predicted spans.

Return type

Tuple[list[tuple], list[float]]
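The token-to-character mapping can be sketched as below. get_id_and_prob_sketch is hypothetical and rests on several assumptions: each span is ((start_token, start_prob), (end_token, end_prob)), a span's probability is the product of its start and end probabilities, and no adjustment is made for the prompt part of the text pair.

```python
from typing import List, Set, Tuple

Span = Tuple[Tuple[int, float], Tuple[int, float]]


def get_id_and_prob_sketch(span_set: Set[Span],
                           offset_mapping: List[Tuple[int, int]]):
    sentence_id: List[Tuple[int, int]] = []
    prob: List[float] = []
    for (start_tok, p_start), (end_tok, p_end) in span_set:
        # Character span: start char of the first token to end char of the last token.
        sentence_id.append((offset_mapping[start_tok][0], offset_mapping[end_tok][1]))
        # Assumed span score: product of the start and end probabilities.
        prob.append(p_start * p_end)
    return sentence_id, prob
```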

class WordTagRelationExtractor(schema)[source]

Bases: object

Implementation of an information extractor.

classmethod from_dict(config_dict)[source]

Make an instance from a configuration dictionary.

Parameters

config_dict (Dict[str, Any]) -- Configuration dictionary.

classmethod from_json(json_str)[source]

Make an instance from a JSON string.

classmethod from_pkl(pkl_path)[source]

Make an instance from a serialized pickle file.

classmethod from_config(config_path)[source]

Make an instance from a configuration file.

add_schema_from_dict(config_dict)[source]

Add the schema from a configuration dictionary.

extract_spo(all_items)[source]

Pipeline of the SPO (subject-predicate-object) mining procedure.

Parameters

all_items ([type]) -- [description]