distinct
- class Distinct(n_size=2, trans_func=None, name='distinct')[source]
Bases: Metric
Distinct is an algorithm for evaluating the textual diversity of generated text by counting distinct n-grams. The larger the number of distinct n-grams, the higher the diversity of the text. See details at https://arxiv.org/abs/1510.03055.
Distinct can be used as a paddle.metric.Metric class or as an ordinary class. When Distinct is used as a paddle.metric.Metric class, a function is needed to transform the network output into a string list.
- Parameters:
n_size (int, optional) -- The n-gram size used by the Distinct metric. Defaults to 2.
trans_func (callable, optional) -- A function that transforms the network output into a string list. Defaults to None.
Note
When Distinct is used as a paddle.metric.Metric class, trans_func must be provided. Note that the input of trans_func is a numpy array.
name (str, optional) -- Name of the paddle.metric.Metric instance. Defaults to "distinct".
Examples
Using as a general evaluation object.

    from paddlenlp.metrics import Distinct

    distinct = Distinct()
    cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
    # update the states
    distinct.add_inst(cand)
    print(distinct.score())  # 0.8333333333333334

Using as an instance of paddle.metric.Metric.

    import numpy as np
    from functools import partial
    import paddle
    from paddlenlp.transformers import BertTokenizer
    from paddlenlp.metrics import Distinct

    def trans_func(logits, tokenizer):
        '''Transform the network output `logits` to string list.'''
        # [batch_size, seq_len]
        token_ids = np.argmax(logits, axis=-1).tolist()
        cand_list = []
        for ids in token_ids:
            tokens = tokenizer.convert_ids_to_tokens(ids)
            strings = tokenizer.convert_tokens_to_string(tokens)
            cand_list.append(strings.split())
        return cand_list

    paddle.seed(2021)
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    distinct = Distinct(trans_func=partial(trans_func, tokenizer=tokenizer))
    batch_size, seq_len, vocab_size = 4, 16, tokenizer.vocab_size
    logits = paddle.rand([batch_size, seq_len, vocab_size])
    distinct.update(logits.numpy())
    print(distinct.accumulate())  # 1.0

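The value 0.8333 in the first example is consistent with the score being the ratio of distinct n-grams to the total number of n-grams. The following pure-Python sketch reproduces that arithmetic under this assumption; `ngrams` and `distinct_score` are hypothetical helper names chosen for illustration and are not part of PaddleNLP.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_score(tokens, n=2):
    """Ratio of distinct n-grams to total n-grams (0.0 if there are none).

    Assumption: this mirrors how the documented score is derived; it is an
    illustrative re-implementation, not the library code.
    """
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return len(set(grams)) / len(grams)


cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
print(distinct_score(cand, n=2))  # 5 distinct bigrams out of 6
print(distinct_score(cand, n=1))  # 5 distinct unigrams out of 7
```

The bigram ("The", "cat") occurs twice, so only 5 of the 6 bigrams are distinct. Note that "The" and "the" count as different tokens because no lowercasing is applied in this sketch.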
- update(output, *args)[source]
Updates the metric states. This method first calls trans_func() on output to obtain the list of tokenized candidate sentences, then calls add_inst() on each candidate in turn.
- Parameters:
output (numpy.ndarray|Tensor) -- The outputs of the model.
args (tuple) -- The additional inputs.
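The two-step flow described for update() can be sketched with a small stand-in class. Everything here is hypothetical and for illustration only: `TinyDistinct`, `toy_trans_func`, and the four-word `VOCAB` are invented names, nested Python lists stand in for the numpy array, and the internal state (a running count plus a set) is an assumption rather than the library's confirmed implementation.

```python
class TinyDistinct:
    """Hypothetical stand-in for illustration; not PaddleNLP's implementation."""

    def __init__(self, n_size=2, trans_func=None):
        self.n_size = n_size
        self.trans_func = trans_func
        self.total = 0     # number of n-grams seen so far (assumed state)
        self.seen = set()  # distinct n-grams seen so far (assumed state)

    def add_inst(self, cand):
        # Record every contiguous n-gram of one tokenized candidate.
        for i in range(len(cand) - self.n_size + 1):
            self.total += 1
            self.seen.add(tuple(cand[i:i + self.n_size]))

    def update(self, output, *args):
        # Step 1: turn the raw output into tokenized candidates.
        # Step 2: feed the candidates to add_inst one by one.
        for cand in self.trans_func(output):
            self.add_inst(cand)

    def score(self):
        return len(self.seen) / self.total if self.total else 0.0


VOCAB = ["the", "cat", "sat", "mat"]  # toy vocabulary (assumption)


def toy_trans_func(logits):
    """Pick the highest-scoring vocabulary entry at each position."""
    return [[VOCAB[max(range(len(row)), key=row.__getitem__)] for row in seq]
            for seq in logits]


# Shape [batch_size=2, seq_len=3, vocab_size=4]
logits = [
    [[0.9, 0.0, 0.1, 0.0], [0.1, 0.8, 0.0, 0.1], [0.0, 0.1, 0.9, 0.0]],  # the cat sat
    [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1], [0.0, 0.1, 0.1, 0.8]],  # the cat mat
]
metric = TinyDistinct(trans_func=toy_trans_func)
metric.update(logits)
print(metric.score())  # 3 distinct bigrams out of 4 -> 0.75
```

Keeping the decoding step in trans_func means the metric itself only ever sees token lists, which is why the same object can also be driven directly through add_inst().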
- add_inst(cand)[source]
Updates the states based on the candidate.
- Parameters:
cand (list) -- Tokenized candidate sentence generated by the model.
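Because add_inst() updates shared states, calling it for several candidates may yield a different result than averaging per-candidate scores. The sketch below illustrates the difference, assuming the states are one running n-gram count and one set of distinct n-grams shared across calls; this page does not spell out the internal state, so treat that as an assumption. The `bigrams` helper is a hypothetical name.

```python
def bigrams(tokens):
    """Return all contiguous bigrams of `tokens` as tuples."""
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


candidates = [
    ["i", "do", "not", "know"],
    ["i", "do", "not", "know"],
]

# Per-candidate scores: each sentence on its own has no repeated bigram.
per_candidate = [len(set(bigrams(c))) / len(bigrams(c)) for c in candidates]

# Pooled score: one shared count and one shared set across all candidates
# (the assumed behavior of repeated add_inst calls).
total, seen = 0, set()
for cand in candidates:
    for gram in bigrams(cand):
        total += 1
        seen.add(gram)
pooled = len(seen) / total

print(per_candidate)  # [1.0, 1.0]
print(pooled)         # 0.5
```

Under this assumption, a model that repeats the same reply for every input is penalized even though each reply looks fully diverse in isolation.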
- score()[source]
Equivalent to the accumulate() method.
- Returns:
The final distinct score.
- Return type:
float