distinct
- class Distinct(n_size=2, trans_func=None, name='distinct')
Bases: Metric
Distinct is an algorithm for evaluating the textual diversity of generated text by counting its distinct n-grams. The larger the number of distinct n-grams, the higher the diversity of the text. See details at https://arxiv.org/abs/1510.03055.
Distinct can be used either as a paddle.metric.Metric class or as an ordinary class. When Distinct is used as a paddle.metric.Metric class, a function is needed to transform the network output into a list of tokenized candidate sentences, each of which is a list of strings.
- Parameters:
n_size (int, optional) – Size of the n-grams used by the Distinct metric. Defaults to 2.
trans_func (callable, optional) – trans_func transforms the network output to a string list. Defaults to None.
Note
When Distinct is used as a paddle.metric.Metric class, trans_func must be provided. Please note that the input of trans_func is a numpy array.
name (str, optional) – Name of the paddle.metric.Metric instance. Defaults to "distinct".
Examples
Using as a general evaluation object.
    from paddlenlp.metrics import Distinct

    distinct = Distinct()
    cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
    # update the states
    distinct.add_inst(cand)
    print(distinct.score())
    # 0.8333333333333334
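The printed value is consistent with reading the score as the number of distinct n-grams divided by the total number of n-grams: the candidate has six bigrams, five of them unique. The following is a minimal pure-Python sketch under that assumption. It is not the PaddleNLP implementation, and `distinct_ratio` is a hypothetical helper name.

```python
def distinct_ratio(tokens, n_size=2):
    """Return (#distinct n-grams) / (#n-grams) for one token list."""
    ngrams = [tuple(tokens[i:i + n_size])
              for i in range(len(tokens) - n_size + 1)]
    if not ngrams:
        # Assumption: an input shorter than n_size scores 0.0.
        return 0.0
    return len(set(ngrams)) / len(ngrams)


cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
bigram_score = distinct_ratio(cand, n_size=2)   # 5 unique of 6 bigrams
unigram_score = distinct_ratio(cand, n_size=1)  # 5 unique of 7 unigrams
print(bigram_score)
print(unigram_score)
```

Because the comparison is case-sensitive in this sketch, "The" and "the" count as different tokens, which is why the unigram count is five rather than four.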
Using as an instance of paddle.metric.Metric.
    import numpy as np
    from functools import partial

    import paddle
    from paddlenlp.transformers import BertTokenizer
    from paddlenlp.metrics import Distinct


    def trans_func(logits, tokenizer):
        '''Transform the network output `logits` to string list.'''
        # [batch_size, seq_len]
        token_ids = np.argmax(logits, axis=-1).tolist()
        cand_list = []
        for ids in token_ids:
            tokens = tokenizer.convert_ids_to_tokens(ids)
            strings = tokenizer.convert_tokens_to_string(tokens)
            cand_list.append(strings.split())
        return cand_list


    paddle.seed(2021)
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    distinct = Distinct(trans_func=partial(trans_func, tokenizer=tokenizer))
    batch_size, seq_len, vocab_size = 4, 16, tokenizer.vocab_size
    logits = paddle.rand([batch_size, seq_len, vocab_size])
    distinct.update(logits.numpy())
    print(distinct.accumulate())
    # 1.0
- update(output, *args)
Updates the metric states. This method first uses trans_func() to convert output into a list of tokenized candidate sentences, then calls add_inst() on each candidate in turn.
- Parameters:
output (numpy.ndarray|Tensor) – The outputs of the model.
args (tuple) – The additional inputs.
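To make the expected shape of the trans_func result concrete, here is a small sketch that uses nested Python lists in place of a numpy array and a three-word vocabulary in place of a tokenizer. The names `argmax`, `toy_trans_func`, and `vocab` are hypothetical and exist only for illustration. A real trans_func receives a numpy array, and the extra argument can be bound with functools.partial as in the class example.

```python
def argmax(row):
    """Index of the largest value in a flat list."""
    return max(range(len(row)), key=row.__getitem__)


def toy_trans_func(logits, vocab):
    """Map [batch_size, seq_len, vocab_size] scores to a list of token lists."""
    return [[vocab[argmax(step)] for step in sequence] for sequence in logits]


vocab = ["the", "cat", "sat"]  # hypothetical three-word vocabulary
logits = [
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],  # the cat sat
    [[0.1, 0.7, 0.2], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]],    # cat sat the
]
cand_list = toy_trans_func(logits, vocab)
print(cand_list)  # [['the', 'cat', 'sat'], ['cat', 'sat', 'the']]
```

Each inner list is one tokenized candidate, which is the unit that add_inst() consumes.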
- add_inst(cand)
Updates the states based on the candidate.
- Parameters:
cand (list) – Tokenized candidate sentence generated by the model.
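The sketch below illustrates one plausible reading of how repeated add_inst() calls interact: n-grams from all candidates are pooled into a single set and a single running count, so a bigram repeated across sentences lowers the score. This pooling is an assumption rather than something the text above states, and `DistinctSketch` is a hypothetical stand-in, not the library class.

```python
class DistinctSketch:
    """Hypothetical stand-in mimicking the documented add_inst/score calls."""

    def __init__(self, n_size=2):
        self.n_size = n_size
        self.seen = set()  # distinct n-grams seen so far (assumed pooled)
        self.total = 0     # total n-grams seen so far

    def add_inst(self, cand):
        for i in range(len(cand) - self.n_size + 1):
            self.seen.add(tuple(cand[i:i + self.n_size]))
            self.total += 1

    def score(self):
        return len(self.seen) / self.total if self.total else 0.0


metric = DistinctSketch()
metric.add_inst(["the", "cat", "sat"])  # bigrams: (the, cat), (cat, sat)
first = metric.score()
metric.add_inst(["the", "cat", "ran"])  # (the, cat) repeats, (cat, ran) is new
second = metric.score()
print(first, second)  # 1.0 0.75
```

Under this reading the result is a corpus-level ratio, not an average of per-sentence scores.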
- accumulate()
Calculates the final distinct score.
- Returns:
The final distinct score.
- Return type:
float
- score()
This method is the same as accumulate().
- Returns:
The final distinct score.
- Return type:
float