bleu#
- class BLEU(trans_func=None, vocab=None, n_size=4, weights=None, name='bleu')[source]#
Bases: Metric
BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. This metric uses a modified form of precision to compare a candidate translation against multiple reference translations.
BLEU can be used as a paddle.metric.Metric class or as an ordinary class. When BLEU is used as a paddle.metric.Metric class, a function is needed that transforms the network output to a reference string list and transforms the label to a candidate string. By default, default_trans_func is provided, which gets the target sequence ids by taking the maximum probability at each step; in this case, the user must provide vocab. Note that the BLEU computed here differs from the BLEU computed in prediction, and it is only for observation during training and evaluation.

\[BP = \begin{cases} 1, & \text{if } c > r \\ e^{1-r/c}, & \text{if } c \leq r \end{cases}\]

\[BLEU = BP \exp\left(\sum_{n=1}^{N} w_{n} \log{p_{n}}\right)\]

where c is the length of the candidate sentence, and r is the length of the reference sentence.
- Parameters:
trans_func (callable, optional) – trans_func transforms the network output into the strings used for calculation. Defaults to None.
vocab (dict|paddlenlp.data.vocab, optional) – Vocab of the target language. If trans_func is None and BLEU is used as a paddle.metric.Metric instance, default_trans_func will be used and vocab must be provided.
n_size (int, optional) – Number of grams for the BLEU metric. Defaults to 4.
weights (list, optional) – Weights of the precision of each gram. Defaults to None.
name (str, optional) – Name of the paddle.metric.Metric instance. Defaults to “bleu”.
Examples
Using as a general evaluation object.
from paddlenlp.metrics import BLEU

bleu = BLEU()
cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [
    ["The", "cat", "is", "on", "the", "mat"],
    ["There", "is", "a", "cat", "on", "the", "mat"],
]
bleu.add_inst(cand, ref_list)
print(bleu.score())
# 0.4671379777282001
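The score in the example above can be reproduced directly from the formula: compute the clipped n-gram precisions p_1..p_4, the brevity penalty BP, and the weighted geometric mean. The sketch below is a minimal, self-contained BLEU computation independent of paddlenlp; it assumes uniform weights and, as one common convention, picks r as the reference length closest to the candidate length (the library's exact tie-breaking may differ). Zero-match n-grams are not handled in this sketch.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_score(cand, ref_list, n_size=4):
    log_precisions = []
    for n in range(1, n_size + 1):
        cand_counts = Counter(ngrams(cand, n))
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in ref_list:
            for gram, cnt in Counter(ngrams(ref, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], cnt)
        clipped = sum(min(cnt, max_ref_counts[gram])
                      for gram, cnt in cand_counts.items())
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty: r is the reference length closest to the candidate length c.
    c = len(cand)
    r = min((len(ref) for ref in ref_list), key=lambda length: abs(length - c))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Uniform weights w_n = 1/N, so the sum of w_n * log(p_n) is the mean log precision.
    return bp * math.exp(sum(log_precisions) / n_size)

cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [["The", "cat", "is", "on", "the", "mat"],
            ["There", "is", "a", "cat", "on", "the", "mat"]]
print(bleu_score(cand, ref_list))  # ≈ 0.4671
```

For this pair, the clipped precisions are 5/7, 4/6, 2/5, and 1/4, and BP = 1 since c = r = 7, giving (1/21)^(1/4) ≈ 0.4671, matching the library output above.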
Using as an instance of paddle.metric.Metric.
# You could add the code below to the Seq2Seq example in this repo to
# use BLEU as a `paddle.metric.Metric` class. If you run the
# following code alone, you may get an error.
# log example:
# Epoch 1/12
# step 100/507 - loss: 308.7948 - Perplexity: 541.5600 - bleu: 2.2089e-79 - 923ms/step
# step 200/507 - loss: 264.2914 - Perplexity: 334.5099 - bleu: 0.0093 - 865ms/step
# step 300/507 - loss: 236.3913 - Perplexity: 213.2553 - bleu: 0.0244 - 849ms/step
from paddlenlp.data import Vocab
from paddlenlp.metrics import BLEU

bleu_metric = BLEU(vocab=src_vocab.idx_to_token)
model.prepare(optimizer, CrossEntropyCriterion(), [ppl_metric, bleu_metric])
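In this mode, default_trans_func converts the network output to tokens by taking the maximum-probability id at each step and looking it up in vocab. The exact signature paddlenlp expects is not reproduced here; the sketch below only illustrates the core id-to-token conversion (greedy argmax over per-step scores), with the vocab and scores invented for illustration:

```python
def ids_to_tokens(step_scores, id_to_token):
    """Greedy decode: pick the highest-scoring id at each step, map it to a token."""
    tokens = []
    for scores in step_scores:                      # one score list per time step
        best_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(id_to_token[best_id])
    return tokens

# Hypothetical vocab and per-step scores, for illustration only.
id_to_token = {0: "<pad>", 1: "the", 2: "cat", 3: "mat"}
step_scores = [[0.1, 0.7, 0.1, 0.1],   # argmax -> 1 ("the")
               [0.0, 0.2, 0.6, 0.2],   # argmax -> 2 ("cat")
               [0.0, 0.1, 0.2, 0.7]]   # argmax -> 3 ("mat")
print(ids_to_tokens(step_scores, id_to_token))  # ['the', 'cat', 'mat']
```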
- update(output, label, seq_mask=None)[source]#
Update the states for the metric.
The inputs of update are the outputs of Metric.compute. If compute is not defined, the inputs of update will be the flattened outputs of the model and the labels from the data: update(output1, output2, ..., label1, label2, ...).
See Metric.compute.
- add_inst(cand, ref_list)[source]#
Update the states based on a pair of candidate and references.
- Parameters:
cand (list) – Tokenized candidate sentence.
ref_list (list of list) – List of tokenized ground truth sentences.
- class BLEUForDuReader(n_size=4, alpha=1.0, beta=1.0)[source]#
Bases:
BLEU
BLEU metric with bonus for DuReader contest.
Please refer to the `DuReader Homepage <https://ai.baidu.com//broad/subordinate?dataset=dureader>`_ for more details.
- Parameters:
n_size (int, optional) – Number of grams for the BLEU metric. Defaults to 4.
alpha (float, optional) – Weight of YesNo dataset when adding bonus for DuReader contest. Defaults to 1.0.
beta (float, optional) – Weight of Entity dataset when adding bonus for DuReader contest. Defaults to 1.0.