class BLEU(trans_func=None, vocab=None, n_size=4, weights=None, name='bleu')


BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. This metric uses a modified form of precision to compare a candidate translation against multiple reference translations.

BLEU can be used either as a paddle.metric.Metric instance or as an ordinary class. When it is used as a paddle.metric.Metric instance, a function is needed that transforms the network output into candidate token lists and the labels into reference lists. By default, the function default_trans_func is provided, which obtains the target sequence ids by taking the token with the maximum probability at each step; in this case, the user must provide vocab. Note that the BLEU computed here differs from the BLEU computed at prediction time and is only intended for monitoring during training and evaluation.
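The greedy transformation described above (the most probable token at each step, mapped through vocab) can be sketched in pure Python. This is an illustration, not PaddleNLP's default_trans_func: the name greedy_trans_func, its (output, label, seq_mask, vocab) signature, the input shapes ([batch, seq_len, vocab_size] scores, [batch, seq_len] label ids, a 0/1 mask with padding only at the end) and the use of one reference per candidate are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical helper, not the library's default_trans_func.
VocabLike = Union[Dict[int, str], Sequence[str]]  # token id -> token string


def greedy_trans_func(
    output: Sequence[Sequence[Sequence[float]]],  # assumed [batch, seq_len, vocab_size]
    label: Sequence[Sequence[int]],               # assumed [batch, seq_len] token ids
    seq_mask: Sequence[Sequence[int]],            # assumed 1 = real token, 0 = padding
    vocab: VocabLike,
) -> Tuple[List[List[str]], List[List[List[str]]]]:
    cand_list: List[List[str]] = []
    ref_list: List[List[List[str]]] = []
    for step_scores, label_ids, mask in zip(output, label, seq_mask):
        # Assumes padding only appears at the end, so the mask sum is the length.
        length = sum(mask)
        # Greedy decoding: take the most probable token id at every step.
        pred_ids = [max(range(len(s)), key=s.__getitem__) for s in step_scores[:length]]
        cand_list.append([vocab[i] for i in pred_ids])
        # Each candidate gets a single reference built from its label ids.
        ref_list.append([[vocab[i] for i in label_ids[:length]]])
    return cand_list, ref_list


# Usage with toy scores over a 3-token vocabulary.
vocab = {0: "cat", 1: "the", 2: "<eos>"}
output = [[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]]
label = [[1, 0, 2]]
seq_mask = [[1, 1, 0]]
cands, refs = greedy_trans_func(output, label, seq_mask, vocab)
print(cands)  # [['the', 'cat']]
print(refs)   # [[['the', 'cat']]]
```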

\[ \begin{aligned} BP &= \begin{cases} 1, & \text{if } c > r \\ e^{1 - r/c}, & \text{if } c \leq r \end{cases} \\ BLEU &= BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right) \end{aligned} \]

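where c is the length of the candidate sentence, r is the length of the reference sentence, p_n is the modified n-gram precision, w_n is its weight, and N is the maximum n-gram order (n_size).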
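The formula can be checked numerically. The sketch below is an illustration rather than the library's code: bleu_from_stats is a hypothetical helper that takes already-computed modified precisions p_1..p_N and the lengths c and r, and it assumes uniform weights w_n = 1/N when weights is None. For the add_inst example below, the clipped n-gram precisions are p_1 = 5/7, p_2 = 4/6, p_3 = 2/5 and p_4 = 1/4; the candidate has c = 7 tokens and the references have 6 and 7, so BP = 1 whichever reference length is used as r.

```python
import math
from typing import Optional, Sequence


def bleu_from_stats(
    precisions: Sequence[float],
    c: int,
    r: int,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical helper: BLEU = BP * exp(sum_n w_n * log p_n)."""
    n = len(precisions)
    if weights is None:
        # Assumption: uniform weights w_n = 1/N.
        weights = [1.0 / n] * n
    if c == 0 or min(precisions) == 0.0:
        # log(0) is undefined; this sketch simply scores such cases as 0.
        return 0.0
    # Brevity penalty: 1 if c > r, otherwise e^(1 - r/c).
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    log_sum = math.fsum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_sum)


# Precisions of the add_inst example: 5/7, 4/6, 2/5, 1/4 (c = r = 7).
print(bleu_from_stats([5 / 7, 4 / 6, 2 / 5, 1 / 4], c=7, r=7))  # ≈ 0.46714
# A short candidate (c = 5 vs r = 10) is penalised by BP = e^(1 - 2) = e^-1.
print(bleu_from_stats([1.0, 1.0], c=5, r=10))  # ≈ 0.36788
```

The product of the four precisions is 40/840 = 1/21, so the score is (1/21)^{1/4} ≈ 0.4671, in line with the value printed by the add_inst example. Returning 0 for a zero precision is a choice made for this sketch; an implementation may handle that case differently (for example by substituting a tiny positive value).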

  • trans_func (callable, optional) -- Function that transforms the network output and labels into the candidate and reference token lists used in the calculation. If None, default_trans_func is used. Default: None.

  • vocab (dict|, optional) -- Vocabulary of the target language. If trans_func is None and BLEU is used as a paddle.metric.Metric instance, default_trans_func is used and vocab must be provided. Default: None.

  • n_size (int, optional) -- Maximum n-gram order N used by the BLEU metric. Default: 4.

  • weights (list, optional) -- Weights w_n of the n-gram precisions, one per order. Default: None.

  • name (str, optional) -- Name of the paddle.metric.Metric instance. Default: "bleu".


  1. Using as a general evaluation object.

from paddlenlp.metrics import BLEU
bleu = BLEU()
cand = ["The","cat","The","cat","on","the","mat"]
ref_list = [["The","cat","is","on","the","mat"], ["There","is","a","cat","on","the","mat"]]
bleu.add_inst(cand, ref_list)
print(bleu.score()) # 0.4671379777282001

  2. Using as an instance of paddle.metric.Metric.

# Add the code below to the Seq2Seq example in this repo to use
# BLEU as a paddle.metric.Metric instance. Running it on its own
# raises an error, because names such as src_vocab, model and
# optimizer are defined in that example.
# Example log:
# Epoch 1/12
# step 100/507 - loss: 308.7948 - Perplexity: 541.5600 - bleu: 2.2089e-79 - 923ms/step
# step 200/507 - loss: 264.2914 - Perplexity: 334.5099 - bleu: 0.0093 - 865ms/step
# step 300/507 - loss: 236.3913 - Perplexity: 213.2553 - bleu: 0.0244 - 849ms/step

from import Vocab
from paddlenlp.metrics import BLEU

bleu_metric = BLEU(vocab=src_vocab.idx_to_token)
model.prepare(optimizer, CrossEntropyCriterion(), [ppl_metric, bleu_metric])

update(output, label, seq_mask=None)

Update the metric states.

The inputs of update are the outputs of Metric.compute. If compute is not defined, the inputs of update are the flattened model outputs followed by the labels from the data: update(output1, output2, ..., label1, label2, ...).

See Metric.compute.
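The path from update to add_inst can be sketched as below. This is an assumed control flow inferred from the descriptions of trans_func and add_inst, not PaddleNLP's code: UpdateFlowSketch and its methods are hypothetical, trans_func is assumed to be called as trans_func(output, label, seq_mask) and to return one candidate and one reference list per example, and add_inst here only records the pairs instead of updating n-gram statistics.

```python
from typing import Callable, List, Optional, Tuple

Cand = List[str]
Refs = List[List[str]]
TransFunc = Callable[[object, object, Optional[object]], Tuple[List[Cand], List[Refs]]]


class UpdateFlowSketch:
    """Hypothetical stand-in showing how a batch could reach add_inst."""

    def __init__(self, trans_func: TransFunc) -> None:
        self.trans_func = trans_func
        self.pairs: List[Tuple[Cand, Refs]] = []

    def add_inst(self, cand: Cand, ref_list: Refs) -> None:
        # A real metric would update n-gram and length statistics here.
        self.pairs.append((cand, ref_list))

    def update(self, output, label, seq_mask=None) -> None:
        # 1. Turn the raw batch into tokenized candidates and references.
        cand_list, ref_list = self.trans_func(output, label, seq_mask)
        if len(cand_list) != len(ref_list):
            raise ValueError("each candidate needs exactly one reference list")
        # 2. Feed every (candidate, references) pair to add_inst.
        for cand, refs in zip(cand_list, ref_list):
            self.add_inst(cand, refs)


# Usage with a toy trans_func that splits strings on whitespace.
def split_trans_func(output, label, seq_mask=None):
    return [o.split() for o in output], [[lab.split()] for lab in label]


metric = UpdateFlowSketch(split_trans_func)
metric.update(["the cat sat"], ["the cat sat down"])
print(metric.pairs)  # [(['the', 'cat', 'sat'], [['the', 'cat', 'sat', 'down']])]
```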

add_inst(cand, ref_list)

Update the states based on a pair of candidate and references.

  • cand (list) -- Tokenized candidate sentence.

  • ref_list (list of list) -- List of tokenized ground truth sentences.


Reset the states and the result.


Calculate the final BLEU score.


Return the metric name.
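The add_inst, reset, score and name methods form an accumulate-then-score lifecycle. The sketch below reproduces it in pure Python under stated assumptions: SimpleBLEU is hypothetical, it uses uniform weights, it sums clipped n-gram counts and lengths over all add_inst calls before scoring (corpus-level BLEU), and it takes r as the reference length closest to the candidate. Under these assumptions it reproduces, up to floating-point rounding, the 0.4671379777282001 printed by the add_inst example above.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class SimpleBLEU:
    """Hypothetical corpus-level BLEU with the add_inst/reset/score/name lifecycle."""

    def __init__(self, n_size: int = 4, name: str = "bleu") -> None:
        self.n_size = n_size
        self._name = name
        self.reset()

    def reset(self) -> None:
        # Clear all accumulated statistics.
        self.matched = [0] * self.n_size  # clipped n-gram matches per order
        self.total = [0] * self.n_size    # candidate n-grams per order
        self.c = 0                        # total candidate length
        self.r = 0                        # total effective reference length

    def add_inst(self, cand: List[str], ref_list: List[List[str]]) -> None:
        for n in range(1, self.n_size + 1):
            cand_ngrams = ngram_counts(cand, n)
            # Max count of each n-gram in any single reference (for clipping).
            max_ref: Counter = Counter()
            for ref in ref_list:
                for gram, cnt in ngram_counts(ref, n).items():
                    max_ref[gram] = max(max_ref[gram], cnt)
            self.matched[n - 1] += sum(min(cnt, max_ref[g]) for g, cnt in cand_ngrams.items())
            self.total[n - 1] += sum(cand_ngrams.values())
        self.c += len(cand)
        # Reference closest in length to the candidate; ties go to the shorter one.
        self.r += min((abs(len(ref) - len(cand)), len(ref)) for ref in ref_list)[1]

    def score(self) -> float:
        if self.c == 0 or 0 in self.matched:
            return 0.0  # log(0) is undefined; treated as 0 in this sketch
        log_p = math.fsum(math.log(m / t) for m, t in zip(self.matched, self.total))
        bp = 1.0 if self.c > self.r else math.exp(1.0 - self.r / self.c)
        return bp * math.exp(log_p / self.n_size)  # uniform weights 1/n_size

    def name(self) -> str:
        return self._name


bleu = SimpleBLEU()
cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [["The", "cat", "is", "on", "the", "mat"], ["There", "is", "a", "cat", "on", "the", "mat"]]
bleu.add_inst(cand, ref_list)
print(bleu.name(), bleu.score())  # bleu ≈ 0.46714
bleu.reset()
print(bleu.score())  # 0.0
```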

class BLEUForDuReader(n_size=4, alpha=1.0, beta=1.0)


BLEU metric with a bonus for the DuReader contest.

Please refer to `DuReader Homepage<>`_ for more details.

add_inst(cand, ref_list, yn_label=None, yn_ref=None, entity_ref=None)

Update the states based on a pair of candidate and references.

  • cand (list) -- Tokenized candidate sentence.

  • ref_list (list of list) -- List of tokenized ground truth sentences.