bleu#
- class BLEU(trans_func=None, vocab=None, n_size=4, weights=None, name='bleu')[source]#
Bases: Metric
BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. This metric uses a modified form of precision to compare a candidate translation against multiple reference translations.
BLEU can be used as a paddle.metric.Metric class or as an ordinary class. When BLEU is used as a paddle.metric.Metric class, a function is needed that transforms the network output to a reference string list and transforms the label to a candidate string. By default, default_trans_func is provided, which gets the target sequence ids by taking the maximum probability at each step; in this case, the user must provide vocab. Note that the BLEU computed here differs from the BLEU computed in prediction, and it is only for observation during training and evaluation.

\[BP = \begin{cases} 1, & \text{if } c > r \\ e^{1-r/c}, & \text{if } c \leq r \end{cases}\]

\[BLEU = BP \exp\left(\sum_{n=1}^{N} w_{n} \log{p_{n}}\right)\]

where c is the length of the candidate sentence, and r is the length of the reference sentence.
- Parameters:
trans_func (callable, optional) – trans_func transforms the network output into the strings used for calculation. Defaults to None.
vocab (dict|paddlenlp.data.vocab, optional) – Vocab of the target language. If trans_func is None and BLEU is used as a paddle.metric.Metric instance, default_trans_func will be used and vocab must be provided.
n_size (int, optional) – Number of grams for the BLEU metric. Defaults to 4.
weights (list, optional) – Weights of the precision of each gram. Defaults to None.
name (str, optional) – Name of the paddle.metric.Metric instance. Defaults to “bleu”.
Examples
Using as a general evaluation object.
from paddlenlp.metrics import BLEU

bleu = BLEU()
cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [
    ["The", "cat", "is", "on", "the", "mat"],
    ["There", "is", "a", "cat", "on", "the", "mat"],
]
bleu.add_inst(cand, ref_list)
print(bleu.score())
# 0.4671379777282001
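The score in the example above can be reproduced directly from the formula: compute the clipped n-gram precisions p_1..p_4, the brevity penalty BP, and the weighted geometric mean. The sketch below is a minimal, self-contained BLEU computation independent of paddlenlp; it assumes uniform weights and, as one common convention, picks r as the reference length closest to the candidate length (the library's exact tie-breaking may differ). Zero-match n-grams are not handled in this sketch.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_score(cand, ref_list, n_size=4):
    log_precisions = []
    for n in range(1, n_size + 1):
        cand_counts = Counter(ngrams(cand, n))
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in ref_list:
            for gram, cnt in Counter(ngrams(ref, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], cnt)
        clipped = sum(min(cnt, max_ref_counts[gram])
                      for gram, cnt in cand_counts.items())
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty: r is the reference length closest to the candidate length c.
    c = len(cand)
    r = min((len(ref) for ref in ref_list), key=lambda length: abs(length - c))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Uniform weights w_n = 1/N, so the sum of w_n * log(p_n) is the mean log precision.
    return bp * math.exp(sum(log_precisions) / n_size)

cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [["The", "cat", "is", "on", "the", "mat"],
            ["There", "is", "a", "cat", "on", "the", "mat"]]
print(bleu_score(cand, ref_list))  # ≈ 0.4671
```

For this pair, the clipped precisions are 5/7, 4/6, 2/5, and 1/4, and BP = 1 since c = r = 7, giving (1/21)^(1/4) ≈ 0.4671, matching the library output above.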
Using as an instance of paddle.metric.Metric.
# You could add the code below to the Seq2Seq example in this repo to
# use BLEU as a `paddle.metric.Metric` class. If you run the
# following code alone, you may get an error.
# log example:
# Epoch 1/12
# step 100/507 - loss: 308.7948 - Perplexity: 541.5600 - bleu: 2.2089e-79 - 923ms/step
# step 200/507 - loss: 264.2914 - Perplexity: 334.5099 - bleu: 0.0093 - 865ms/step
# step 300/507 - loss: 236.3913 - Perplexity: 213.2553 - bleu: 0.0244 - 849ms/step
from paddlenlp.data import Vocab
from paddlenlp.metrics import BLEU

bleu_metric = BLEU(vocab=src_vocab.idx_to_token)
model.prepare(optimizer, CrossEntropyCriterion(), [ppl_metric, bleu_metric])
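In this mode, default_trans_func converts the network output to tokens by taking the maximum-probability id at each step and looking it up in vocab. The exact signature paddlenlp expects is not reproduced here; the sketch below only illustrates the core id-to-token conversion (greedy argmax over per-step scores), with the vocab and scores invented for illustration:

```python
def ids_to_tokens(step_scores, id_to_token):
    """Greedy decode: pick the highest-scoring id at each step, map it to a token."""
    tokens = []
    for scores in step_scores:                      # one score list per time step
        best_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(id_to_token[best_id])
    return tokens

# Hypothetical vocab and per-step scores, for illustration only.
id_to_token = {0: "<pad>", 1: "the", 2: "cat", 3: "mat"}
step_scores = [[0.1, 0.7, 0.1, 0.1],   # argmax -> 1 ("the")
               [0.0, 0.2, 0.6, 0.2],   # argmax -> 2 ("cat")
               [0.0, 0.1, 0.2, 0.7]]   # argmax -> 3 ("mat")
print(ids_to_tokens(step_scores, id_to_token))  # ['the', 'cat', 'mat']
```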
- update(output, label, seq_mask=None)[source]#
Update the states for the metric.
The inputs of update are the outputs of Metric.compute. If compute is not defined, the inputs of update will be the flattened outputs of the model and the labels from the data: update(output1, output2, ..., label1, label2, ...).
See Metric.compute.
- add_inst(cand, ref_list)[source]#
Update the states based on a pair of candidate and references.
- Parameters:
cand (list) – Tokenized candidate sentence.
ref_list (list of list) – List of tokenized ground truth sentences.
- class BLEUForDuReader(n_size=4, alpha=1.0, beta=1.0)[source]#
Bases:
BLEU
BLEU metric with bonus for DuReader contest.
Please refer to the `DuReader Homepage <https://ai.baidu.com//broad/subordinate?dataset=dureader>`_ for more details.
- Parameters:
n_size (int, optional) – Number of grams for the BLEU metric. Defaults to 4.
alpha (float, optional) – Weight of YesNo dataset when adding bonus for DuReader contest. Defaults to 1.0.
beta (float, optional) – Weight of Entity dataset when adding bonus for DuReader contest. Defaults to 1.0.