PaddleNLP Metrics API


PaddleNLP currently provides the following model evaluation metrics:

| Metric | Description | API |
| --- | --- | --- |
| Perplexity | Perplexity, commonly used to evaluate language models; also applicable to machine translation and text generation tasks. | paddlenlp.metrics.Perplexity |
| BLEU (BiLingual Evaluation Understudy) | Common machine translation evaluation metric. | paddlenlp.metrics.BLEU |
| Rouge (Recall-Oriented Understudy for Gisting Evaluation) | Evaluation metrics for automatic summarization and machine translation. | paddlenlp.metrics.RougeL, paddlenlp.metrics.RougeN |
| AccuracyAndF1 | Accuracy and F1-score. Applicable to the MRPC and QQP tasks in GLUE. | paddlenlp.metrics.AccuracyAndF1 |
| PearsonAndSpearman | Pearson correlation coefficient and Spearman's rank correlation coefficient. Applicable to the STS-B task in GLUE. | paddlenlp.metrics.PearsonAndSpearman |
| Mcc (Matthews correlation coefficient) | Matthews correlation coefficient, measuring binary classification performance. Applicable to the CoLA task in GLUE. | paddlenlp.metrics.Mcc |
| ChunkEvaluator | Computes precision, recall and F1-score for chunk detection. Commonly used in sequence labeling tasks such as Named Entity Recognition (NER). | paddlenlp.metrics.ChunkEvaluator |
| Squad Evaluation | Evaluation metrics for SQuAD and DuReader-robust. | paddlenlp.metrics.compute_predictions, paddlenlp.metrics.squad_evaluate |
| Distinct | Diversity metric commonly used to measure the formal diversity of sentences generated by text generation models. | paddlenlp.metrics.Distinct |
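
To make three of the metrics listed above more concrete, here is a minimal pure-Python sketch of the usual textbook definitions of perplexity, Distinct-n and the Matthews correlation coefficient. It does not call PaddleNLP and is not the library's implementation: the function names `perplexity`, `distinct_n` and `matthews_corrcoef` are hypothetical helpers chosen for illustration, and the classes under `paddlenlp.metrics` may differ in details such as input format, batching and how results are accumulated.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def distinct_n(tokens, n=1):
    """Ratio of unique n-grams to total n-grams in one token sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # sequence shorter than n: no n-grams to count
    return len(set(ngrams)) / len(ngrams)


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels in {0, 1}; returns 0.0 when it is undefined."""
    counts = Counter(zip(y_true, y_pred))
    tp, tn = counts[(1, 1)], counts[(0, 0)]
    fp, fn = counts[(0, 1)], counts[(1, 0)]
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A model that assigns probability 0.25 to every token has perplexity of about 4.
print(perplexity([math.log(0.25)] * 4))

# Bigrams: (a,b), (b,a), (a,b), (b,c) -> 3 unique out of 4.
print(distinct_n("a b a b c".split(), n=2))

# One positive example is missed; MCC is about 0.577.
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

Returning 0.0 when the MCC denominator is zero is a common convention rather than part of the definition. For real evaluation, prefer the library classes listed in the table.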