compute_prediction(examples, features, predictions, version_2_with_negative=False, n_best_size=20, max_answer_length=30, null_score_diff_threshold=0.0)[source]

Post-processes the predictions of a question-answering model to convert them to answers that are substrings of the original contexts. This is the base post-processing function for models that only return start and end logits.

  • examples (list) – List of raw squad-style data (see run_squad.py for more information).

  • features (list) – List of processed squad-style features (see run_squad.py for more information).

  • predictions (tuple) – The predictions of the model. Should be a tuple of two lists containing the start logits and the end logits.

  • version_2_with_negative (bool, optional) – Whether the dataset contains examples with no answers. Defaults to False.

  • n_best_size (int, optional) – The total number of candidate predictions to generate. Defaults to 20.

  • max_answer_length (int, optional) – The maximum length of a predicted answer. Defaults to 30.

  • null_score_diff_threshold (float, optional) – The threshold used to select the null answer. Only useful when version_2_with_negative is True. Defaults to 0.0.


Returns a tuple of three dictionaries: the final selected answer for each example, all n_best answers along with their probabilities and scores, and the score_diff of each example.
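The role of n_best_size, max_answer_length, and null_score_diff_threshold can be illustrated with a minimal sketch. It is not the library's implementation: the helper names `n_best_spans` and `pick_answer` are hypothetical, the sketch works on token indices for a single feature rather than mapping spans back to context substrings, and the null-answer rule follows the common SQuAD 2.0 convention (null score minus best span score compared against the threshold), which is assumed here.

```python
def n_best_spans(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Hypothetical helper: rank (score, start, end) token spans, best first."""
    # Keep only the n_best_size highest-scoring start and end positions.
    starts = sorted(range(len(start_logits)),
                    key=lambda i: start_logits[i], reverse=True)[:n_best_size]
    ends = sorted(range(len(end_logits)),
                  key=lambda i: end_logits[i], reverse=True)[:n_best_size]
    candidates = []
    for s in starts:
        for e in ends:
            # Discard spans that end before they start or are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            candidates.append((start_logits[s] + end_logits[e], s, e))
    candidates.sort(reverse=True)
    return candidates[:n_best_size]


def pick_answer(candidates, null_score, null_score_diff_threshold=0.0):
    """Hypothetical helper: return (start, end), or None for 'no answer'.

    Assumes the SQuAD 2.0 convention: predict no answer when
    null_score - best_span_score exceeds the threshold.
    """
    if not candidates:
        return None
    best_score, s, e = candidates[0]
    score_diff = null_score - best_score
    if score_diff > null_score_diff_threshold:
        return None
    return (s, e)


start_logits = [0.1, 2.0, 0.3, 0.2]
end_logits = [0.0, 0.5, 3.0, 0.1]
spans = n_best_spans(start_logits, end_logits)
short_spans = n_best_spans(start_logits, end_logits, max_answer_length=1)
```

Lowering max_answer_length removes long spans from the candidate pool, and raising null_score_diff_threshold makes the sketch less willing to predict that no answer exists.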

squad_evaluate(examples, preds, na_probs=None, na_prob_thresh=1.0, is_whitespace_splited=True)[source]

Computes and prints the F1 score and EM (exact match) score of the input predictions.

  • examples (list) – List of raw squad-style data (see run_squad.py <https://github.com/PaddlePaddle/PaddleNLP/blob/develop/examples/machine_reading_comprehension/SQuAD/run_squad.py> for more information).

  • preds (dict) – Dictionary of final predictions. Usually generated by compute_prediction.

  • na_probs (dict, optional) – Dictionary of the score_diffs of each example. Used to decide whether an answer exists and to compute the best score_diff threshold for the null answer. Defaults to None.

  • na_prob_thresh (float, optional) – The threshold used to select the null answer. Defaults to 1.0.

  • is_whitespace_splited (bool, optional) – Whether the predictions and references can be tokenized by whitespace. Usually set to True for English and False for Chinese. Defaults to True.
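To show why is_whitespace_splited matters, here is a minimal sketch of SQuAD-style EM and token-level F1. The function names `get_tokens`, `exact_match`, and `f1_score` are illustrative, and the sketch assumes that text which cannot be split on whitespace is compared character by character. It also omits the answer normalization (lowercasing, stripping punctuation and articles) that SQuAD evaluation scripts typically apply.

```python
import collections


def get_tokens(text, is_whitespace_splited=True):
    """Split on whitespace, or into characters when whitespace is not usable."""
    return text.split() if is_whitespace_splited else list(text)


def exact_match(pred, gold):
    """EM is 1 when prediction and reference are identical, else 0."""
    return int(pred == gold)


def f1_score(pred, gold, is_whitespace_splited=True):
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = get_tokens(pred, is_whitespace_splited)
    gold_tokens = get_tokens(gold, is_whitespace_splited)
    # If either side is empty (no answer), F1 is 1 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts tokens shared by both sides.
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


english_f1 = f1_score("the cat sat", "cat sat down")
chinese_f1 = f1_score("北京", "北京市", is_whitespace_splited=False)
```

With whitespace splitting, the Chinese pair above would be treated as two different single tokens and score 0, which is why character-level comparison is the usual choice for unsegmented text.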