squad
- compute_prediction(examples, features, predictions, version_2_with_negative=False, n_best_size=20, max_answer_length=30, null_score_diff_threshold=0.0)
Post-processes the predictions of a question-answering model to convert them into answers that are substrings of the original contexts. This is the base post-processing function for models that return only start and end logits.
- Parameters:
examples (list) – List of raw squad-style data (see run_squad.py for more information).
features (list) – List of processed squad-style features (see run_squad.py for more information).
predictions (tuple) – The predictions of the model. Should be a tuple of two lists containing the start logits and the end logits.
version_2_with_negative (bool, optional) – Whether the dataset contains examples with no answers. Defaults to False.
n_best_size (int, optional) – The total number of candidate predictions to generate. Defaults to 20.
max_answer_length (int, optional) – The maximum length of a predicted answer. Defaults to 30.
null_score_diff_threshold (float, optional) – The threshold used to select the null answer. Only useful when version_2_with_negative is True. Defaults to 0.0.
- Returns:
A tuple of three dictionaries: the final selected answer for each example, all n_best answers with their probabilities and scores, and the score_diff of each example.
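The following is a minimal sketch of the kind of span selection that n_best_size, max_answer_length and null_score_diff_threshold control. It is not the library's implementation: the helper names best_span and choose_answer are hypothetical, it works on the token logits of a single feature, and it assumes that index 0 holds the "no answer" position and that a span is scored as the sum of its start and end logits.

```python
def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Return (score, start, end) for the highest-scoring valid span, or None."""

    def top(logits):
        # Indexes of the n_best_size largest logits, highest first.
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[:n_best_size]

    best = None
    for s in top(start_logits):
        for e in top(end_logits):
            # Skip spans that end before they start or that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[0]:
                best = (score, s, e)
    return best


def choose_answer(start_logits, end_logits, null_score_diff_threshold=0.0):
    """Return a (start, end) token span, or None to predict "no answer"."""
    # Assumption of this sketch: position 0 scores the null answer.
    null_score = start_logits[0] + end_logits[0]
    span = best_span(start_logits[1:], end_logits[1:])
    if span is None:
        return None
    score, s, e = span
    # score_diff compares the null answer with the best non-null span.
    score_diff = null_score - score
    if score_diff > null_score_diff_threshold:
        return None
    # Shift back by one because position 0 was sliced off above.
    return (s + 1, e + 1)


answer = choose_answer([0.1, 0.2, 3.0, 0.5, 0.1], [0.1, 0.1, 0.4, 2.5, 0.3])
no_answer = choose_answer([5.0, 0.1, 0.2], [5.0, 0.1, 0.3])
```

In this sketch a larger null_score_diff_threshold makes the null answer harder to select, which is why the parameter only matters when the dataset contains unanswerable questions.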
- squad_evaluate(examples, preds, na_probs=None, na_prob_thresh=1.0, is_whitespace_splited=True)
Computes and prints the F1 score and exact-match (EM) score of the input predictions.
- Parameters:
examples (list) – List of raw squad-style data (see run_squad.py for more information).
preds (dict) – Dictionary of final predictions. Usually generated by compute_prediction.
na_probs (dict, optional) – Dictionary of the score_diff of each example. Used to decide whether an answer exists and to compute the best score_diff threshold for the null answer. Defaults to None.
na_prob_thresh (float, optional) – The threshold used to select the null answer. Defaults to 1.0.
is_whitespace_splited (bool, optional) – Whether the predictions and references can be tokenized by whitespace. Usually set to True for English and False for Chinese. Defaults to True.
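As a rough illustration of the two metrics and of what is_whitespace_splited changes, here is a simplified sketch of per-example EM and token-level F1. The helpers tokenize, exact_match and f1_score are hypothetical names, not part of this module, and the sketch omits the answer normalization (lowercasing, punctuation and article removal) that SQuAD-style evaluation usually applies.

```python
from collections import Counter


def tokenize(text, is_whitespace_splited=True):
    # Whitespace tokens for languages such as English; single characters otherwise.
    if is_whitespace_splited:
        return text.split()
    return list(text.replace(" ", ""))


def exact_match(prediction, reference):
    # 1 if the two strings are identical after trimming, else 0.
    return int(prediction.strip() == reference.strip())


def f1_score(prediction, reference, is_whitespace_splited=True):
    pred_tokens = tokenize(prediction, is_whitespace_splited)
    ref_tokens = tokenize(reference, is_whitespace_splited)
    if not pred_tokens or not ref_tokens:
        # Two empty answers (a correct "no answer") score 1, otherwise 0.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts the tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


english_f1 = f1_score("the cat sat", "cat sat down")
chinese_f1 = f1_score("北京大学", "北京", is_whitespace_splited=False)
```

With whitespace splitting disabled, each character is treated as a token, so partial overlap can still earn credit in text that has no spaces between words.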