Evaluating the Matching Output
py_entitymatching.eval_matches(data_frame, gold_label_attr, predicted_label_attr)
Evaluates the matches from the matcher.
Specifically, given a DataFrame containing gold labels and predicted labels, this function evaluates the matches and returns accuracy measures such as precision, recall, and F1.
Parameters:
- data_frame (DataFrame) – The input pandas DataFrame containing “gold” labels and “predicted” labels.
- gold_label_attr (string) – An attribute in the input DataFrame containing “gold” labels.
- predicted_label_attr (string) – An attribute in the input DataFrame containing “predicted” labels.
Returns: A Python dictionary containing the accuracy measures such as precision, recall, F1.
Raises:
- AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If gold_label_attr is not of type string.
- AssertionError – If predicted_label_attr is not of type string.
- AssertionError – If the gold_label_attr is not in the input DataFrame.
- AssertionError – If the predicted_label_attr is not in the input DataFrame.
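To make the reported measures concrete, the following is a minimal pandas sketch of how precision, recall, and F1 can be derived from a gold-label column and a predicted-label column. It is not the library's implementation: the helper name sketch_eval, the 0/1 label encoding, the dictionary keys, and the choice to report 0.0 when a denominator is zero are all assumptions made for illustration.

```python
import pandas as pd


def sketch_eval(data_frame, gold_label_attr, predicted_label_attr):
    """Compute precision, recall, and F1 from two 0/1 label columns.

    Hypothetical helper for illustration; not part of py_entitymatching.
    """
    gold = data_frame[gold_label_attr] == 1
    pred = data_frame[predicted_label_attr] == 1

    true_pos = int((gold & pred).sum())
    false_pos = int((~gold & pred).sum())
    false_neg = int((gold & ~pred).sum())

    # Assumption: a measure with a zero denominator is reported as 0.0.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: 2 true positives, 1 false positive, 1 false negative.
pairs = pd.DataFrame({
    "gold": [1, 1, 1, 0, 0, 0],
    "predicted": [1, 1, 0, 1, 0, 0],
})
result = sketch_eval(pairs, "gold", "predicted")
print(result)
```

On this toy table, precision and recall are both 2 out of 3, so F1 is also 2/3.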
py_entitymatching.print_eval_summary(eval_summary)
Prints a summary of evaluation results.
Parameters: eval_summary (dictionary) – Dictionary containing evaluation results, typically returned by the ‘eval_matches’ function.
py_entitymatching.get_false_positives_as_df(table, eval_summary, verbose=False)
Selects only the false positives from the input table, based on the evaluation results, and returns them as a DataFrame.
Parameters:
- table (DataFrame) – The input table (pandas DataFrame) that was used for evaluation.
- eval_summary (dictionary) – A Python dictionary containing evaluation results, typically returned by the ‘eval_matches’ function.
Returns: A pandas DataFrame containing only the false positives from the input table.
Further, this function sets the output DataFrame’s properties to be the same as those of the input DataFrame.
py_entitymatching.get_false_negatives_as_df(table, eval_summary, verbose=False)
Selects only the false negatives from the input table, based on the evaluation results, and returns them as a DataFrame.
Parameters:
- table (DataFrame) – The input table (pandas DataFrame) that was used for evaluation.
- eval_summary (dictionary) – A Python dictionary containing evaluation results, typically returned by the ‘eval_matches’ function.
Returns: A pandas DataFrame containing only the false negatives from the input table.
Further, this function sets the output DataFrame’s properties to be the same as those of the input DataFrame.
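As a rough illustration of what "false positives" and "false negatives" mean here, the sketch below filters a small table by how its gold and predicted labels disagree. The library functions take the eval_summary dictionary as input; this sketch instead filters on the label columns directly, which is an assumption made to keep it self-contained. The column names _id, gold, and predicted are hypothetical.

```python
import pandas as pd

# Toy table of evaluated pairs; column names are hypothetical.
pairs = pd.DataFrame({
    "_id": [0, 1, 2, 3, 4, 5],
    "gold": [1, 1, 1, 0, 0, 0],
    "predicted": [1, 1, 0, 1, 0, 0],
})

# False positive: predicted as a match, but the gold label says non-match.
false_positives = pairs[(pairs["gold"] == 0) & (pairs["predicted"] == 1)]

# False negative: predicted as a non-match, but the gold label says match.
false_negatives = pairs[(pairs["gold"] == 1) & (pairs["predicted"] == 0)]

print(false_positives["_id"].tolist())
print(false_negatives["_id"].tolist())
```

Inspecting these two subsets is typically the starting point for debugging a matcher, since they show exactly which pairs it got wrong.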