Debugging Matcher
py_entitymatching.vis_debug_dt(matcher, train, test, exclude_attrs, target_attr)
Visual debugger for a Decision Tree matcher.
Parameters:
- matcher (DTMatcher) – The Decision Tree matcher to debug.
- train (DataFrame) – The pandas DataFrame that will be used to train the matcher.
- test (DataFrame) – The pandas DataFrame that will be used to test the matcher.
- exclude_attrs (list) – Attributes to exclude from train and test when training and testing the matcher.
- target_attr (string) – The name of the attribute in ‘train’ that contains the true labels.
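The exclude_attrs and target_attr parameters decide which columns the matcher is trained on. The sketch below illustrates that split in plain Python, with lists of dicts standing in for DataFrames; it is not py_entitymatching code, and the helper split_features_and_labels and the sample column names are hypothetical. It assumes the target_attr column is also kept out of the features, which the parameter descriptions imply but do not state outright.

```python
# Hypothetical sketch: plain lists of dicts stand in for pandas DataFrames.
def split_features_and_labels(rows, exclude_attrs, target_attr):
    """Return (feature_names, X, y) for a list of row dicts.

    Columns in exclude_attrs (e.g. ids) and the label column target_attr
    are left out of the features; target_attr supplies the labels.
    """
    dropped = set(exclude_attrs) | {target_attr}
    # Keep the column order of the first row for a stable feature layout.
    feature_names = [c for c in rows[0] if c not in dropped]
    X = [[row[c] for c in feature_names] for row in rows]
    y = [row[target_attr] for row in rows]
    return feature_names, X, y


train = [
    {"_id": 0, "ltable_id": "a1", "rtable_id": "b1", "name_jac": 0.9, "zip_exm": 1.0, "label": 1},
    {"_id": 1, "ltable_id": "a2", "rtable_id": "b7", "name_jac": 0.1, "zip_exm": 0.0, "label": 0},
]
names, X, y = split_features_and_labels(
    train, exclude_attrs=["_id", "ltable_id", "rtable_id"], target_attr="label")
print(names)  # ['name_jac', 'zip_exm']
print(X)      # [[0.9, 1.0], [0.1, 0.0]]
print(y)      # [1, 0]
```

Dropping the id columns matters because they identify rows rather than describe similarity, so a matcher trained on them would learn nothing useful about new pairs.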
py_entitymatching.vis_debug_rf(matcher, train, test, exclude_attrs, target_attr)
Visual debugger for a Random Forest matcher.
Parameters:
- matcher (RFMatcher) – The Random Forest matcher to debug.
- train (DataFrame) – The pandas DataFrame that will be used to train the matcher.
- test (DataFrame) – The pandas DataFrame that will be used to test the matcher.
- exclude_attrs (list) – Attributes to exclude from train and test when training and testing the matcher.
- target_attr (string) – The name of the attribute in ‘train’ that contains the true labels.
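Testing a matcher on test amounts to comparing its predictions with the true labels. As a rough, hypothetical sketch (plain Python, not what the visual debugger itself computes or displays), the helper compare_predictions below collects the false positives and false negatives and derives precision and recall. It assumes label 1 means match and 0 means non-match, and that the true labels for test are available.

```python
def compare_predictions(pair_ids, y_true, y_pred):
    """Split labelled pairs into false positives and false negatives.

    Assumes 1 = match and 0 = non-match.
    """
    false_pos = [p for p, t, h in zip(pair_ids, y_true, y_pred) if h == 1 and t == 0]
    false_neg = [p for p, t, h in zip(pair_ids, y_true, y_pred) if h == 0 and t == 1]
    true_pos = sum(1 for t, h in zip(y_true, y_pred) if t == 1 and h == 1)
    # Guard against division by zero when nothing is predicted or labelled as a match.
    predicted_pos = true_pos + len(false_pos)
    actual_pos = true_pos + len(false_neg)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return false_pos, false_neg, precision, recall


pair_ids = [10, 11, 12, 13, 14]
y_true = [1, 0, 1, 1, 0]   # true labels for the test pairs (assumed available)
y_pred = [1, 1, 0, 1, 0]   # matcher output for the same pairs
fp, fn, p, r = compare_predictions(pair_ids, y_true, y_pred)
print(fp, fn)                    # [11] [12]
print(round(p, 3), round(r, 3))  # 0.667 0.667
```

The mismatched pairs (here 11 and 12) are the natural starting points for debugging, since they show where the matcher's learned rules disagree with the labels.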
py_entitymatching.debug_decisiontree_matcher(decision_tree, tuple_1, tuple_2, feature_table, table_columns, exclude_attrs=None)
Debugs a decision tree matcher using a pair of input tuples.
Specifically, it takes the two tuples, computes their feature vector using the feature table, passes that vector to the decision tree, and displays the path the vector takes through the tree.
Parameters:
- decision_tree (DTMatcher) – The decision tree matcher to debug.
- tuple_1, tuple_2 (Series) – The pair of input tuples to debug.
- feature_table (DataFrame) – Feature table containing the functions used to compute the features.
- table_columns (list) – List of all columns that will be output after the feature vectors are generated.
- exclude_attrs (list) – List of attributes to remove from the table columns.
Raises: AssertionError – If the input feature table is not of type pandas DataFrame.
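The two steps described above (building a feature vector from a pair of tuples, then following it down the tree) can be sketched in plain Python. This is not py_entitymatching's implementation: the feature table is reduced to (feature_name, function) pairs, each function is assumed to take the left and right tuple, the tree is a hand-built dict that sends value <= threshold to its left child (the convention scikit-learn trees use), and the names exact_match, jaccard_words, build_feature_vector and trace_path are hypothetical.

```python
def exact_match(a, b):
    """Toy similarity function: 1.0 if the values are equal, else 0.0."""
    return 1.0 if a == b else 0.0


def jaccard_words(a, b):
    """Toy similarity function: Jaccard score over lowercased words."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# Hypothetical feature table: (feature name, function of the two tuples).
FEATURE_TABLE = [
    ("name_name_jac", lambda l, r: jaccard_words(l["name"], r["name"])),
    ("zip_zip_exm", lambda l, r: exact_match(l["zip"], r["zip"])),
]


def build_feature_vector(ltuple, rtuple, feature_table, exclude_attrs=()):
    """Apply every feature function to the pair; names in exclude_attrs are dropped."""
    return {name: fn(ltuple, rtuple)
            for name, fn in feature_table if name not in exclude_attrs}


# Hand-built tree: internal nodes test "feature <= threshold", leaves hold a label.
TREE = {
    "feature": "name_name_jac", "threshold": 0.5,
    "left": {"label": 0},
    "right": {
        "feature": "zip_zip_exm", "threshold": 0.5,
        "left": {"label": 0},
        "right": {"label": 1},
    },
}


def trace_path(tree, vector):
    """Follow the vector from root to leaf, recording each decision taken."""
    path, node = [], tree
    while "label" not in node:
        value = vector[node["feature"]]
        goes_left = value <= node["threshold"]
        op = "<=" if goes_left else ">"
        path.append(f'{node["feature"]} = {value:.2f} {op} {node["threshold"]}')
        node = node["left"] if goes_left else node["right"]
    return path, node["label"]


ltuple = {"name": "Joe Smith", "zip": "53703"}
rtuple = {"name": "joe smith jr", "zip": "53703"}
vector = build_feature_vector(ltuple, rtuple, FEATURE_TABLE)
path, label = trace_path(TREE, vector)
for step in path:
    print(step)
print("prediction:", label)
```

Printing each comparison alongside the actual feature value is what makes the trace useful: it shows not only which branch was taken but how close the value was to the threshold.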
py_entitymatching.debug_randomforest_matcher(random_forest, tuple_1, tuple_2, feature_table, table_columns, exclude_attrs=None)
Debugs a random forest matcher using a pair of input tuples.
Specifically, it takes the two tuples, computes their feature vector using the feature table, passes that vector to the random forest, and displays the path the vector takes in each of the decision trees that make up the random forest matcher.
Parameters:
- random_forest (RFMatcher) – The random forest matcher to debug.
- tuple_1, tuple_2 (Series) – The pair of input tuples to debug.
- feature_table (DataFrame) – Feature table containing the functions used to compute the features.
- table_columns (list) – List of all columns that will be output after the feature vectors are generated.
- exclude_attrs (list) – List of attributes to remove from the table columns.
Raises: AssertionError – If the input feature table is not of type pandas DataFrame.
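For a random forest, the same tracing is repeated once per tree. The sketch below is a self-contained, hypothetical illustration in plain Python, not py_entitymatching's implementation: the three hand-built trees, the feature values, and the helpers trace_path and trace_forest are all made up, and each tree sends value <= threshold to its left child.

```python
# Hypothetical toy forest: three hand-built trees over the same two features.
FOREST = [
    {"feature": "name_name_jac", "threshold": 0.5,
     "left": {"label": 0}, "right": {"label": 1}},
    {"feature": "zip_zip_exm", "threshold": 0.5,
     "left": {"label": 0}, "right": {"label": 1}},
    {"feature": "name_name_jac", "threshold": 0.8,
     "left": {"label": 0},
     "right": {"feature": "zip_zip_exm", "threshold": 0.5,
               "left": {"label": 0}, "right": {"label": 1}}},
]


def trace_path(tree, vector):
    """Return the decisions taken from root to leaf and the leaf's label."""
    path, node = [], tree
    while "label" not in node:
        value = vector[node["feature"]]
        goes_left = value <= node["threshold"]
        op = "<=" if goes_left else ">"
        path.append(f'{node["feature"]} = {value:.2f} {op} {node["threshold"]}')
        node = node["left"] if goes_left else node["right"]
    return path, node["label"]


def trace_forest(forest, vector):
    """Trace the vector through every tree; return one (path, label) per tree."""
    return [trace_path(tree, vector) for tree in forest]


vector = {"name_name_jac": 0.67, "zip_zip_exm": 1.0}
results = trace_forest(FOREST, vector)
for i, (path, label) in enumerate(results):
    print(f"tree {i}: {' -> '.join(path)} => {label}")
votes = sum(label for _, label in results)
print(f"{votes} of {len(results)} trees predict a match")
```

The vote count at the end is only a rough summary: scikit-learn's RandomForestClassifier, for example, combines its trees by averaging their predicted class probabilities rather than counting hard votes. The per-tree paths are the more informative part, since they show which trees disagree and on which feature.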