Labeling
py_entitymatching.label_table(table, label_column_name, verbose=False)
Label a pandas DataFrame (for supervised learning purposes).
This function labels a DataFrame, typically for supervised learning. It expects the input DataFrame to have the metadata of a candidate set (such as key, fk_ltable, fk_rtable, ltable, rtable). The function creates a copy of the input DataFrame, appends a label column at the end, fills that column with 0, opens a GUI in which the user enters labels (0: non-match, 1: match), and returns the labeled DataFrame. It also copies the properties of the input DataFrame to the output DataFrame.
Parameters:
- table (DataFrame) – The input DataFrame to be labeled. Specifically, a DataFrame whose candidate-set metadata (such as key, fk_ltable, fk_rtable, ltable, rtable) is present in the catalog.
- label_column_name (string) – The name of the column that will hold the labels entered by the user.
- verbose (boolean) – A flag indicating whether more detailed information about the execution steps should be printed (default is False).
Returns: A new DataFrame with the labels entered by the user. The output DataFrame’s properties are set to match those of the input DataFrame.
Raises:
- AssertionError – If table is not of type pandas DataFrame.
- AssertionError – If label_column_name is not of type string.
- AssertionError – If label_column_name is already present in the input table.
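The non-interactive steps described above (input checks, copy, appending a label column filled with 0) can be sketched in plain pandas. This is only an illustration under stated assumptions: `prepare_label_column` is a hypothetical helper, not part of py_entitymatching, the column names in the sample table are made up, and the sketch does not open the labeling GUI or touch the catalog metadata that the real function relies on.

```python
import pandas as pd


def prepare_label_column(table, label_column_name):
    """Hypothetical helper (NOT a py_entitymatching function).

    Mimics the documented checks and the copy-and-fill step only.
    """
    # Same three conditions the documentation lists under "Raises".
    assert isinstance(table, pd.DataFrame), "table is not a pandas DataFrame"
    assert isinstance(label_column_name, str), "label_column_name is not a string"
    assert label_column_name not in table.columns, "label column already present"

    # Work on a copy so the input DataFrame is left unchanged.
    labeled = table.copy()
    # A new column is appended at the end; every pair starts as 0 (non-match).
    labeled[label_column_name] = 0
    return labeled


# Made-up candidate set: each row pairs a left-table tuple with a right-table tuple.
candidates = pd.DataFrame({
    "_id": [0, 1, 2],
    "ltable_ID": ["a1", "a1", "a2"],
    "rtable_ID": ["b1", "b3", "b2"],
})

labeled = prepare_label_column(candidates, "gold_label")

# Stand-in for the GUI step: mark the second pair as a match (1).
labeled.loc[1, "gold_label"] = 1
```

With the real library, the copy, the GUI session, and the property copying all happen inside a single call to `py_entitymatching.label_table(table, label_column_name)`; the sketch only makes the intermediate state of the DataFrame visible.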