Handling Metadata
- py_entitymatching.get_catalog()
Gets the catalog information for the current session.
Returns: A Python dictionary containing the catalog information. Specifically, each key is the Python identifier of a DataFrame (obtained by id(DataFrame object)) and each value holds that DataFrame's properties.
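Because the catalog is keyed by id(...) rather than by contents, two DataFrames holding identical rows count as separate entries. The following is a minimal pure-Python sketch of that idea. It does not call py_entitymatching; the Table class and the catalog dict are hypothetical stand-ins for a pandas DataFrame and the session catalog.

```python
# Hypothetical stand-in for the session catalog: id(object) -> {property: value}.
catalog = {}


class Table:
    """Hypothetical placeholder for a pandas DataFrame."""

    def __init__(self, rows):
        self.rows = rows


a = Table([{"id": 1}])
b = Table([{"id": 1}])  # same contents as a, but a different object

# Register properties for a only.
catalog[id(a)] = {"key": "id"}

a_registered = id(a) in catalog
b_registered = id(b) in catalog
```

Under this model, metadata attached to one table does not carry over to a copy of it unless it is set again for the copy.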
- py_entitymatching.get_catalog_len()
Gets the length (i.e., the number of entries) of the catalog.
Returns: The number of entries in the catalog as an integer.
- py_entitymatching.del_catalog()
Deletes the catalog for the current session.
Returns: A Boolean value of True is returned if the deletion was successful.
- py_entitymatching.is_catalog_empty()
Checks if the catalog is empty.
Returns: A Boolean value of True is returned if the catalog is empty, else returns False.
- py_entitymatching.is_dfinfo_present(data_frame)
Checks whether the DataFrame information is present in the catalog.
Parameters: data_frame (DataFrame) – The DataFrame that should be checked for its presence in the catalog.
Returns: A Boolean value of True is returned if the DataFrame is present in the catalog, else False is returned.
Raises: AssertionError – If data_frame is not of type pandas DataFrame.
- py_entitymatching.is_property_present_for_df(data_frame, property_name)
Checks if the given property is present for the given DataFrame in the catalog.
Parameters: - data_frame (DataFrame) – The DataFrame for which the property must be checked.
- property_name (string) – The name of the property that should be checked for its presence for the DataFrame in the catalog.
Returns: A Boolean value of True is returned if the property is present for the given DataFrame.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If property_name is not of type string.
- KeyError – If data_frame is not present in the catalog.
- py_entitymatching.show_properties(data_frame)
Prints the properties for a DataFrame that is present in the catalog.
Parameters: data_frame (DataFrame) – The input pandas DataFrame for which the properties must be displayed.
- py_entitymatching.show_properties_for_id(object_id)
Shows the properties for an object id present in the catalog.
Specifically, given an object id, typically obtained by executing id(<object>) where the object could be a DataFrame, this function displays the properties present for that object id in the catalog.
Parameters: object_id (int) – The Python identifier of an object (typically a pandas DataFrame).
- py_entitymatching.init_properties(data_frame)
Initializes properties for a pandas DataFrame in the catalog.
Specifically, this function creates an entry in the catalog and sets its properties to empty.
Parameters: data_frame (DataFrame) – DataFrame for which the properties must be initialized.
Returns: A Boolean value of True is returned if the initialization was successful.
- py_entitymatching.get_property(data_frame, property_name)
Gets the value of a property (with the given property name) for a pandas DataFrame from the catalog.
Parameters: - data_frame (DataFrame) – The DataFrame for which the property should be retrieved.
- property_name (string) – The name of the property that should be retrieved.
Returns: A Python object (typically a string or a pandas DataFrame depending on the property name) is returned.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If property_name is not of type string.
- KeyError – If data_frame information is not present in the catalog.
- KeyError – If requested property for the data_frame is not present in the catalog.
- py_entitymatching.set_property(data_frame, property_name, property_value)
Sets the value of a property (with the given property name) for a pandas DataFrame in the catalog.
Parameters: - data_frame (DataFrame) – The DataFrame for which the property must be set.
- property_name (string) – The name of the property to be set.
- property_value (object) – The value of the property to be set. This is typically a string (such as key) or pandas DataFrame (such as ltable, rtable).
Returns: A Boolean value of True is returned if the update was successful.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If property_name is not of type string.
Note
If the input DataFrame is not present in the catalog, this function will create an entry in the catalog and set the given property.
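The difference between setting and getting can be sketched in plain Python: setting creates a missing catalog entry, while getting raises KeyError when either the entry or the property is absent. This is a simplified model under stated assumptions, not the library's implementation. The catalog dict is a hypothetical stand-in, a bare object() stands in for a DataFrame, and the type checks that raise AssertionError are left out.

```python
# Hypothetical stand-in for the session catalog: id(object) -> {property: value}.
catalog = {}


def set_property(table, name, value):
    # Creates the catalog entry on first use, as the note above describes.
    catalog.setdefault(id(table), {})[name] = value
    return True


def get_property(table, name):
    if id(table) not in catalog:
        raise KeyError("table is not present in the catalog")
    properties = catalog[id(table)]
    if name not in properties:
        raise KeyError("property %r is not set for this table" % name)
    return properties[name]


table = object()  # placeholder for a DataFrame; only its identity matters here

missing_before = id(table) not in catalog
set_property(table, "key", "id")
key_name = get_property(table, "key")

try:
    get_property(table, "ltable")  # never set for this table
    lookup_failed = False
except KeyError:
    lookup_failed = True
```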
- py_entitymatching.del_property(data_frame, property_name)
Deletes a property for a pandas DataFrame from the catalog.
Parameters: - data_frame (DataFrame) – The input DataFrame for which a property must be deleted from the catalog.
- property_name (string) – The name of the property that should be deleted.
Returns: A Boolean value of True is returned if the deletion was successful.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If property_name is not of type string.
- KeyError – If data_frame information is not present in the catalog.
- KeyError – If requested property for the DataFrame is not present in the catalog.
- py_entitymatching.copy_properties(source_data_frame, target_data_frame, replace=True)
Copies properties from a source DataFrame to target DataFrame in the catalog.
Parameters: - source_data_frame (DataFrame) – The DataFrame from which the properties are copied, in the catalog.
- target_data_frame (DataFrame) – The DataFrame to which the properties are copied, in the catalog.
- replace (boolean) – A flag to indicate whether the source DataFrame’s properties can replace the target DataFrame’s properties in the catalog. The default value for the flag is True. Specifically, if the target DataFrame’s information is already present in the catalog, then the function checks the replace flag. If the flag is True, the function first deletes the existing properties and then sets them from the source DataFrame’s properties. If the flag is False, the function just returns without modifying the existing properties.
Returns: A Boolean value of True is returned if the copying was successful.
Raises: - AssertionError – If source_data_frame is not of type pandas DataFrame.
- AssertionError – If target_data_frame is not of type pandas DataFrame.
- KeyError – If source DataFrame is not present in the catalog.
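The effect of the replace flag can be sketched with a plain dictionary. This is an illustrative model only, not the library's code. The catalog dict and the object() placeholders are hypothetical, and returning False when replace is False is an assumption of the sketch, since the description above only says the function returns without modifying anything.

```python
# Hypothetical stand-in for the session catalog: id(object) -> {property: value}.
catalog = {}


def copy_properties(source, target, replace=True):
    if id(source) not in catalog:
        raise KeyError("source is not present in the catalog")
    if id(target) in catalog and not replace:
        # Leave the target untouched. Returning False is an assumption here.
        return False
    # Overwrite the whole entry so stale target properties do not survive.
    catalog[id(target)] = dict(catalog[id(source)])
    return True


source, target = object(), object()  # placeholders for two DataFrames
catalog[id(source)] = {"key": "id"}
catalog[id(target)] = {"key": "old_id", "note": "stale"}

kept = copy_properties(source, target, replace=False)
after_keep = dict(catalog[id(target)])

copied = copy_properties(source, target, replace=True)
after_replace = dict(catalog[id(target)])
```

Note that with replace=True the target ends up with exactly the source's properties; the extra "note" property does not survive.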
- py_entitymatching.get_key(data_frame)
Gets the value of ‘key’ property for a DataFrame from the catalog.
Parameters: data_frame (DataFrame) – The DataFrame for which the key must be retrieved from the catalog.
Returns: A string value containing the key column name is returned (if present).
- py_entitymatching.set_key(data_frame, key_attribute)
Sets the value of ‘key’ property for a DataFrame in the catalog with the given attribute (i.e., column name).
Specifically, this function sets the key attribute for the DataFrame if the given attribute satisfies the following two properties:
- The key attribute should have unique values.
- The key attribute should not have missing values. A missing value is represented as np.NaN.
Parameters: - data_frame (DataFrame) – The DataFrame for which the key must be set in the catalog.
- key_attribute (string) – The key attribute (column name) in the DataFrame.
Returns: A Boolean value of True is returned, if the given attribute satisfies the conditions for a key and the update was successful.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If key_attribute is not of type string.
- KeyError – If given key_attribute is not in the DataFrame columns.
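The two key conditions can be checked with a few lines of plain Python. This is a minimal sketch that works on a list of column values instead of a DataFrame column; the helper names is_missing and satisfies_key_conditions are hypothetical and not part of py_entitymatching, and only a float NaN is treated as missing.

```python
import math


def is_missing(value):
    # Only a float NaN is modelled as a missing value in this sketch.
    return isinstance(value, float) and math.isnan(value)


def satisfies_key_conditions(values):
    values = list(values)
    # The attribute must not contain missing values.
    if any(is_missing(v) for v in values):
        return False
    # The attribute must contain unique values.
    return len(set(values)) == len(values)


ok = satisfies_key_conditions(["a1", "a2", "a3"])
duplicated = satisfies_key_conditions(["a1", "a1", "a3"])
has_nan = satisfies_key_conditions([1.0, float("nan"), 3.0])
```

The missing-value check runs first because NaN never compares equal to itself, so a uniqueness test alone would not catch repeated NaN entries.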
- py_entitymatching.get_fk_ltable(data_frame)
Gets the foreign key to left table for a DataFrame from the catalog.
Specifically, this function is a sugar function that gets the foreign key to the left table using the underlying get_property() function. This function is typically called on a DataFrame which contains metadata such as fk_ltable, fk_rtable, ltable, rtable.
Parameters: data_frame (DataFrame) – The input DataFrame for which the foreign key ltable property must be retrieved.
Returns: A Python object, typically a string, is returned.
- py_entitymatching.set_fk_ltable(data_frame, fk_ltable)
Sets the foreign key to ltable for a DataFrame in the catalog.
Specifically, this function is a sugar function that sets the foreign key to the left table using the py_entitymatching.set_property() function. This function is typically called on a DataFrame which contains metadata such as fk_ltable, fk_rtable, ltable, rtable.
Parameters: - data_frame (DataFrame) – The input DataFrame for which the foreign key ltable property must be set.
- fk_ltable (string) – The attribute that must be set as the foreign key to the ltable in the catalog.
Returns: A Boolean value of True is returned if the foreign key to ltable was set successfully.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If fk_ltable is not of type string.
- AssertionError – If fk_ltable is not in the input DataFrame.
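A sugar function of this kind can be pictured as a thin wrapper that validates its argument and then delegates to a generic property setter. The sketch below is a pure-Python model, not the library's implementation. The Table class, the catalog dict, and the column names (_id, ltable_id, rtable_id) are hypothetical, and storing the value under the property name "fk_ltable" is an assumption based on the metadata names listed above.

```python
# Hypothetical stand-in for the session catalog: id(object) -> {property: value}.
catalog = {}


class Table:
    """Hypothetical placeholder for a pandas DataFrame; only column names matter."""

    def __init__(self, columns):
        self.columns = list(columns)


def set_property(table, name, value):
    catalog.setdefault(id(table), {})[name] = value
    return True


def set_fk_ltable(table, fk_ltable):
    # Mirror the documented checks, then delegate to the generic setter.
    assert isinstance(fk_ltable, str), "fk_ltable must be a string"
    assert fk_ltable in table.columns, "fk_ltable must be a column of the table"
    return set_property(table, "fk_ltable", fk_ltable)


candset = Table(["_id", "ltable_id", "rtable_id"])
was_set = set_fk_ltable(candset, "ltable_id")

try:
    set_fk_ltable(candset, "no_such_column")
    rejected = False
except AssertionError:
    rejected = True
```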
- py_entitymatching.get_fk_rtable(data_frame)
Gets the foreign key to right table for a DataFrame from the catalog.
Specifically, this function is a sugar function that gets the foreign key to the right table using the py_entitymatching.get_property() function. This function is typically called on a DataFrame which contains metadata such as fk_ltable, fk_rtable, ltable, rtable.
Parameters: data_frame (DataFrame) – The input DataFrame for which the foreign key rtable property must be retrieved.
Returns: A Python object (typically a string) is returned.
- py_entitymatching.set_fk_rtable(data_frame, foreign_key_rtable)
Sets the foreign key to rtable for a DataFrame in the catalog.
Specifically, this function is a sugar function that sets the foreign key to the right table using the set_property function. This function is typically called on a DataFrame which contains metadata such as fk_ltable, fk_rtable, ltable, rtable.
Parameters: - data_frame (DataFrame) – The input DataFrame for which the foreign key rtable property must be set.
- foreign_key_rtable (string) – The attribute that must be set as foreign key to rtable in the catalog.
Returns: A Boolean value of True is returned if the foreign key to rtable was set successfully.
Raises: - AssertionError – If data_frame is not of type pandas DataFrame.
- AssertionError – If foreign_key_rtable is not of type string.
- AssertionError – If foreign_key_rtable is not in the input DataFrame.
- py_entitymatching.get_ltable(candset)
Gets the ltable for a DataFrame from the catalog.
Parameters: candset (DataFrame) – The input table for which the ltable must be returned.
Returns: A pandas DataFrame that is pointed to by the ‘ltable’ property of the input table.
- py_entitymatching.set_ltable(candset, table)
Sets the ltable for a DataFrame in the catalog.
Parameters: - candset (DataFrame) – The input table for which the ltable must be set.
- table (DataFrame) – The table (typically a pandas DataFrame) that must be set as ltable for the input DataFrame.
Returns: A Boolean value of True is returned, if the update was successful.
- py_entitymatching.get_rtable(candset)
Gets the rtable for a DataFrame from the catalog.
Parameters: candset (DataFrame) – Input table for which the rtable must be returned.
Returns: A pandas DataFrame that is pointed to by the ‘rtable’ property of the input table.
- py_entitymatching.set_rtable(candset, table)
Sets the rtable for a DataFrame in the catalog.
Parameters: - candset (DataFrame) – The input table for which the rtable must be set.
- table (DataFrame) – The table that must be set as rtable for the input DataFrame.
Returns: A Boolean value of True is returned, if the update was successful.
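The phrase "pointed to by" in the ltable and rtable descriptions suggests that the catalog holds a reference to the table object, not a copy. The sketch below models that reading in plain Python. It is an assumption-laden illustration, not the library's code: the Table class, the catalog dict, and the sample rows are all hypothetical.

```python
# Hypothetical stand-in for the session catalog: id(object) -> {property: value}.
catalog = {}


class Table:
    """Hypothetical placeholder for a pandas DataFrame."""

    def __init__(self, rows):
        self.rows = rows


def set_property(table, name, value):
    catalog.setdefault(id(table), {})[name] = value
    return True


def get_property(table, name):
    return catalog[id(table)][name]


left = Table([{"id": "a1"}])
right = Table([{"id": "b1"}])
candset = Table([{"_id": 0, "ltable_id": "a1", "rtable_id": "b1"}])

# Store the base tables themselves as property values of the candidate set.
set_property(candset, "ltable", left)
set_property(candset, "rtable", right)

same_left = get_property(candset, "ltable") is left
same_right = get_property(candset, "rtable") is right
```

Under this reading, later changes to a base table are visible through the candidate set's metadata, because both names refer to one object.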