Supported Similarity Functions
-
py_entitymatching.affine(s1, s2)
This function computes the affine measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The affine measure if neither string is missing (i.e., NaN or None); otherwise NaN.
-
py_entitymatching.hamming_dist(s1, s2)
This function computes the Hamming distance between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the distance should be computed.
Returns: The Hamming distance if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.hamming_sim(s1, s2)
This function computes the Hamming similarity between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Hamming similarity if neither string is missing (i.e., NaN); otherwise NaN.
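As an illustration of what the two Hamming functions measure, here is a minimal pure-Python sketch of the textbook definitions. It is not the library's implementation: the names `is_missing`, `hamming_distance`, and `hamming_similarity` are hypothetical, and normalizing the similarity by the string length is an assumption of this sketch.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def hamming_distance(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if is_missing(s1) or is_missing(s2):
        return float("nan")
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def hamming_similarity(s1, s2):
    """Assumed normalization: 1 - distance / length."""
    if is_missing(s1) or is_missing(s2):
        return float("nan")
    if len(s1) == 0 and len(s2) == 0:
        return 1.0  # two empty strings are treated as identical here
    return 1.0 - hamming_distance(s1, s2) / len(s1)
```

For example, `hamming_distance("karolin", "kathrin")` counts three differing positions, and a missing input propagates as NaN, matching the behavior described above.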
-
py_entitymatching.lev_dist(s1, s2)
This function computes the Levenshtein distance between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the distance should be computed.
Returns: The Levenshtein distance if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.lev_sim(s1, s2)
This function computes the Levenshtein similarity between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Levenshtein similarity if neither string is missing (i.e., NaN); otherwise NaN.
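The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below shows the standard dynamic-programming computation in plain Python. The function names are hypothetical, and the similarity normalization (one minus the distance divided by the longer string's length) is an assumption of this sketch rather than a statement about the library's code.

```python
def levenshtein_distance(s1, s2):
    """Edit distance with unit cost for insert, delete, and substitute."""
    # previous[j] holds the distance between the processed prefix of s1
    # and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            substitution = previous[j - 1] + (0 if c1 == c2 else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def levenshtein_similarity(s1, s2):
    """Assumed normalization: 1 - distance / max(len(s1), len(s2))."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical here
    return 1.0 - levenshtein_distance(s1, s2) / longest
```

Keeping only two rows of the table keeps memory proportional to the length of `s2` instead of the product of both lengths.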
-
py_entitymatching.jaro(s1, s2)
This function computes the Jaro measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Jaro measure if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.jaro_winkler(s1, s2)
This function computes the Jaro-Winkler measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Jaro-Winkler measure if neither string is missing (i.e., NaN); otherwise NaN.
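To make the relationship between the Jaro and Jaro-Winkler measures concrete, the following is a rough pure-Python sketch of the commonly published definitions. The function names, the handling of empty strings, and the Winkler parameters (`prefix_weight=0.1`, `max_prefix=4`) are assumptions of this sketch and may differ from what the library uses.

```python
def jaro_sketch(s1, s2):
    """Jaro measure: based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0  # assumed convention when exactly one string is empty
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, c1 in enumerate(s1):
        start = max(0, i - window)
        stop = min(len(s2), i + window + 1)
        for j in range(start, stop):
            if not matched2[j] and s2[j] == c1:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len(s1)
        + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler_sketch(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Boost the Jaro score for strings that share a common prefix."""
    base = jaro_sketch(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)
```

The Winkler adjustment never lowers the Jaro score; it only rewards agreement in the first few characters, which is why it tends to suit short strings such as names.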
-
py_entitymatching.needleman_wunsch(s1, s2)
This function computes the Needleman-Wunsch measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Needleman-Wunsch measure if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.smith_waterman(s1, s2)
This function computes the Smith-Waterman measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Smith-Waterman measure if neither string is missing (i.e., NaN); otherwise NaN.
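Needleman-Wunsch scores the best global alignment of two strings, while Smith-Waterman scores the best local alignment (the best-matching pair of substrings). The sketch below contrasts the two in plain Python. The function names, the gap cost of 1.0, and the identity scoring function (1 for equal characters, 0 otherwise) are assumptions of this sketch, not parameters confirmed for the library functions above.

```python
def identity_score(c1, c2):
    """Assumed character scoring: 1.0 for a match, 0.0 for a mismatch."""
    return 1.0 if c1 == c2 else 0.0


def needleman_wunsch_sketch(s1, s2, gap_cost=1.0, score=identity_score):
    """Best global alignment score (every character is aligned or gapped)."""
    previous = [-gap_cost * j for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        current = [-gap_cost * i]
        for j, c2 in enumerate(s2, start=1):
            current.append(max(
                previous[j - 1] + score(c1, c2),  # align c1 with c2
                previous[j] - gap_cost,           # gap in s2
                current[j - 1] - gap_cost,        # gap in s1
            ))
        previous = current
    return previous[-1]


def smith_waterman_sketch(s1, s2, gap_cost=1.0, score=identity_score):
    """Best local alignment score (cells are clamped at zero)."""
    best = 0.0
    previous = [0.0] * (len(s2) + 1)
    for c1 in s1:
        current = [0.0]
        for j, c2 in enumerate(s2, start=1):
            cell = max(
                0.0,                              # start a new local alignment
                previous[j - 1] + score(c1, c2),
                previous[j] - gap_cost,
                current[j - 1] - gap_cost,
            )
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best
```

The only structural differences are the zero floor and taking the maximum over all cells in the local variant, which lets an alignment begin and end anywhere in either string.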
-
py_entitymatching.jaccard(arr1, arr2)
This function computes the Jaccard measure between the two input lists/sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Jaccard measure should be computed.
Returns: The Jaccard measure if both inputs are not None and contain no missing tokens (i.e., NaN); otherwise NaN.
-
py_entitymatching.cosine(arr1, arr2)
This function computes the cosine measure between the two input lists/sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the cosine measure should be computed.
Returns: The cosine measure if both inputs are not None and contain no missing tokens (i.e., NaN); otherwise NaN.
-
py_entitymatching.overlap_coeff(arr1, arr2)
This function computes the overlap coefficient between the two input lists/sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the overlap coefficient should be computed.
Returns: The overlap coefficient if both inputs are not None and contain no missing tokens (i.e., NaN); otherwise NaN.
-
py_entitymatching.dice(arr1, arr2)
This function computes the Dice score between the two input lists/sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Dice score should be computed.
Returns: The Dice score if both inputs are not None and contain no missing tokens (i.e., NaN); otherwise NaN.
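The four token-based measures above (Jaccard, cosine, overlap coefficient, and Dice) all compare the size of the intersection of two token sets against a different denominator. The following sketch shows the usual set-based definitions side by side. The function names and the treatment of empty inputs are assumptions of this sketch and are not taken from the library.

```python
import math


def _prepare(arr1, arr2):
    """Convert both inputs to sets and detect the empty-input cases."""
    a, b = set(arr1), set(arr2)
    if not a and not b:
        return a, b, 1.0   # assumed: two empty inputs are identical
    if not a or not b:
        return a, b, 0.0   # assumed: one empty input shares nothing
    return a, b, None


def jaccard_sketch(arr1, arr2):
    a, b, fixed = _prepare(arr1, arr2)
    return fixed if fixed is not None else len(a & b) / len(a | b)


def cosine_sketch(arr1, arr2):
    a, b, fixed = _prepare(arr1, arr2)
    if fixed is not None:
        return fixed
    return len(a & b) / math.sqrt(len(a) * len(b))


def overlap_coefficient_sketch(arr1, arr2):
    a, b, fixed = _prepare(arr1, arr2)
    return fixed if fixed is not None else len(a & b) / min(len(a), len(b))


def dice_sketch(arr1, arr2):
    a, b, fixed = _prepare(arr1, arr2)
    return fixed if fixed is not None else 2 * len(a & b) / (len(a) + len(b))
```

With `["data", "science"]` and `["data", "management"]` the intersection has one token, so the Jaccard sketch gives 1/3 while the other three give 1/2. The inputs are typically produced by tokenizing strings first, for example by splitting on whitespace.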
-
py_entitymatching.monge_elkan(arr1, arr2)
This function computes the Monge-Elkan measure between the two input lists/sets. It uses the Jaro-Winkler measure as the secondary function to compute the similarity score.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Monge-Elkan measure should be computed.
Returns: The Monge-Elkan measure if both inputs are not None and contain no missing tokens (i.e., NaN); otherwise NaN.
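The Monge-Elkan measure pairs each token of the first input with its best-matching token in the second input, according to a secondary string similarity, and averages those best scores. The sketch below illustrates that structure. To stay self-contained it swaps in `difflib.SequenceMatcher.ratio` from the Python standard library as the secondary function instead of Jaro-Winkler, so its scores will not match the library's. The function names and the empty-input convention are also assumptions.

```python
from difflib import SequenceMatcher


def ratio_similarity(t1, t2):
    """Stand-in secondary similarity (not Jaro-Winkler)."""
    return SequenceMatcher(None, t1, t2).ratio()


def monge_elkan_sketch(tokens1, tokens2, secondary=ratio_similarity):
    """Average, over tokens1, of the best secondary score against tokens2."""
    if not tokens1 or not tokens2:
        return 0.0  # assumed convention for empty inputs
    total = sum(
        max(secondary(t1, t2) for t2 in tokens2)
        for t1 in tokens1
    )
    return total / len(tokens1)
```

Note that the measure is not symmetric: `monge_elkan_sketch(["a"], ["a", "b"])` is 1.0 because the single token finds a perfect partner, whereas swapping the arguments gives 0.5 because `"b"` has no counterpart.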
-
py_entitymatching.exact_match(d1, d2)
This function checks whether two objects match exactly. Typically the objects are strings, booleans, or ints.
Parameters: d1, d2 (str, boolean, int) – The input objects to be checked for an exact match.
Returns: 1 if the objects match exactly, otherwise 0. If either object is NaN or None, returns NaN.
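The three-way result described for exact_match (1, 0, or NaN) can be sketched in a few lines of plain Python. This only mirrors the behavior stated above; the names `is_missing` and `exact_match_sketch` are hypothetical and the code is not taken from the library.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def exact_match_sketch(d1, d2):
    """Return 1 on equality, 0 otherwise, and NaN if either input is missing."""
    if is_missing(d1) or is_missing(d2):
        return float("nan")
    return 1 if d1 == d2 else 0
```

Because NaN never compares equal to anything, callers should test the result with `math.isnan` rather than `==` when they need to detect the missing-value case.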
-
py_entitymatching.rel_diff(d1, d2)
This function computes the relative difference between two numbers.
Parameters: d1, d2 (float) – The input numbers for which the relative difference should be computed.
Returns: A float giving the relative difference between the input numbers (if they are valid). If either input is NaN or None, returns NaN.
-
py_entitymatching.abs_norm(d1, d2)
This function computes the absolute norm similarity between two numbers.
Parameters: d1, d2 (float) – The input numbers for which the absolute norm should be computed.
Returns: A float giving the absolute norm similarity between the input numbers (if they are valid). If either input is NaN or None, returns NaN.