Supported Similarity Functions
-
py_entitymatching.affine(s1, s2)
Computes the affine measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the affine measure should be computed.
Returns: The affine measure if neither string is missing (i.e., NaN or None); otherwise NaN.
-
py_entitymatching.hamming_dist(s1, s2)
Computes the Hamming distance between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Hamming distance should be computed.
Returns: The Hamming distance if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.hamming_sim(s1, s2)
Computes the Hamming similarity between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Hamming similarity should be computed.
Returns: The Hamming similarity if neither string is missing (i.e., NaN); otherwise NaN.
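As a rough illustration of the two Hamming measures above, the sketch below counts mismatched positions between equal-length strings and turns that count into a similarity. The function names, the equal-length requirement, and the `1 - distance / length` normalization are assumptions made for this sketch; they are not taken from the py_entitymatching source.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def hamming_dist_sketch(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def hamming_sim_sketch(s1, s2):
    """Assumed normalization: 1 - distance / length."""
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    if len(s1) == 0 and len(s2) == 0:
        return 1.0  # assumed convention for two empty strings
    return 1.0 - hamming_dist_sketch(s1, s2) / len(s1)

print(hamming_dist_sketch("karolin", "kathrin"))
print(hamming_sim_sketch("karolin", "kathrin"))
```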
-
py_entitymatching.lev_dist(s1, s2)
Computes the Levenshtein distance between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Levenshtein distance should be computed.
Returns: The Levenshtein distance if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.lev_sim(s1, s2)
Computes the Levenshtein similarity between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Levenshtein similarity should be computed.
Returns: The Levenshtein similarity if neither string is missing (i.e., NaN); otherwise NaN.
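To make the relationship between the Levenshtein distance and similarity concrete, here is a minimal pure-Python sketch. It uses the standard unit-cost edit distance; the function names and the `1 - distance / max(len)` normalization are assumptions for illustration and may not match the library's implementation exactly.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def lev_dist_sketch(s1, s2):
    """Unit-cost edit distance computed one row at a time."""
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def lev_sim_sketch(s1, s2):
    """Assumed normalization: 1 - distance / length of the longer string."""
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumed convention for two empty strings
    return 1.0 - lev_dist_sketch(s1, s2) / longest

print(lev_dist_sketch("kitten", "sitting"))
print(lev_sim_sketch("kitten", "sitting"))
```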
-
py_entitymatching.jaro(s1, s2)
Computes the Jaro measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Jaro measure should be computed.
Returns: The Jaro measure if neither string is missing (i.e., NaN); otherwise NaN.
-
py_entitymatching.jaro_winkler(s1, s2)
Computes the Jaro-Winkler measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Jaro-Winkler measure should be computed.
Returns: The Jaro-Winkler measure if neither string is missing (i.e., NaN); otherwise NaN.
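The sketch below shows how the Jaro and Jaro-Winkler measures are commonly defined, which may help when interpreting the scores. The function names, the prefix weight of 0.1, the prefix cap of 4 characters, and the handling of empty strings are assumptions of this sketch, not confirmed details of py_entitymatching.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def jaro_sketch(s1, s2):
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    if not s1 or not s2:
        return 0.0  # assumed convention for empty strings
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len(s1)):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3.0

def jaro_winkler_sketch(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Boost the Jaro score for strings that share a common prefix."""
    score = jaro_sketch(s1, s2)
    if math.isnan(score):
        return score
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1.0 - score)

print(round(jaro_sketch("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler_sketch("MARTHA", "MARHTA"), 4))
```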
-
py_entitymatching.needleman_wunsch(s1, s2)
Computes the Needleman-Wunsch measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Needleman-Wunsch measure should be computed.
Returns: The Needleman-Wunsch measure if neither string is missing (i.e., NaN); otherwise NaN.
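Needleman-Wunsch scores the best global alignment of two strings. The following sketch illustrates the dynamic program under assumed scoring: a match scores 1, a mismatch scores 0, and each gap costs 1. These parameter values and the function name are illustrative assumptions; the scoring the library actually uses is not stated in this reference.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def needleman_wunsch_sketch(s1, s2, gap_cost=1.0, match=1.0, mismatch=0.0):
    """Global alignment score; the scoring parameters are assumptions."""
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    rows, cols = len(s1) + 1, len(s2) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = -i * gap_cost
    for j in range(1, cols):
        score[0][j] = -j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] - gap_cost,
                              score[i][j - 1] - gap_cost)
    return score[-1][-1]

print(needleman_wunsch_sketch("dva", "deeva"))
```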
-
py_entitymatching.smith_waterman(s1, s2)
Computes the Smith-Waterman measure between the two input strings.
Parameters: s1, s2 (string) – The input strings for which the Smith-Waterman measure should be computed.
Returns: The Smith-Waterman measure if neither string is missing (i.e., NaN); otherwise NaN.
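Smith-Waterman scores the best local alignment, so unrelated prefixes and suffixes do not lower the score. A minimal sketch follows, again assuming a match score of 1, a mismatch score of 0, and a gap cost of 1; these values and the function name are assumptions chosen for illustration.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def smith_waterman_sketch(s1, s2, gap_cost=1.0, match=1.0, mismatch=0.0):
    """Best local alignment score; the scoring parameters are assumptions."""
    if _is_missing(s1) or _is_missing(s2):
        return float("nan")
    rows, cols = len(s1) + 1, len(s2) + 1
    score = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s1[i - 1] == s2[j - 1] else mismatch
            # Scores never drop below zero, so an alignment can restart anywhere.
            score[i][j] = max(0.0,
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] - gap_cost,
                              score[i][j - 1] - gap_cost)
            best = max(best, score[i][j])
    return best

print(smith_waterman_sketch("cat", "hat"))
```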
-
py_entitymatching.jaccard(arr1, arr2)
Computes the Jaccard measure between the two input lists or sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Jaccard measure should be computed.
Returns: The Jaccard measure if neither input is None and neither contains a missing token (i.e., NaN); otherwise NaN.
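The missing-token rule above can be pictured with a small sketch: the Jaccard measure is the size of the intersection divided by the size of the union, and any None input or NaN token short-circuits to NaN. The function names and the value returned for two empty inputs are assumptions of this sketch.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def _has_missing(tokens):
    """True if the collection itself or any token in it is missing."""
    return tokens is None or any(_is_missing(t) for t in tokens)

def jaccard_sketch(arr1, arr2):
    if _has_missing(arr1) or _has_missing(arr2):
        return float("nan")
    set1, set2 = set(arr1), set(arr2)
    union = set1 | set2
    if not union:
        return 1.0  # assumed convention for two empty inputs
    return len(set1 & set2) / len(union)

print(jaccard_sketch(["data", "science"], ["data"]))
print(jaccard_sketch(["data", float("nan")], ["data"]))
```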
-
py_entitymatching.cosine(arr1, arr2)
Computes the cosine measure between the two input lists or sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the cosine measure should be computed.
Returns: The cosine measure if neither input is None and neither contains a missing token (i.e., NaN); otherwise NaN.
-
py_entitymatching.overlap_coeff(arr1, arr2)
Computes the overlap coefficient between the two input lists or sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the overlap coefficient should be computed.
Returns: The overlap coefficient if neither input is None and neither contains a missing token (i.e., NaN); otherwise NaN.
-
py_entitymatching.dice(arr1, arr2)
Computes the Dice score between the two input lists or sets.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Dice score should be computed.
Returns: The Dice score if neither input is None and neither contains a missing token (i.e., NaN); otherwise NaN.
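The cosine, overlap coefficient, and Dice measures differ only in how they normalize the size of the intersection. The sketch below puts the three side by side using their usual set-based definitions; treating cosine as the set-based (Ochiai) form, returning 0.0 for an empty input, and all function names are assumptions made here.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def _as_sets(arr1, arr2):
    """Return both inputs as sets, or None if anything is missing."""
    for arr in (arr1, arr2):
        if arr is None or any(_is_missing(t) for t in arr):
            return None
    return set(arr1), set(arr2)

def _set_measure(arr1, arr2, normalizer):
    sets = _as_sets(arr1, arr2)
    if sets is None:
        return float("nan")
    set1, set2 = sets
    if not set1 or not set2:
        return 0.0  # assumed convention for an empty input
    return len(set1 & set2) / normalizer(len(set1), len(set2))

def cosine_sketch(arr1, arr2):
    return _set_measure(arr1, arr2, lambda a, b: math.sqrt(a * b))

def overlap_coeff_sketch(arr1, arr2):
    return _set_measure(arr1, arr2, min)

def dice_sketch(arr1, arr2):
    return _set_measure(arr1, arr2, lambda a, b: (a + b) / 2.0)

left, right = ["a", "b", "c"], ["b", "c", "d", "e"]
print(cosine_sketch(left, right))
print(overlap_coeff_sketch(left, right))
print(dice_sketch(left, right))
```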
-
py_entitymatching.monge_elkan(arr1, arr2)
Computes the Monge-Elkan measure between the two input lists or sets. The Jaro-Winkler measure is used as the secondary function to score pairs of tokens.
Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Monge-Elkan measure should be computed.
Returns: The Monge-Elkan measure if neither input is None and neither contains a missing token (i.e., NaN); otherwise NaN.
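Monge-Elkan averages, over the tokens of the first input, the best secondary-function score each token achieves against the second input. The sketch below shows that structure with a pluggable secondary function. Note that its default is a stand-in based on `difflib.SequenceMatcher` from the Python standard library, not Jaro-Winkler, to keep the example short; the function names and the empty-input convention are also assumptions.

```python
import math
from difflib import SequenceMatcher

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def _ratio(t1, t2):
    """Stand-in secondary similarity; the library uses Jaro-Winkler instead."""
    return SequenceMatcher(None, t1, t2).ratio()

def monge_elkan_sketch(arr1, arr2, sim_func=_ratio):
    for arr in (arr1, arr2):
        if arr is None or any(_is_missing(t) for t in arr):
            return float("nan")
    if not arr1 or not arr2:
        return 0.0  # assumed convention for an empty input
    # For each token on the left, keep its best match on the right.
    best_scores = [max(sim_func(t1, t2) for t2 in arr2) for t1 in arr1]
    return sum(best_scores) / len(arr1)

# The measure is asymmetric: swapping the inputs can change the score.
print(monge_elkan_sketch(["x", "y"], ["x"]))
print(monge_elkan_sketch(["x"], ["x", "y"]))
```

Because the average runs over the first input only, the order of the arguments matters when the two inputs have different tokens.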
-
py_entitymatching.exact_match(d1, d2)
Checks whether two objects match exactly. The objects are typically strings, booleans, or ints.
Parameters: d1, d2 (str, boolean, int) – The input objects that should be checked for an exact match.
Returns: 1 if the objects match exactly, 0 otherwise. If either object is NaN or None, returns NaN.
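The three possible return values described above (1, 0, or NaN) can be sketched as follows. The function name and the use of `==` for comparison are assumptions; the sketch only mirrors the documented behaviour.

```python
import math

def _is_missing(value):
    """Treat None and float NaN as missing (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def exact_match_sketch(d1, d2):
    # A missing value on either side makes the comparison undefined.
    if _is_missing(d1) or _is_missing(d2):
        return float("nan")
    return 1 if d1 == d2 else 0

print(exact_match_sketch("New York", "New York"))
print(exact_match_sketch(10, 11))
print(exact_match_sketch(None, "New York"))
```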
-
py_entitymatching.rel_diff(d1, d2)
Computes the relative difference between two numbers.
Parameters: d1, d2 (float) – The input numbers for which the relative difference should be computed.
Returns: The relative difference between the input numbers as a float, if both are valid. If either input is NaN or None, returns NaN.
-
py_entitymatching.abs_norm(d1, d2)
Computes the absolute norm similarity between two numbers.
Parameters: d1, d2 (float) – The input numbers for which the absolute norm should be computed.
Returns: The absolute norm between the input numbers as a float, if both are valid. If either input is NaN or None, returns NaN.
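This reference does not give the formulas behind the two numeric measures, so the sketch below is only one plausible reading. It assumes the relative difference is `|d1 - d2| / max(|d1|, |d2|)` and the absolute norm similarity is one minus that ratio. Both formulas, the function names, and the treatment of non-numeric input as missing are assumptions, and the library's exact normalization may differ.

```python
import math

def _to_float_or_nan(value):
    """Convert to float; None, NaN, and non-numeric input become NaN."""
    if value is None:
        return float("nan")
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")

def rel_diff_sketch(d1, d2):
    """Assumed formula: |d1 - d2| / max(|d1|, |d2|)."""
    x, y = _to_float_or_nan(d1), _to_float_or_nan(d2)
    if math.isnan(x) or math.isnan(y):
        return float("nan")
    scale = max(abs(x), abs(y))
    if scale == 0.0:
        return 0.0  # both inputs are zero, so they do not differ
    return abs(x - y) / scale

def abs_norm_sketch(d1, d2):
    """Assumed formula: 1 - |d1 - d2| / max(|d1|, |d2|)."""
    diff = rel_diff_sketch(d1, d2)
    return diff if math.isnan(diff) else 1.0 - diff

print(rel_diff_sketch(10, 8))
print(abs_norm_sketch(10, 8))
print(abs_norm_sketch(None, 8))
```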