Supported Similarity Functions

py_entitymatching.affine(s1, s2)

This function computes the affine measure between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The affine measure if neither string is missing (i.e., NaN or None), else returns NaN.

py_entitymatching.hamming_dist(s1, s2)

This function computes the Hamming distance between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the distance should be computed.
Returns: The Hamming distance if neither string is missing (i.e., NaN), else returns NaN.

py_entitymatching.hamming_sim(s1, s2)

This function computes the Hamming similarity between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Hamming similarity if neither string is missing (i.e., NaN), else returns NaN.
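
As an illustration of the two Hamming functions above, here is a minimal pure-Python sketch. It is not the library's implementation: the names is_missing, hamming_dist_sketch, and hamming_sim_sketch are hypothetical, treating None as missing alongside NaN is an assumption, and so is the normalization 1 - distance / length used for the similarity.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def hamming_dist_sketch(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if is_missing(s1) or is_missing(s2):
        return float("nan")
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def hamming_sim_sketch(s1, s2):
    """Normalize the distance to [0, 1]; 1.0 means identical strings."""
    if is_missing(s1) or is_missing(s2):
        return float("nan")
    if len(s1) == 0 and len(s2) == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - hamming_dist_sketch(s1, s2) / len(s1)


print(hamming_dist_sketch("karolin", "kathrin"))  # 3 differing positions
print(hamming_sim_sketch("karolin", "kathrin"))
```

The sketch raises an error for strings of different lengths because the Hamming distance is only defined for equal-length inputs.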

py_entitymatching.lev_dist(s1, s2)

This function computes the Levenshtein distance between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the distance should be computed.
Returns: The Levenshtein distance if neither string is missing (i.e., NaN), else returns NaN.

py_entitymatching.lev_sim(s1, s2)

This function computes the Levenshtein similarity between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Levenshtein similarity if neither string is missing (i.e., NaN), else returns NaN.
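
To make the relationship between the Levenshtein distance and similarity concrete, the following is a rough pure-Python sketch of the standard dynamic-programming edit distance. The names lev_dist_sketch and lev_sim_sketch are hypothetical, and the normalization 1 - distance / max(len(s1), len(s2)) is an assumption, since this page does not state the formula the library uses.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def lev_dist_sketch(s1, s2):
    """Minimum number of single-character inserts, deletes, and substitutions."""
    if is_missing(s1) or is_missing(s2):
        return float("nan")
    # prev[j] holds the distance between the processed prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lev_sim_sketch(s1, s2):
    """Map the distance to [0, 1] by dividing by the longer length."""
    if is_missing(s1) or is_missing(s2):
        return float("nan")
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - lev_dist_sketch(s1, s2) / longest


print(lev_dist_sketch("kitten", "sitting"))  # 3 edits
print(lev_sim_sketch("kitten", "sitting"))
```

Keeping only two rows of the table keeps memory proportional to the length of the second string.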

py_entitymatching.jaro(s1, s2)

This function computes the Jaro measure between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Jaro measure if neither string is missing (i.e., NaN), else returns NaN.

py_entitymatching.jaro_winkler(s1, s2)

This function computes the Jaro-Winkler measure between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Jaro-Winkler measure if neither string is missing (i.e., NaN), else returns NaN.

py_entitymatching.needleman_wunsch(s1, s2)

This function computes the Needleman-Wunsch measure between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Needleman-Wunsch measure if neither string is missing (i.e., NaN), else returns NaN.

py_entitymatching.smith_waterman(s1, s2)

This function computes the Smith-Waterman measure between the two input strings.

Parameters: s1, s2 (string) – The input strings for which the similarity measure should be computed.
Returns: The Smith-Waterman measure if neither string is missing (i.e., NaN), else returns NaN.

py_entitymatching.jaccard(arr1, arr2)

This function computes the Jaccard measure between the two input lists/sets.

Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Jaccard measure should be computed.
Returns: The Jaccard measure if neither list/set is None and neither contains a missing token (i.e., NaN), else returns NaN.
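
The Jaccard measure is the size of the intersection of two token sets divided by the size of their union. The sketch below illustrates that definition together with the missing-value rule described above. It is not the library's code: has_missing and jaccard_sketch are hypothetical names, and returning 1.0 for two empty inputs is an assumption.

```python
import math


def has_missing(tokens):
    """True if the collection is None or holds a None/NaN token."""
    if tokens is None:
        return True
    return any(t is None or (isinstance(t, float) and math.isnan(t))
               for t in tokens)


def jaccard_sketch(arr1, arr2):
    """|A intersect B| / |A union B| over the distinct tokens of each input."""
    if has_missing(arr1) or has_missing(arr2):
        return float("nan")
    set1, set2 = set(arr1), set(arr2)
    union = set1 | set2
    if not union:
        return 1.0  # assumption: two empty inputs count as identical
    return len(set1 & set2) / len(union)


print(jaccard_sketch(["data", "science"], ["data"]))  # 1 shared of 2 distinct
```

Because the inputs are converted to sets, repeated tokens in a list do not change the score.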

py_entitymatching.cosine(arr1, arr2)

This function computes the cosine measure between the two input lists/sets.

Parameters: arr1, arr2 (list or set) – The input lists or sets for which the cosine measure should be computed.
Returns: The cosine measure if neither list/set is None and neither contains a missing token (i.e., NaN), else returns NaN.
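
For token sets, a common form of the cosine measure is the number of shared tokens divided by the square root of the product of the two set sizes. The sketch below assumes that set-based form, which this page does not confirm. The name cosine_sketch and its handling of empty inputs are hypothetical.

```python
import math


def has_missing(tokens):
    """True if the collection is None or holds a None/NaN token."""
    if tokens is None:
        return True
    return any(t is None or (isinstance(t, float) and math.isnan(t))
               for t in tokens)


def cosine_sketch(arr1, arr2):
    """|A intersect B| / sqrt(|A| * |B|) over the distinct tokens."""
    if has_missing(arr1) or has_missing(arr2):
        return float("nan")
    set1, set2 = set(arr1), set(arr2)
    if not set1 and not set2:
        return 1.0  # assumption: two empty inputs count as identical
    if not set1 or not set2:
        return 0.0  # assumption: nothing can be shared with an empty input
    return len(set1 & set2) / math.sqrt(len(set1) * len(set2))


print(cosine_sketch(["a", "b", "c", "d"], ["a"]))  # 1 / sqrt(4 * 1)
```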

py_entitymatching.overlap_coeff(arr1, arr2)

This function computes the overlap coefficient between the two input lists/sets.

Parameters: arr1, arr2 (list or set) – The input lists or sets for which the overlap coefficient should be computed.
Returns: The overlap coefficient if neither list/set is None and neither contains a missing token (i.e., NaN), else returns NaN.
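
The overlap coefficient divides the number of shared tokens by the size of the smaller set, so it reaches 1.0 whenever one input is a subset of the other. A small sketch under that definition follows. The name overlap_coeff_sketch is hypothetical, and the empty-input behavior is an assumption.

```python
import math


def has_missing(tokens):
    """True if the collection is None or holds a None/NaN token."""
    if tokens is None:
        return True
    return any(t is None or (isinstance(t, float) and math.isnan(t))
               for t in tokens)


def overlap_coeff_sketch(arr1, arr2):
    """|A intersect B| / min(|A|, |B|) over the distinct tokens."""
    if has_missing(arr1) or has_missing(arr2):
        return float("nan")
    set1, set2 = set(arr1), set(arr2)
    smaller = min(len(set1), len(set2))
    if smaller == 0:
        return 0.0  # assumption: an empty input overlaps with nothing
    return len(set1 & set2) / smaller


print(overlap_coeff_sketch(["a", "b", "c"], ["a", "b"]))  # subset case
print(overlap_coeff_sketch(["a", "b"], ["b", "c", "d"]))
```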

py_entitymatching.dice(arr1, arr2)

This function computes the Dice score between the two input lists/sets.

Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Dice score should be computed.
Returns: The Dice score if neither list/set is None and neither contains a missing token (i.e., NaN), else returns NaN.
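
The Dice score is twice the number of shared tokens divided by the sum of the two set sizes. The following sketch shows that formula with the same missing-value rule. The name dice_sketch is hypothetical, and returning 1.0 for two empty inputs is an assumption.

```python
import math


def has_missing(tokens):
    """True if the collection is None or holds a None/NaN token."""
    if tokens is None:
        return True
    return any(t is None or (isinstance(t, float) and math.isnan(t))
               for t in tokens)


def dice_sketch(arr1, arr2):
    """2 * |A intersect B| / (|A| + |B|) over the distinct tokens."""
    if has_missing(arr1) or has_missing(arr2):
        return float("nan")
    set1, set2 = set(arr1), set(arr2)
    total = len(set1) + len(set2)
    if total == 0:
        return 1.0  # assumption: two empty inputs count as identical
    return 2 * len(set1 & set2) / total


print(dice_sketch(["a", "b"], ["b", "c"]))  # 2 * 1 / (2 + 2)
```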

py_entitymatching.monge_elkan(arr1, arr2)

This function computes the Monge-Elkan measure between the two input lists/sets. Specifically, this function uses the Jaro-Winkler measure as the secondary function to compute the similarity score.

Parameters: arr1, arr2 (list or set) – The input lists or sets for which the Monge-Elkan measure should be computed.
Returns: The Monge-Elkan measure if neither list/set is None and neither contains a missing token (i.e., NaN), else returns NaN.
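
The role of the secondary function can be sketched as follows: for each token in the first input, take its best secondary-similarity score against any token in the second input, then average those best scores. This is a simplified illustration, not the library's code. To stay self-contained it swaps in a hypothetical exact-match secondary function (exact_token_sim) in place of Jaro-Winkler, and the names and empty-input handling are assumptions.

```python
import math


def has_missing(tokens):
    """True if the collection is None or holds a None/NaN token."""
    if tokens is None:
        return True
    return any(t is None or (isinstance(t, float) and math.isnan(t))
               for t in tokens)


def exact_token_sim(t1, t2):
    """Stand-in secondary function: 1.0 for identical tokens, else 0.0."""
    return 1.0 if t1 == t2 else 0.0


def monge_elkan_sketch(arr1, arr2, secondary_sim=exact_token_sim):
    """Average, over tokens of arr1, of the best match found in arr2."""
    if has_missing(arr1) or has_missing(arr2):
        return float("nan")
    tokens1, tokens2 = list(arr1), list(arr2)
    if not tokens1 and not tokens2:
        return 1.0  # assumption: two empty inputs count as identical
    if not tokens1 or not tokens2:
        return 0.0  # assumption: nothing can match an empty input
    best = [max(secondary_sim(t1, t2) for t2 in tokens2) for t1 in tokens1]
    return sum(best) / len(best)


print(monge_elkan_sketch(["a", "b"], ["a", "c"]))  # only "a" finds a match
```

Under this formulation the score is not symmetric: swapping the two inputs can change the result, because the average runs over the tokens of the first input only.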

py_entitymatching.exact_match(d1, d2)

This function checks whether two objects match exactly. Typically the objects are strings, booleans, or ints.

Parameters: d1, d2 (str, boolean, int) – The input objects to be checked for an exact match.
Returns: 1 if the objects match exactly, else 0. If either object is NaN or None, it returns NaN.
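
The three possible outcomes described above (1, 0, or NaN) can be sketched in a few lines. This is an illustration of the documented behavior, not the library's code, and the names is_missing and exact_match_sketch are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def exact_match_sketch(d1, d2):
    """Return 1 on equality, 0 otherwise, and NaN if either input is missing."""
    if is_missing(d1) or is_missing(d2):
        return float("nan")
    return 1 if d1 == d2 else 0


print(exact_match_sketch("Madison", "Madison"))
print(exact_match_sketch(53703, 53706))
```

The missing-value check comes first because NaN never compares equal to anything, including itself, so a plain equality test would report 0 instead of NaN.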

py_entitymatching.rel_diff(d1, d2)

This function computes the relative difference between two numbers.

Parameters: d1, d2 (float) – The input numbers for which the relative difference must be computed.
Returns: The relative difference between the input numbers as a float (if they are valid). If either input is NaN or None, it returns NaN.
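
This page does not state which formula the relative difference uses, and several definitions are in common use. The sketch below assumes one symmetric form, 2 * |d1 - d2| / (|d1| + |d2|), purely for illustration. The library may compute it differently, and rel_diff_sketch is a hypothetical name.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rel_diff_sketch(d1, d2):
    """Assumed formula: the difference relative to the mean magnitude."""
    if is_missing(d1) or is_missing(d2):
        return float("nan")
    d1, d2 = float(d1), float(d2)
    denominator = abs(d1) + abs(d2)
    if denominator == 0.0:
        return 0.0  # both inputs are zero, so they do not differ
    return 2 * abs(d1 - d2) / denominator


print(rel_diff_sketch(10, 8))  # 2 * 2 / 18
```

Using absolute values in the denominator avoids a division by zero when the inputs have opposite signs and equal magnitude.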

py_entitymatching.abs_norm(d1, d2)

This function computes the absolute norm similarity between two numbers.

Parameters: d1, d2 (float) – The input numbers for which the absolute norm must be computed.
Returns: The absolute norm between the input numbers as a float (if they are valid). If either input is NaN or None, it returns NaN.
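
The exact formula behind the absolute norm is not given on this page. As a rough illustration, the sketch below assumes the similarity is 1 - |d1 - d2| / max(|d1|, |d2|), so that equal numbers score 1.0 and the score falls as the gap grows relative to the larger magnitude. The name abs_norm_sketch and the treatment of two zeros are assumptions, and the library may behave differently.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def abs_norm_sketch(d1, d2):
    """Assumed formula: 1 minus the gap scaled by the larger magnitude."""
    if is_missing(d1) or is_missing(d2):
        return float("nan")
    d1, d2 = float(d1), float(d2)
    largest = max(abs(d1), abs(d2))
    if largest == 0.0:
        return 1.0  # assumption: two zeros count as identical
    return 1.0 - abs(d1 - d2) / largest


print(abs_norm_sketch(10, 8))  # 1 - 2 / 10
```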