Hamming Distance

Hamming distance is a metric (distance function) that counts the number of features (dimensions) in which two artifacts A and B differ. The feature parameter Metric lets users specify the distance function used to compute each feature's contribution to the distance between two artifacts A and B. The metric can be set in the Features table.

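In general, given n features and two artifacts A and B whose feature values are $(a_1, a_2, a_3, \ldots, a_n)$ and $(b_1, b_2, b_3, \ldots, b_n)$ respectively, if we define the delta function:

$$\delta(a_i, b_i) = \begin{cases} 0 & \text{if } a_i = b_i \\ 1 & \text{if } a_i \neq b_i \end{cases}$$

then we have:

$$d_{\mathrm{Hamming}}(A, B) = \sum_{i=1}^{n} \delta(a_i, b_i)$$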

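As a minimal sketch of this definition, assuming a delta function that is 0 when two feature values are equal and 1 when they differ, the Hamming distance can be computed by summing delta over all features. The function names and the sample artifacts below are illustrative only and are not part of the product.

```python
def delta(a, b):
    """Return 0 if the two feature values are equal, 1 otherwise."""
    return 0 if a == b else 1


def hamming_distance(artifact_a, artifact_b):
    """Count the features in which two artifacts differ."""
    if len(artifact_a) != len(artifact_b):
        raise ValueError("artifacts must have the same number of features")
    return sum(delta(a, b) for a, b in zip(artifact_a, artifact_b))


# Hypothetical artifacts with four features each.
a = ("red", "small", 3, "yes")
b = ("red", "large", 3, "no")

# The artifacts differ in the second and fourth features.
print(hamming_distance(a, b))  # 2
```

Because each feature contributes either 0 or 1, the result always lies between 0 (identical artifacts) and n (artifacts that differ in every feature).
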
The feature parameter Metric can be used to specify that a feature should use the Hamming distance function in computing distances between artifacts. The alternatives to Hamming distance are Euclidean distance and Manhattan distance. If you use different metrics for different features of your data, and particularly if any feature uses Hamming distance, it may be advisable to normalize the features that use Euclidean or Manhattan distance. A Hamming feature contributes either 0 or 1 to the distance, so an unnormalized numeric feature with a large range can easily dominate it.
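
The effect of normalization can be illustrated with a small sketch. It makes several assumptions that are not stated in this document: the per-feature contributions are simply added together, the numeric feature is rescaled with min-max normalization, and the data and function names are hypothetical. For a single numeric feature, Euclidean and Manhattan distance both reduce to the absolute difference, which is what the sketch uses.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mixed_distance(x, y):
    """Assumed combination: Hamming term plus absolute-difference term."""
    hamming_term = 0 if x[0] == y[0] else 1  # feature 0: categorical
    numeric_term = abs(x[1] - y[1])          # feature 1: numeric
    return hamming_term + numeric_term


# Hypothetical artifacts: (color, price).
artifacts = [("red", 1200.0), ("blue", 1250.0), ("red", 4800.0)]

# Without normalization the price difference swamps the color mismatch.
raw = mixed_distance(artifacts[0], artifacts[1])

# After normalization both features contribute on a comparable 0-to-1 scale.
scaled_prices = min_max_normalize([price for _, price in artifacts])
scaled = [(color, p) for (color, _), p in zip(artifacts, scaled_prices)]
normalized = mixed_distance(scaled[0], scaled[1])

print(raw, normalized)
```

In the unnormalized case the distance is driven almost entirely by the price feature, while after rescaling the color mismatch accounts for most of it.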