Manhattan Distance

Manhattan distance is the "city block" metric (distance function): the distance you would travel between two points A and B in downtown Manhattan if you could move only along streets and avenues. The feature parameter Metric lets users specify the distance function used to compute each feature's contribution to the distance between two artifacts A and B. The metric can be set in the Features table. Under Manhattan distance, the distance between two artifacts A and B is the sum of the absolute values of their differences in the separate dimensions (features).

For example, with two features X and Y, the distance d(A, B) from artifact A to artifact B is given by the formula:

d(A, B) = |X| + |Y|

where X is the difference in the two artifacts' values for one feature, and Y is the difference in their values for the other feature. In general, if we have n features and two artifacts A and B whose feature values are (a₁, a₂, a₃, ..., a_n) and (b₁, b₂, b₃, ..., b_n) respectively:

d(A, B) = |a₁ - b₁| + |a₂ - b₂| + |a₃ - b₃| + ... + |a_n - b_n|
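
The general n-feature case can be sketched in a few lines of Python: the distance is the sum of the absolute per-feature differences. The function name manhattan_distance and the sample artifacts are illustrative assumptions, not part of the product described here.

```python
def manhattan_distance(a, b):
    """Sum of the absolute per-feature differences between two artifacts."""
    if len(a) != len(b):
        raise ValueError("artifacts must have the same number of features")
    return sum(abs(x - y) for x, y in zip(a, b))


# Two hypothetical artifacts with three numeric features each.
artifact_a = [1, 5, 2]
artifact_b = [4, 1, 2]

# |1 - 4| + |5 - 1| + |2 - 2| = 3 + 4 + 0 = 7
print(manhattan_distance(artifact_a, artifact_b))
```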

The feature parameter Metric can be used to specify that a feature should use the Manhattan distance function when computing distances between artifacts. The alternatives to Manhattan distance are Euclidean distance and Hamming distance. If you use different metrics for different features of your data, and especially if any feature uses Hamming distance, it may be advisable to normalize your data, particularly the features that use Euclidean or Manhattan distance.
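
To illustrate why normalization can matter when metrics are mixed, here is a minimal sketch. It assumes min-max scaling to the range [0, 1] as the normalization and a per-feature Hamming contribution of 0 (values match) or 1 (values differ). Both choices, along with the helper names and the sample data, are assumptions made for illustration; the text above does not prescribe a particular normalization.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no distance information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hamming(x, y):
    """Assumed Hamming contribution for one feature: 0 if equal, else 1."""
    return 0 if x == y else 1


# Hypothetical data: one numeric feature and one categorical feature.
weights = [120.0, 480.0, 300.0]
colors = ["red", "red", "blue"]

scaled = min_max_normalize(weights)

# Compare the first and third artifacts, feature by feature.
raw_diff = abs(weights[0] - weights[2])      # unscaled numeric difference
scaled_diff = abs(scaled[0] - scaled[2])     # same difference after scaling
color_diff = hamming(colors[0], colors[2])   # categorical difference

print(raw_diff, scaled_diff, color_diff)
```

Without scaling, the numeric difference (180.0) dwarfs the Hamming term, which can never exceed 1. After scaling, the numeric difference (0.5) is on the same footing as the Hamming term.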

One difference between Euclidean distance and Manhattan distance is that Euclidean distance penalizes a large difference in a single feature disproportionately more than several small differences. Using Euclidean distance, two artifacts that differ by one unit in each of two features are closer together (the square root of two, about 1.41) than two artifacts that differ by two units in only one feature (two), whereas under Manhattan distance both pairs are equally far apart (two).
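
The comparison can be checked numerically with a short sketch. The function names and the three sample points are illustrative assumptions.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared per-feature differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of the absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


origin = (0, 0)
spread = (1, 1)        # differs by one unit in each of two features
concentrated = (2, 0)  # differs by two units in only one feature

print(euclidean_distance(origin, spread))        # square root of two, about 1.414
print(euclidean_distance(origin, concentrated))  # 2.0
print(manhattan_distance(origin, spread))        # 2
print(manhattan_distance(origin, concentrated))  # 2
```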