Blank values in the data are inherently ambiguous. Some people use blanks as a shorthand notation for values of 0, particularly in occurrence seriation where the values would normally be 0 and 1. For simplicity's sake only the 1's are entered in the data matrix. At other times blanks are supposed to indicate a feature is missing or absent for an item (for example, the feature might be the color of an ornamental band on pottery and the feature might be absent on some artifacts). Alternatively, a blank could signify the absence of data rather than the absence of the feature - we simply don't know anything about this feature for this artifact.
Zero values in the data are also inherently ambiguous. Normally, zeroes indicate values of 0. However, a zero could also indicate that a feature is missing or absent for an item. Or, a zero might indicate both a value and an absence (for example, the feature might be the width of an ornamental band on pottery and a zero might indicate a band of zero width, or equivalently, the absence of a band on some artifacts). Alternatively, a zero could signify the absence of data rather than the absence of the feature - we simply don't know anything about this feature for this artifact.
Whatever conventions we are using, we need to let OptiPath know by setting Blanks and Zeroes in the Features table.
There are six available settings for Blanks and Zeroes. Right clicking on either column will bring up a popup menu that allows you to set all features with one click. Each setting is described below in terms of a zero value, but the same settings apply to blanks.
Present & Zero - a zero value (0) will be interpreted to mean the feature is present but has a value of zero. A Present & Zero data entry will be included in the computation of the transition penalty, and will be included in the computation of the distance, between the item and its preceding or succeeding item in the seriation.
Absent & Zero - a zero value (0) will be interpreted to mean the feature is absent and has a value of zero. An Absent & Zero data entry will be included in the computation of the transition penalty, and will be included in the computation of the distance, between the item and its preceding or succeeding item in the seriation.
Present & Unknown - a zero value (0) will be interpreted to mean the feature is present but its value is unknown. A Present & Unknown data entry will be included in the computation of the transition penalty, but will not be included in the computation of the distance, between the item and its preceding or succeeding item in the seriation.
Absent & Unknown - a zero value (0) will be interpreted to mean the feature is absent and its value is unknown. An Absent & Unknown data entry will be included in the computation of the transition penalty, but will not be included in the computation of the distance, between the item and its preceding or succeeding item in the seriation.
Unknown & Zero - a zero value (0) will be interpreted to mean the presence or absence of the feature is unknown but its value is presumed to be zero. An Unknown & Zero data entry will not be included in the computation of the transition penalty, but will be included in the computation of the distance, between the item and its preceding or succeeding item in the seriation.
Unknown - a zero value (0) will be interpreted to mean the presence or absence of the feature is unknown and its value is unknown. An Unknown data entry will not be included in the computation of either the transition penalty or the distance between the item and its preceding or succeeding item in the seriation.
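The six settings can be summarized as two independent questions: is the presence of the feature known, and is its value known? The following Python sketch tabulates the settings on that basis. It is only an illustration of the descriptions above; the names Setting, in_transition_penalty and in_distance are hypothetical and are not part of OptiPath.

```python
from enum import Enum


class Setting(Enum):
    """Hypothetical model of the six settings as (presence, value) pairs."""
    PRESENT_ZERO = ("present", "zero")
    ABSENT_ZERO = ("absent", "zero")
    PRESENT_UNKNOWN = ("present", "unknown")
    ABSENT_UNKNOWN = ("absent", "unknown")
    UNKNOWN_ZERO = ("unknown", "zero")
    UNKNOWN = ("unknown", "unknown")

    @property
    def in_transition_penalty(self):
        # The entry counts toward the transition penalty only if presence is known.
        return self.value[0] != "unknown"

    @property
    def in_distance(self):
        # The entry counts toward the distance only if the value is known (zero).
        return self.value[1] != "unknown"


# Print one summary line per setting.
for s in Setting:
    print(f"{s.name:16} penalty={s.in_transition_penalty!s:5} distance={s.in_distance}")
```

Read this way, the two parts of each setting name map directly onto the two computations: the first part decides the transition penalty and the second decides the distance.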
Each setting for Blanks and Zeroes can lead to different results in seriation. For more information see Setting the Earlier, Later, Blanks, Zeroes and Transition Parameters.
Presence is Known or Unknown
There is an important difference between a feature being absent and its presence being unknown. A presence followed by an absence, or vice versa, incurs a transition penalty. However, a presence or an absence followed by an unknown, or vice versa, does not incur a transition penalty. For example, in Table 1 below, each Item's number indicates its ordinal position in the seriation. For Feature A, Blanks has been set to Unknown (meaning both the presence and the value of the feature are unknown) and Zeroes has been set to Absent & Zero (meaning the feature is absent with a value of 0). In this case, a transition penalty would be incurred for Feature A between Item 3 and Item 4, but not between Item 1 and Item 2 nor between Item 2 and Item 3.
Table 1 | Feature A | Feature B | Feature C | Feature D | Feature E |
Item 1 | 5 | 10 | 10 | 10 | 10 |
Item 2 | | 17 | 17 | 17 | 17 |
Item 3 | 0 | 17 | 17 | 23 | 30 |
Item 4 | 7 | | | 30 | 39 |
Item 5 | 5 | 13 | 13 | 39 | 23 |
Item 6 | 21 | 4 | 4 | 0 | 0 |
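The transition rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OptiPath's actual code: the functions presence and transition_pairs are hypothetical, only adjacent items are compared, a blank is written as None, and the example column is illustrative data in which the blank is read as unknown and the zero is read as absent.

```python
def presence(entry, blanks="unknown", zeroes="absent"):
    """Classify one entry as 'present', 'absent' or 'unknown' (hypothetical helper)."""
    if entry is None:      # a blank takes whatever the Blanks setting says
        return blanks
    if entry == 0:         # a zero takes whatever the Zeroes setting says
        return zeroes
    return "present"       # any other value means the feature is present


def transition_pairs(column, blanks="unknown", zeroes="absent"):
    """Return the 1-based pairs of adjacent items that incur a transition penalty."""
    states = [presence(e, blanks, zeroes) for e in column]
    penalized = []
    for i in range(len(states) - 1):
        a, b = states[i], states[i + 1]
        # Penalty only when both states are known and they differ.
        if "unknown" not in (a, b) and a != b:
            penalized.append((i + 1, i + 2))
    return penalized


column = [5, None, 0, 7]   # value, blank, zero, value
print(transition_pairs(column))                    # [(3, 4)]
print(transition_pairs(column, zeroes="present"))  # []
```

With the zero read as absent, only the step from the third item to the fourth is penalized. If the zero is instead read as present, no step is penalized, which shows how much the choice of setting matters.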
Value is Zero or Unknown
There is also an important difference between a value of zero and an unknown value. If the value of a feature is unknown, OptiPath does not include it in computing distances, but if the value is zero the feature is included in the distance computation. For example, in Table 1 as described above, the distance between Item 1 and Item 2 will be different than the distance between Item 1 and Item 3.
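The effect of an unknown value on distance can be illustrated as follows. The distance measure OptiPath uses is not described here, so this sketch assumes a simple sum of absolute differences. The function distance and the sample items are hypothetical, and an unknown value is written as None.

```python
def distance(item_a, item_b):
    """Sum |a - b| over the features whose value is known for both items.

    Assumed measure for illustration only; not necessarily OptiPath's.
    """
    total = 0
    for a, b in zip(item_a, item_b):
        if a is None or b is None:   # unknown value: the feature is skipped
            continue
        total += abs(a - b)
    return total


item_1 = [5, 10]
item_2 = [None, 17]   # first feature has an unknown value
item_3 = [0, 17]      # first feature has a value of zero

print(distance(item_1, item_2))  # 7: only the second feature contributes
print(distance(item_1, item_3))  # 12: both features contribute
```

Items 2 and 3 differ only in whether the first feature is unknown or zero, yet their distances from Item 1 differ because the zero is counted and the unknown is not.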
A value of unknown also has no influence in determining unimodality. In Table 1, if Feature B has Blanks set to Present & Unknown (meaning the feature is present but its value is unknown) but Feature C has Blanks set to Present & Zero (meaning the feature is present and its value is 0), then the graph of Feature B would be considered unimodal while the graph of Feature C would not.
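The same point can be checked with a small Python sketch. It assumes that "unimodal" means the known values never rise again once they have started to fall; the function is_unimodal is hypothetical, and an unknown value is written as None. The two columns are illustrative: they hold the same values, with one blank read as unknown in the first and as 0 in the second.

```python
def is_unimodal(values):
    """True if the known values rise (or stay level) and then fall (or stay level).

    Unknown values (None) are dropped before the test. This definition of
    unimodality is an assumption made for illustration.
    """
    known = [v for v in values if v is not None]
    n = len(known)
    i = 0
    while i + 1 < n and known[i] <= known[i + 1]:   # rising or level run
        i += 1
    while i + 1 < n and known[i] >= known[i + 1]:   # falling or level run
        i += 1
    return i >= n - 1                               # reached the end?


blank_as_unknown = [10, 17, 17, None, 13, 4]
blank_as_zero = [10, 17, 17, 0, 13, 4]

print(is_unimodal(blank_as_unknown))  # True
print(is_unimodal(blank_as_zero))     # False
```

Reading the blank as 0 puts a dip in the middle of the column, which breaks unimodality. Reading it as unknown leaves the shape of the known values unchanged.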
There is another issue concerning zero values. Suppose we have the integer data presented in Features D and E in Table 1 above, where Zeroes has been set to Absent & Zero (meaning the feature is absent but its value is 0). The items represent collections or assemblages of artifacts, and the numbers represent frequencies or the number of times a feature (or style) appears in an assemblage. Conventional thinking in frequency seriation is to try to achieve lenticular or "battleship-shaped" curves. This would imply that Feature E has a preferred ordering to Feature D even though both are unimodal. To encourage the ordering in E, it is advisable to treat 0's as values (or values and absences). This way OptiPath will try to taper off a string of values gradually at both start and finish.