Import Data from Text File

A tab delimited text file is a common text file format for transferring information between databases. As its name implies, fields are separated by tabs. Text files may be viewed or edited in any text editor, although we don't recommend editing. Tab delimited text files are readily imported by spreadsheet programs (like Microsoft Excel), database programs and word processors.

Importing a data set from a tab delimited text file requires that the data in the file meet certain requirements. The data must be in the same format as an OptiPath exported text file. Any data set you export you can import, so you can take a look at an exported file to see what it should look like. The following, a file exported by OptiPath, is a file in proper format, except tabs have been replaced with commas so you can see where the tabs should be. All commas represent tabs.

Data Set, Description, Date Created, Author, Source, Version
Demo,Demonstration data - Release,2004-05-30 05:43:57,Fred Shepardson,My imagination,14.1.20.1

Seriation, Description, Author, Date Created, Unimodal Index, Path Length, Average Rate, Maximum Rate, Average Squared Rate, Objective, Date Error, Order Error, Normalized Date Error, Normalized Order Error, Break Ties, Use Frequencies, Use Dates, Earliest, Latest, Technique, Assemblages, Weights, Randomize, Seed
Test,New test,Fred,2005-09-03 08:21:11,1.0000,3.60555127546399,0.00311898899261591,0.00311898899261591,0.00374855824682814,Min Path Length,0,0.00,0,0.00 %,False,True,True,720,1876,Custom,True,False,False,1

Exclude, Index, Feature, Description, Data, Ranks, Metric, Normalize, Weight, Transition, Earlier, Later, Blanks, Zeroes
False,1,Density,Ratio of weight to volume,Measured,0,Euclidean,True,1,5,Unknown,Unknown,Unknown,Present & Zero
False,2,Rectangularity,Ratio of length to width,Measured,0,Euclidean,False,1,5,Unknown,Unknown,Unknown,Present & Zero
False,3,Fatness,Ratio of length to thickness,Measured,0,Euclidean,True,1,5,Unknown,Unknown,Unknown,Present & Zero

Exclude, Index, Type, Name, Description, Assemblage, Earliest, Latest, Use Dates, Order, Date, Distance, Rate, Density, Rectangularity, Fatness
False,0,Virtual,EARLIER,Artifacts earlier than the data,,720,720,0,720,0,1, Unknown, Unknown, Unknown
False,1,Artifact,Axes,Chert,Layer 1,,,False,1,720,0,1, 6, 6, 8
False,1,Assemblage,Layer 1,Upper (more recent) layer,,,,False,1,720,0,1, 7, 11, 12
False,2,Assemblage,Layer 2,Lower (older) layer,,748,891,True,2,876,1,1, 19, 8, 13
False,2,Artifact,Scraper,Flint,Layer 2,1321,1365,True,2,1346,1,1, 10, 2, 8
False,3,Artifact,Arrowhead,Quartzite,Layer 2,,,False,2,1876,1,1, 9, 6, 5
True,3,Assemblage,Layer 3,Lowest (oldest) layer,,,,False,7,0,0,, 0, 0, 0
False,4,Artifact,Spear point,Obsidian,Layer 1,721,721,True,1,720,0,1, 1, 5, 4
False,10000,Virtual,LATER,Artifacts later than the data,,1876,1876,3,1876,False,1,1, Unknown, Unknown, Unknown
False,10001,Statistic,Unimodal Index,Measure of unimodality of seriation,,,,False,10001,2147483646,1.2,1.0000, 1.0000, 1.0000, 1.0000

Blank rows are important! Don't leave them out.

Note that in the above example of a tab delimited textfile, the last two rows are for "Virtual,LATER" and "Statistic,Unimodal Index". Similarly, the second row in the last batch of rows is for "Virtual,EARLIER". This example file is a tab delimited textfile exported by OptiPath and includes information for reporting purposes not needed for input. These three rows are not needed on input and will be ignored (OptiPath will build the rows from information in the seriations and features data.

Warning!   The only tabs in your input text file should be those that are separating fields. If you have a tab within a field it will be interpreted as marking the end of a field and your date will likely not be imported correctly. Text files exported by OptiPath will not have any tabs embedded in a field.

Rows

Row 1 - the first row is a header row giving the names of the fields in the second row. This row has to be there but what you put in it is irrelevant. OptiPath ignores the first row. The row is there to define data in exported tab delimited text files.

Row 2 - the second row contains the data set parameters. The fields, in order, are:

Data Set - the name of the data set. This field is ignored but must be included, even if blank, with a following tab. OptiPath uses the name you entered in the dialog when opening the tab delimited text file.

Description - a description of the data set. This field can be blank (with a following tab).

Date Created - the date and time the data set is created. This field is ignored but must be included, even if blank, with a following tab. OptiPath will enter the current date and time.

Author - the name of the author of the data set. This field can be blank (with a following tab).

Source - the provenance of the data set. This field is ignored but must be included, even if blank, with a following tab. OptiPath will enter the path and file name of the tab delimited text file.

Version - the database version of the data set. This field is ignored but must be included, even if blank, with a following tab. OptiPath will enter the current database version.

Row 3 - the third row has to be there but what you put in it is irrelevant. OptiPath ignores the third row.

Row 4 - the fourth row is a header row giving the names of the fields in the fifth row. This row has to be there but what you put in it is irrelevant. OptiPath ignores the first row. The row is there to define data in exported tab delimited text files.

Row 5 - the fifth row contains the seriation parameters. The fields, in order, are:

Seriation - the name of the seriation. This field can be blank (with a following tab). If it is blank, OptiPath will put the current date and time as the seriation name.

Description - a description of the seriation. This field can be blank (with a following tab).

Author - the name of the author of the seriation. This field can be blank (with a following tab).

Date Created - the date and time the seriation is created. This field is ignored but must be included, even if blank, with a following tab. OptiPath will enter the current date and time.

Unimodal Index - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited text files.

Gradual Index - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited text files.

Path Length - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Average Rate - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Maximum Rate - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Average Squared Rate - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Objective - this field must be one of (without quotation marks) "Min Path Length", "Max Gradual Index", "Max Unimodal Index", "Max Gradual & Unimodal", "Min Avg Rate", "Min Max Rate" or "Min Avg Sqd Rate". See Objectives

Unimodal Weight - this field is a number between 0 and 1 indicating how much weight to put on the unimodal index when the objective is Max Gradual & Unimodal. See Objectives.

Date Error - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Order Error - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Normalized Date Error - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Normalized Order Error - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Break Ties - a logical value (TRUE or FALSE) indicating whether ties should be broken in the seriation.

Use Frequencies - a logical value (TRUE or FALSE) indicating whether input data shoud be converted to Frequencies for the seriation.

Use Dates - a logical value (TRUE or FALSE) indicating whether weights should be used in the seriation.

Earliest - the earliest date an item can be assigned.

Latest - the latest date an item can be assigned.

Technique - the technique to be used in the seriation. The technique must be one of the allowed values: Shortest Path, Occurrence, Frequency, Nominal, Discrete and Custom.

Assemblages - a logical value (TRUE or FALSE) indicating whether assemblages should be seriated, rather than artifacts.

Weights - a logical value (TRUE or FALSE) indicating whether weights should be used in the seriation.

Randomize - a logical value (TRUE or FALSE) indicating whether the seriation should be randomized.

Seed - the value of the seed for randomized seriations.

Row 6 - the sixth row has to be there but what you put in it is irrelevant. OptiPath ignores the sixth row.

Row 7 - the seventh row is a header row giving the names of the fields in the fifth row. This row has to be there but what you put in it is irrelevant. OptiPath ignores the first row. The row is there to define data in exported tab delimited textfiles.

Feature Rows - there should be one row for each feature. The feature rows each contain the parameters for that feature. The fields, in order, are:

Exclude - a logical value (TRUE or FALSE) indicating whether the feature should be included in the seriation.

Index - an integer value assigned to a feature that allows you to sort features in tables.

Feature - the name of the feature. Each feature must have a distinct non-blank name. Nameless features are ignored.

Description - a description of the feature. This field can be blank (with a following tab).

Data - the format of the data. The value must be one of the allowed values: Measured, Ranked and Classed.

Ranks - the limit on the number of ranks or classes allowed for ranked or classed data.

Metric - the metric (or distance function) to be used in computing distances (in "feature space") between items. The value must be one of the allowed values: Euclidean, Manhattan and Hamming.

Normalize - a logical value (TRUE or FALSE) indicating whether the feature should be normalized in the seriation.

Weight - a weight to be given to a feature.

Transition - a penalty to be applied to this feature each time the feature transitions from absent to present or vice versa in a seriation.

Earlier - how the feature should be considered for artifacts earlier than the earliest artifact in the data set.

Later - how the feature should be considered for artifacts later than the latest artifact in the data set.

Blanks - how blank values in the data should be considered for the feature.

Zeroes - how zero values in the data should be considered for the feature.

Row 8 - The eighth row has to be there but what you put in it is irrelevant. OptiPath ignores the eighth row.

Row 9 - The ninth row is a header row giving the names of the fields in the fifth row. This row has to be there but what you put in it is irrelevant. OptiPath ignores the first row. The row is there to define data in exported tab delimited text files.

Artifact and Assemblage Rows - there should be one row for each item (artifact or assemblage). The item rows each contain the parameters for that item. The fields, in order, are:

Exclude - a logical value (TRUE or FALSE) indicating whether the item should be included in the seriation.

Index - an integer value assigned to a item that allows you to sort items in tables.

Type - the type of the item. The value must be one of the allowed values: Artifact or Assemblage. You do not need to include rows for the Virtual type (EARLIER, LATER) or the Statistic type (Unimodal Index) that are in the example tab delimited text file above.

Name - the name of the item. Each item must have a distinct non-blank name. Nameless items are ignored.

Description - a description of the item. This field can be blank (with a following tab).

Assemblage - the name of the assemblage this artifact is to be assigned to. This field is meaningless if the Type for the item is Assemblage.

Earliest - the earliest possible date that can be assigned to this item. This field is optional but must be followed by a tab. If Date is also entered, it will replace this field with the value in Date.

Latest - the latest possible date that can be assigned to this item. This field is optional but must be followed by a tab. If Date is also entered, it will replace this field with the value in Date.

Use Dates - the column is assumed to contain True/False indicating whether the the Earliest and Latest dates for this artifact/assemblage should be used in creating a seriation. This column is optional.

Order - the item's sequential (ordinal) position in a seriation.

Date - the date for this item. This field is optional but must be followed by a tab. If a date is entered it will replace any entry for Earliest and Latest.

Distance - this field is ignored but must be included, even if blank, with a following tab.The field is there for reporting results in exported tab delimited textfiles.

Rate - this field is ignored but must be included, even if blank, with a following tab. The field is there for reporting results in exported tab delimited textfiles.

Feature Data - the data value for the feature and item.