What is the Fielded Text standard all about?

The ultimate aim of the Fielded Text Standard is to make it a lot easier to use text files to store or distribute tables of values.

Currently files such as Comma Separated Value (CSV) text files are often used to transfer this type of data.  Their advantage is that they are easy to understand, can be viewed with a plain text editor, and programmers can access the data without any special run-times.  Their disadvantage is that everyone uses different structures and formats for storing their data in text files.  Hence programmers invariably have to write a new parser whenever they want to extract data from a text file from a new supplier.

Fielded Text takes a new approach when working with text files containing tables of values!

Fielded Text allows you to associate a Meta File with a text file.  This Meta file is a small XML file which describes the structure and format of data within the text file.  More importantly, it allows you to access the data in the file in the same way you access records in a database table.  It combines the simplicity of text files with the convenience of database access.

A full description of how the Fielded Text standard enables this can be found in the "Standard" page on this website.  However, if you are looking for a concise description, read on.

Capabilities

The Fielded Text standard aims to be compatible with all text files today that contain tables of values.  In support of this aim, it has a set of capabilities which should cover nearly all structures and formats used in existing text files.  These capabilities are summarised here.

Creating Meta Files

There are several ways of creating a Meta file for a Fielded Text file:

  1. Use a text editor.  Meta files are XML files so, provided you know the tags and schema, you can use any text editor (or XML file editor) to create one.

  2. Use a Fielded Text Editor.  This is the easy way.  A Fielded Text Editor is a specialised text editor for working with Fielded Text files.  With a Fielded Text Editor you can load a sample text file and then visually set the Meta properties for it.  The editor will interactively show you any parsing errors arising from incorrect properties.  Once the properties are correctly set, you can export them into a Meta File.

    A list of Fielded Text editors is here - including at least one free one.

  3. Programmatically.  Fielded Text software components can also be used to generate Meta files programmatically.  This method will be used in specialised Fielded Text applications (for example, Fielded Text Editors).

Meta files typically have a file extension of "ftm".
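
As a rough illustration of the programmatic route, the sketch below builds a Meta document with Python's standard XML library rather than with a Fielded Text component.  The element and attribute names (FieldedText, Field, HeadingLineCount, Name, DataType, Format) are the ones used in the Basic Example on this page; the helper build_meta and the file name pets.ftm are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET


def build_meta(fields, heading_line_count=0):
    """Build a <FieldedText> element from (name, data_type, extra_attrs) tuples.

    Hypothetical helper: it only covers the handful of attributes shown
    in the Basic Example, not the full standard.
    """
    root = ET.Element("FieldedText", {"HeadingLineCount": str(heading_line_count)})
    for name, data_type, extra in fields:
        attrs = {"Name": name}
        if data_type != "String":  # String is the documented default DataType
            attrs["DataType"] = data_type
        attrs.update(extra)
        ET.SubElement(root, "Field", attrs)
    return root


pet_fields = [
    ("PetName", "String", {}),
    ("Age", "Float", {}),
    ("Color", "String", {}),
    ("DateReceived", "DateTime", {"Format": "d MMM yyyy"}),
    ("Price", "Decimal", {}),
    ("NeedsWalking", "Boolean", {}),
    ("Type", "String", {}),
]

meta = build_meta(pet_fields, heading_line_count=2)
meta_xml = ET.tostring(meta, encoding="unicode")

# To save the Meta to disk (assumed file name):
# ET.ElementTree(meta).write("pets.ftm", encoding="utf-8", xml_declaration=True)
```

A dedicated Fielded Text component would normally validate attribute values as well; this sketch only assembles the XML.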

Declared and Undeclared Fielded Text Files

Fielded Text files can be either declared or not declared (undeclared).

A declared Fielded Text file has 2 special lines at the start of the file.  These 2 lines are called the file declaration.  The declaration contains a marker which identifies the file as a Fielded Text file and it specifies what version of the Fielded Text Standard the file conforms to.

In addition, the declaration can also specify the Meta file which this text file is associated with.  It can specify the Meta with:

Declared Fielded Text files remove the need for end users to match a text file with the Meta.  This makes them more reliable as users are less likely to parse or interpret the data incorrectly.  For example, organisations publishing data can place a copy of the Meta on their website and have the downloadable data text files reference it.  End users can then simply download the text file and the parser will automatically know how to obtain the correct Meta.

Nearly all existing text files containing tables of values will be 'Undeclared' Fielded Text files.  They can be handled in exactly the same way as Declared Fielded Text files; however, the text file will need to be explicitly associated with its Meta.

Basic Example

Below is a basic CSV file. It has 2 heading lines and 4 data lines. The lines contain 7 fields of various types.

"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"

The following Fielded Text Meta file specifies the structure and layout (schema) of the above text file.

<?xml version="1.0" encoding="utf-16"?>
<FieldedText HeadingLineCount="2">
<Field Name="PetName" />
<Field DataType="Float" Name="Age" />
<Field Name="Color" />
<Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
<Field DataType="Decimal" Name="Price" />
<Field DataType="Boolean" Name="NeedsWalking" />
<Field Name="Type" />
</FieldedText>
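
To make the idea of database-like access concrete, here is a minimal sketch in Python that applies the Meta above to the CSV text using only the standard library.  It is not a Fielded Text parser: it handles just the attributes present in this example, the mapping from the Format string "d MMM yyyy" to a strptime pattern is an assumption, and treating empty values as None is an illustrative choice.  The XML declaration line is left out because the Meta is already held as a decoded string.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

META_XML = """<FieldedText HeadingLineCount="2">
<Field Name="PetName" />
<Field DataType="Float" Name="Age" />
<Field Name="Color" />
<Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
<Field DataType="Decimal" Name="Price" />
<Field DataType="Boolean" Name="NeedsWalking" />
<Field Name="Type" />
</FieldedText>"""

CSV_TEXT = '''"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"
'''

# Assumed translation of the one Format string used in the example.
DATE_FORMATS = {"d MMM yyyy": "%d %b %Y"}


def make_converter(field):
    """Return a text-to-value function for one <Field> element."""
    data_type = field.get("DataType", "String")  # String is the default
    if data_type == "Float":
        return float
    if data_type == "Decimal":
        return Decimal
    if data_type == "Boolean":
        true_text = field.get("TrueText", "True")
        return lambda text: text == true_text
    if data_type == "DateTime":
        pattern = DATE_FORMATS[field.get("Format")]
        return lambda text: datetime.strptime(text, pattern)
    return str


def read_records(csv_text, meta_xml):
    """Parse csv_text into a list of dicts keyed by the Meta field names."""
    meta = ET.fromstring(meta_xml)
    fields = meta.findall("Field")
    names = [field.get("Name") for field in fields]
    converters = [make_converter(field) for field in fields]
    heading_lines = int(meta.get("HeadingLineCount", "0"))
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    records = []
    for row_index, row in enumerate(reader):
        if row_index < heading_lines:
            continue  # skip heading lines
        values = [
            convert(text.strip()) if text.strip() else None  # empty -> None (assumption)
            for convert, text in zip(converters, row)
        ]
        records.append(dict(zip(names, values)))
    return records


records = read_records(CSV_TEXT, META_XML)
```

Each entry in records is then addressed by field name, for example records[0]["DateReceived"], much as a column in a database row would be.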

Following is a Declared Fielded Text file which contains the above CSV text together with its Meta embedded as comments. The ~ character specifies a comment line.

~|!Fielded Text^| Version="1.0"
~ MetaEmbedded="True"
~ <?xml version="1.0" encoding="utf-16"?>
~ <FieldedText LineCommentChar="~" HeadingLineCount="2">
~ <Field Name="PetName" />
~ <Field DataType="Float" Name="Age" />
~ <Field Name="Color" />
~ <Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
~ <Field DataType="Decimal" Name="Price" />
~ <Field DataType="Boolean" Name="NeedsWalking" />
~ <Field Name="Type" />
~ </FieldedText>
"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"

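The following Python sketch shows one way the embedded Meta could be separated from the rest of a declared file.  It relies only on what is visible in the example above: the marker text on the first line, MetaEmbedded="True" on the second, and the Meta carried on comment lines that follow.  The function name split_declared and its fallback behaviour for undeclared text are assumptions, not part of the standard.

```python
import xml.etree.ElementTree as ET

DECLARED_TEXT = '''~|!Fielded Text^| Version="1.0"
~ MetaEmbedded="True"
~ <?xml version="1.0" encoding="utf-16"?>
~ <FieldedText LineCommentChar="~" HeadingLineCount="2">
~ <Field Name="PetName" />
~ <Field DataType="Float" Name="Age" />
~ <Field Name="Color" />
~ <Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
~ <Field DataType="Decimal" Name="Price" />
~ <Field DataType="Boolean" Name="NeedsWalking" />
~ <Field Name="Type" />
~ </FieldedText>
"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"
'''

MARKER = "|!Fielded Text^|"  # copied from the example declaration


def split_declared(text, comment_char="~"):
    """Return (meta_element_or_None, remaining_text).

    Assumption: the declaration is the first 2 lines and an embedded
    Meta occupies the comment lines directly after it.
    """
    lines = text.splitlines()
    if len(lines) < 2 or MARKER not in lines[0]:
        return None, text  # undeclared: nothing to strip
    if 'MetaEmbedded="True"' not in lines[1]:
        return None, "\n".join(lines[2:])
    index = 2
    meta_lines = []
    while index < len(lines) and lines[index].startswith(comment_char):
        meta_lines.append(lines[index][len(comment_char):].strip())
        index += 1
    # The text is already decoded, so the XML declaration line is dropped.
    meta_lines = [line for line in meta_lines if not line.startswith("<?xml")]
    meta = ET.fromstring("\n".join(meta_lines)) if meta_lines else None
    return meta, "\n".join(lines[index:])


meta, rest = split_declared(DECLARED_TEXT)
```

Here rest still starts with the 2 heading lines, which a parser would skip according to HeadingLineCount in the recovered Meta.
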
Fielded Text file Structure

A Fielded Text file consists of 2 main parts: Header and Body.  The Body contains the lines which hold the data (the records).  The Header consists of all the lines prior to the Body (including heading lines).

At a more detailed level, the header part of the file can be split into the following sections:

The body part of the text file begins either:

The record part can contain record lines and comment lines (and possibly ignored blank lines).  A record line contains the actual data, i.e. a row of values.  Each record line consists of a sequence of field values.  The format of these field values is specified by the Meta.  The Meta also specifies the structure of the record lines, including how field values are separated.

It is possible for a record to span multiple lines in the text file.  This will occur when a field value contains "End of Line" character(s).  If a record does span multiple lines, then no line within that record will be treated as a comment line or an ignored blank line.  Accordingly, it is possible for lines in the body part to begin with a line comment character but not be treated as a comment line.
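
The rule about comment characters inside multi-line records can be sketched in Python as follows.  This is a simplified illustration rather than the standard's algorithm: it assumes quoted values use the quote character only as an enclosing quote or as a stuffed (doubled) quote, so an odd number of quote characters on a line means a quoted value opens or closes there.  The function name split_records is hypothetical.

```python
def split_records(body_text, quote_char='"', comment_char="~"):
    """Group body lines into records, skipping comment and blank lines.

    A line is only considered for skipping when no quoted value is
    currently open, so a continuation line that happens to start with
    the comment character stays part of its record.
    """
    records = []
    pending = []
    in_quotes = False
    for line in body_text.splitlines():
        if not in_quotes:
            if line.startswith(comment_char):
                continue  # genuine comment line
            if line.strip() == "":
                continue  # ignored blank line
        pending.append(line)
        # An odd number of quote characters toggles the open/closed state;
        # stuffed quotes ("") come in pairs and leave it unchanged.
        if line.count(quote_char) % 2 == 1:
            in_quotes = not in_quotes
        if not in_quotes:
            records.append("\n".join(pending))
            pending = []
    return records


sample = 'a,"first line\n~ still part of the value"\n~ a real comment\n\nb,c'
parsed = split_records(sample)
```

In the sample, the second line begins with ~ but belongs to the first record, while the third line is dropped as a comment.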

Fielded Text Meta file Structure

The Meta contains the following groups of information:

  1. Main Section which specifies properties applying to the whole text file.  In the above example Meta file, the attributes in the <FieldedText> element apply to the whole text file and make up the main section.  The Main Section can contain the following properties/attributes:

    Attribute | Description | Default
    Culture | Specifies which regional conventions should be used. | Invariant (Generally US based)
    EndOfLineType | Method used to detect line ends in text file | Auto
    EndOfLineChar | Character which denotes line end when EndOfLineType is "Char" | ;
    EndOfLineAutoWriteType | Method used to write line ends when EndOfLineType = "Auto" | Local
    QuoteChar | Character used to quote a field value (i.e. enclose field value) | "
    DelimiterChar | Character which separates fields in a line | ,
    LineCommentChar | Character which, if it's first in line, denotes that line is a comment | 0x04
    StuffedEmbeddedQuotes | Quotes can be embedded in a quoted field by having 2 in a row | True
    SubstitutionEnabled | Enables substitutions in the text | False
    SubstitutionChar | Character which identifies a substitution | \
    AllowEndOfLineCharInQuotes | Allow End of Line character(s) within a quoted string | True
    IgnoreBlankLines | Ignore blank lines in the text | True
    IgnoreExtraChars | Ignore any characters in a line after all fields have been parsed | True
    HeadingLineCount | Number of heading lines | 0
    MainHeadingLineIndex | Index of Main Heading line | 0
    HeadingConstraint | Default Constraints applied to field headings | None
    HeadingQuotedType | Default specification for how field heading values are quoted | Optional
    HeadingAlwaysWriteOptionalQuote | Default specifier for whether field heading optional quotes should be written | True
    HeadingWritePrefixSpace | Default specifier for whether field headings should be prefixed with a space when written | False
    HeadingPadAlignment | Default alignment of padding for fixed width field headings | Auto
    HeadingPadCharType | Default method used to pad fixed width field headings | EndOfValue
    HeadingPadChar | Default character used to pad fixed width field headings | <space>
    HeadingTruncateType | Default method used to truncate fixed width field headings | Right
    HeadingTruncateChar | Default character used to fill truncated field headings if HeadingTruncateType = TruncateChar | #
    HeadingEndOfValueChar | Default character used to flag End of Field Heading when HeadingPadCharType = EndOfValue | 0x03

    Additionally, it can also contain some Design-only properties.

  2. Field Sections which specify the properties of each field of data used within the text file.  In the above example Meta file, the attributes in a <Field> element apply to the respective field and make up a field section.  A Field Section can contain the following properties/attributes:

    Attribute | Description | Default
    DataType | Field Data Type | String
    Index | Explicitly specifies position of field |
    Id | Tag available for User Definition | 0
    Name | Field Name | <Blank>
    FixedWidth | Specifies whether field has a fixed number of characters | False
    Width | Number of characters in field if FixedWidth = True | 1
    HeadingConstraint | Constraints applied to headings | Main HeadingConstraint
    Constant | Field is a constant | False
    ValueQuotedType | Specification for how field values are quoted | Optional
    ValueAlwaysWriteOptionalQuote | Specifier for whether a value's optional quotes should be written | False
    ValueWritePrefixSpace | Specifier for whether values should be prefixed with a space when written | False
    ValuePadAlignment | Alignment of padding for fixed width field values | Auto
    ValuePadCharType | Method used to pad fixed width field values | EndOfValue
    ValuePadChar | Character used to pad fixed width field values | Depends on DataType
    ValueTruncateType | Method used to truncate fixed width field values | Exception
    ValueTruncateChar | Character used to fill truncated field values if ValueTruncateType = TruncateChar | #
    ValueEndOfValueChar | Character used to flag End of Field Value when ValuePadCharType = EndOfValue | 0x03
    ValueNullChar | Character used to fill truncated field values if ValueTruncateType = NullChar | *
    HeadingQuotedType | Specification for how heading values are quoted | Main HeadingQuotedType
    HeadingAlwaysWriteOptionalQuote | Specifier for whether heading optional quotes should be written | Main HeadingAlwaysWriteOptionalQuote
    HeadingWritePrefixSpace | Specifier for whether headings should be prefixed with a space when written | Main HeadingWritePrefixSpace
    HeadingPadAlignment | Alignment of padding for fixed width field headings | Main HeadingPadAlignment
    HeadingPadCharType | Method used to pad fixed width field headings | Main HeadingPadCharType
    HeadingPadChar | Character used to pad fixed width field headings | Main HeadingPadChar
    HeadingTruncateType | Method used to truncate fixed width field headings | Main HeadingTruncateType
    HeadingTruncateChar | Character used to fill truncated field headings if HeadingTruncateType = TruncateChar | Main HeadingTruncateChar
    HeadingEndOfValueChar | Character used to flag End of Field Heading when HeadingPadCharType = EndOfValue | Main HeadingEndOfValueChar
    Headings | Field Headings as comma text | <Blank>
    Null | Specifies whether field value is Null if Constant = True | False
    Value | Specifies field value if Constant = True | Depends on DataType
    Format | Text format of field value | Depends on DataType
    Styles | Either restrict or allow additional formatting when parsing text field values | Depends on DataType
    FalseText | Text presentation of Boolean field False value | False
    TrueText | Text presentation of Boolean field True value | True

    Fields can have a DataType of: String, Boolean, Integer, Float, Decimal (similar to Float but better suited for financial calculations) or DateTime.  Some of the attributes listed above are not applicable to all field DataTypes and some use different values in different DataTypes.

  3. Substitution Sections which specify the substitutions used within the text file.  Substitutions are similar to Escape Sequences used in some CSV files (e.g. \n).  A Substitution Section can contain the following properties/attributes:

    Attribute | Description | Default
    Type | The type of substitution | String
    Token | A character which determines the substitution to be invoked |
    Value | The string value to replace the substitution character and token (if Type = String) |

  4. Sequence Sections.  A Fielded Text file can have lines with different sets of fields depending on the value of a key field(s).  The Sequence Sections in the Meta File specify the sequence of fields which can follow a key field.  A Sequence Section can contain the following properties/attributes:

    Attribute | Description | Default
    Name | Name of Sequence |
    Root | Specifies whether this is the first sequence invoked for each record (line) | False
    FieldIndices | Shorthand list of fields in this sequence (Field indices array in commatext string) | <Blank>

    Each Sequence has a series of <Item> elements which specify the fields included in the sequence.  An <Item> element can contain the following properties/attributes:

    Attribute | Description | Default
    Index | Explicitly specifies position of Sequence Item in Sequence |
    FieldIndex | Index of field (in Field List) used by this Sequence Item |

    An <Item> element can also contain a series of <Redirect> elements.  The <Redirect> elements determine which sequence should be invoked if a field contains specified values.  A <Redirect> element can contain the following properties/attributes:

    Attribute | Description | Default
    Index | Explicitly specifies position of Redirect |
    Type | Specifies type of comparison Redirect makes with Field Value | Depends on DataType of Sequence Item's Field
    SequenceName | Name of Sequence to be invoked if the Field's Value matches the Redirect Value |
    InvokationDelay | Specifies whether specified Sequence should be invoked after current field or after current sequence | AfterField
    Value | Value against which Field Value is compared | Depends on Redirect Type
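
Several Field Section attributes listed earlier take their default from the Main Section attribute of the same name (for example, a field's HeadingTruncateChar defaults to the Main HeadingTruncateChar).  The Python sketch below shows one way that fallback could be resolved.  The function name resolve_heading_attribute is hypothetical, the built-in defaults are copied from the Main Section table for a subset of attributes only, and the lookup order (field, then main, then built-in default) is an interpretation of those tables rather than wording taken from the standard.

```python
import xml.etree.ElementTree as ET

# Built-in defaults copied from the Main Section table (subset only).
STANDARD_DEFAULTS = {
    "HeadingConstraint": "None",
    "HeadingQuotedType": "Optional",
    "HeadingAlwaysWriteOptionalQuote": "True",
    "HeadingWritePrefixSpace": "False",
    "HeadingPadAlignment": "Auto",
    "HeadingPadCharType": "EndOfValue",
    "HeadingPadChar": " ",
    "HeadingTruncateType": "Right",
    "HeadingTruncateChar": "#",
}


def resolve_heading_attribute(meta, field, name):
    """Look up a heading attribute: field first, then main, then default."""
    if name in field.attrib:
        return field.attrib[name]
    if name in meta.attrib:
        return meta.attrib[name]
    return STANDARD_DEFAULTS[name]


# Small illustrative Meta: the main section overrides HeadingTruncateChar
# and the second field overrides it again.
meta = ET.fromstring(
    '<FieldedText HeadingTruncateChar="?">'
    '<Field Name="PetName" />'
    '<Field Name="Age" HeadingTruncateChar="!" />'
    "</FieldedText>"
)
pet_name_field, age_field = meta.findall("Field")

pet_name_char = resolve_heading_attribute(meta, pet_name_field, "HeadingTruncateChar")
age_char = resolve_heading_attribute(meta, age_field, "HeadingTruncateChar")
truncate_type = resolve_heading_attribute(meta, age_field, "HeadingTruncateType")
```

With this Meta, PetName inherits the main value, Age uses its own value, and HeadingTruncateType falls through to the built-in default because neither level sets it.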