Text File Formats for Representing Automata

In all text file formats, // starts a comment lasting up to the end of the line. In addition, applications may treat unexpected text as starting a comment lasting up to the end of the line. White space is not significant in the file formats specified here.

Symbols:

In all text file formats, the following escape sequences shall be recognized:

\s for the space symbol
\] for the right square bracket ']'
\[ for the left square bracket '['
\| for the pipe '|'
\; for the semicolon ';'
\: for the colon ':'
\{ for the left curly brace '{'
\} for the right curly brace '}'
\\ for the back slash
\" for double quotes
\e for the empty string

Alphabets:

The symbols allowed in alphabets are those on which isgraph() returns true, and the space. Alphabets are case sensitive. Alphabets are specified by listing the symbols involved within square brackets:
e.g.

[01] for the binary alphabet
[0 1 2 3 4 a b c] for the set {0,1,2,3,4,a,b,c}
[( ) ] for the set consisting of parentheses
[()\s ] is a set with three symbols (a space and left and right parentheses)

A dash may be used to indicate a range:
e.g.

[0-9] for the set of 10 decimal digits
[A - Wa] for the set of uppercase letters from A to W plus lowercase a
[0-9a-f] for the set of hexadecimal digits, but not including uppercase A to F

If a dash is part of the alphabet, it must be specified either first or last, e.g.

[-+*/] for the set {-,+,*,/}

A backslash followed by a right square bracket indicates that the right square bracket is part of the alphabet. A backslash followed by an 's' is the space symbol.
e.g.

[\][] for the alphabet consisting of left and right square brackets
[\\][] for the alphabet consisting of the backslash, left square bracket, and right square bracket

Strings

\e is the empty string. Strings that include spaces should be enclosed in double quotes:

"a string with spaces enclosed by double quotes".

The same string is represented by

a\sstring\swith\sspaces\senclosed\sby\sdouble\squotes .

\" is used for double quotes when they are part of the string:

"\"a string starting and ending with double quotes\"".

Identifiers

Identifiers are case sensitive. Identifiers shall consist of sequences of one or more symbols from the identifier alphabet [a-zA-z0-9_]. Identifiers may start with any symbol from the identifier alphabet, including the digits.

Finite Automata

Finite automata are defined as follows:

FASPEC
fa NAME;
type FATYPE;
input_alphabet ALPHABET;
states STATELIST;
transitions
{

LIST OF TRANSITION RULES;

}
fa_end;

FASPEC is a keyword.
fa is a keyword.
NAME is an identifier serving as the name of the finite automaton.

The type statement is optional, but may be required in some applications.
type is a keyword.
FATYPE is one of: dfa, nfa, enfa;
The actual type of the finite automaton may be deduced from the completeness and uniqueness of the transition rules, and the presence or absence of transitions on an empty string.

input_alphabet is a keyword. ALPHABET is an alphabet.

states is a keyword. STATELIST is a list of identifiers separated by commas. The first identifier in the list is the start state. Identifiers with asterisks preceding them are accepting states.

Each transition rule is of the form:

state, symbol : statelist ;

In transition rules, "\e" is used for epsilon, the empty string; \: is used for colon whenever the colon is part of the input alphabet; \\ is used for the backslash; \s is used for a space;

Pushdown Automata

Pushdown automata are defined as follows:

PDASPEC
pda NAME;
type PDATYPE;
input_alphabet ALPHABET;
stack_alphabet ALPHABET;
states STATELIST;
transitions
{

LIST OF TRANSITION RULES;

}

pda_end;

PDASPEC is a keyword.
pda is a keyword.
NAME is an identifier serving as the name of the pushdown automaton.

The type statement is optional, but may be required in some applications.
type is a keyword.
PDATYPE is one of: nullstack, finalstate, deterministic nullstack, deterministic finalstate.
The actual type of the pushdown automaton may be deduced from the completeness and uniqueness of the transition rules, and the presence or absence of final or accepting states.

input_alphabet is a keyword. stack_alphabet is a keyword. ALPHABET is an alphabet. The first symbol in the stack alphabet list is the initial stack symbol.

states is a keyword. STATELIST is a list of identifiers separated by commas. The first identifier in the list is the start state. Identifiers with asterisks preceding them are accepting states.

Each transition rule is of the form:

state, symbol, stack_symbol : STATE_STACK_REPLACEMENT_LIST ;

In transition rules, "\e" is used for epsilon, the empty string; \: is used for colon whenever the colon is part of the input alphabet; \\ is used for the backslash; \s is used for a space;

STATE_STACK_REPLACEMENT_LIST is a list of state and stack replacement pairs, each one in parentheses:

(state1, stackreplacement1), (state2, stackreplacement2), ... , (stateN, stackreplacementN)

Turing Machines

Turing Machines are defined as follows:

TMSPEC
tm NAME;
input_alphabet ALPHABET;
tape_alphabet ALPHABET;
states STATELIST;
transitions
{

LIST OF TRANSITION RULES

}

tm_end;

TMSPEC is a keyword.
tm is a keyword.
NAME is an identifier serving as the name of the Turing Machine.

input_alphabet is a keyword. tape_alphabet is a keyword. ALPHABET is an alphabet. The first symbol in the tape alphabet list is the blank symbol.

states is a keyword. STATELIST is a list of identifiers separated by commas. The first identifier in the list is the start state. Identifiers with asterisks preceding them are accepting states.

Each transition rule is of the form:

state, tape_symbol : state, tape_symbol, direction ;

In transition rules, "\e" is used for epsilon, the empty string; \: is used for colon whenever the colon is part of the input alphabet; \\ is used for the backslash; \s is used for a space;

direction is either L or R.

Context-Free Grammars
(file format definition initiated by Mitzi Ang)

Context-free grammars are defined as follows:

CFGSPEC
cfg NAME;
terminals ALPHABET;
variables VARIABLELIST;
productions
{

LIST OF PRODUCTIONS

}
cfg_end;

CFGSPEC is a keyword.
cfg is a keyword.
NAME is an identifier serving as the name of the context-free grammar.

terminals is a keyword. ALPHABET is an alphabet.
variables is a keyword. VARIABLELIST is a list of identifiers separated by commas. The first identifier in the list is the start symbol.

Each production is of the form:

variable: POSSIBLE_DERIVATIONS ;

Possible derivations are separated by |. Variables appearing on the right-hand side of derivations shall be enclosed in square brackets. For example,

S: a [S] b [S];

is a production where the variable S derives aSbS.

In the right-hand side of productions, "\e" is used for epsilon, the empty string.

Text File Formats for Representing Automata and Grammars v. 0.91

Text File Formats for Representing Automata and Grammars
v. 0.91