[Lui's homepage] [CE 160 homepage]

Text File Formats for Representing Automata and Grammars
v. 0.91

In all text file formats, // starts a comment lasting up to the end of the line. In addition, applications may treat unexpected text as starting a comment lasting up to the end of the line. White space is not significant in the file formats specified here.

Symbols:
In all text file formats, the following escape sequences shall be recognized:
Alphabets:

The symbols allowed in alphabets are those on which isgraph() returns true, and the space. Alphabets are case sensitive. Alphabets are specified by listing the symbols involved within square brackets:
e.g. 
A dash may be used to indicate a range:
e.g.
If a dash is part of the alphabet, it must be specified either first or last, e.g.
A backslash followed by a right square bracket indicates that the right square bracket is part of the alphabet.  A backslash followed by an 's' is the space symbol.
e.g.
Strings
\e is the empty string. Strings that include spaces should be enclosed in double quotes:
"a string with spaces enclosed by double quotes".
The same string is represented by
a\sstring\swith\sspaces\senclosed\sby\sdouble\squotes .
\" is used for double quotes when they are part of the string:
"\"a string starting and ending with double quotes\"".

Identifiers
Identifiers are case sensitive. Identifiers shall consist of sequences of one or more symbols from the identifier alphabet [a-zA-z0-9_].  Identifiers may start with any symbol from the identifier alphabet, including the digits.
 
Finite Automata

Finite automata are defined as follows:

FASPEC
fa NAME;
type FATYPE;
input_alphabet ALPHABET;
states STATELIST;
transitions
{
LIST OF TRANSITION RULES;
}
fa_end;

FASPEC is a keyword.
fa is a keyword.
NAME is an identifier serving as the name of the finite automaton.

The type statement is optional, but may be required in some applications.
type is a keyword.
FATYPE is one of: dfa, nfa, enfa;
The actual type of the finite automaton may be deduced from the completeness and uniqueness of the transition rules, and the presence or absence of transitions on an empty string.

input_alphabet is a keyword.  ALPHABET is an alphabet.

states is a keyword.  STATELIST is a list of identifiers separated by commas.   The first identifier in the list is the start state.  Identifiers with asterisks preceding them are accepting states.

Each transition rule is of the form:

state, symbol : statelist ;

In transition rules, "\e" is used for epsilon, the empty string; \: is used for colon whenever the colon is part of the input alphabet; \\ is used for the backslash; \s is used for a space;

Pushdown Automata

Pushdown automata are defined as follows:

PDASPEC
pda NAME;
type PDATYPE;
input_alphabet ALPHABET;
stack_alphabet ALPHABET;
states STATELIST;
transitions
{
LIST OF TRANSITION RULES;
}
pda_end;

PDASPEC is a keyword.
pda is a keyword.
NAME is an identifier serving as the name of the pushdown automaton.

The type statement is optional, but may be required in some applications.
type is a keyword.
PDATYPE is one of: nullstack, finalstate, deterministic nullstack, deterministic finalstate.
The actual type of the pushdown automaton may be deduced from the completeness and uniqueness of the transition rules, and the presence or absence of final or accepting states.

input_alphabet is a keyword.  stack_alphabet is a keyword.  ALPHABET is an alphabet.  The first symbol in the stack alphabet list is the initial stack symbol.

states is a keyword.  STATELIST is a list of identifiers separated by commas.   The first identifier in the list is the start state.  Identifiers with asterisks preceding them are accepting states.

Each transition rule is of the form:

state, symbol, stack_symbol : STATE_STACK_REPLACEMENT_LIST ;

In transition rules, "\e" is used for epsilon, the empty string; \: is used for colon whenever the colon is part of the input alphabet; \\ is used for the backslash; \s is used for a space;

STATE_STACK_REPLACEMENT_LIST  is a list of state and stack replacement pairs, each one in parentheses:
(state1, stackreplacement1), (state2, stackreplacement2), ... , (stateN, stackreplacementN)

Turing Machines

Turing Machines are defined as follows:

TMSPEC
tm NAME;
input_alphabet ALPHABET;
tape_alphabet ALPHABET;
states STATELIST;
transitions
{
LIST OF TRANSITION RULES
}
tm_end;

TMSPEC is a keyword.
tm is a keyword.
NAME is an identifier serving as the name of the Turing Machine.

input_alphabet is a keyword.  tape_alphabet is a keyword.  ALPHABET is an alphabet.  The first symbol in the tape alphabet list is the blank symbol.

states is a keyword.  STATELIST is a list of identifiers separated by commas.   The first identifier in the list is the start state.  Identifiers with asterisks preceding them are accepting states.

Each transition rule is of the form:

state, tape_symbol : state, tape_symbol, direction ;

In transition rules, "\e" is used for epsilon, the empty string; \: is used for colon whenever the colon is part of the input alphabet; \\ is used for the backslash; \s is used for a space;

direction is either L or R.

Context-Free Grammars
(file format definition initiated by Mitzi Ang)

Context-free grammars are defined as follows:

CFGSPEC
cfg NAME;
terminals ALPHABET;        
variables VARIABLELIST;  
productions
{
 LIST OF PRODUCTIONS
}
cfg_end;

CFGSPEC is a keyword.
cfg is a keyword.
NAME is an identifier serving as the name of the context-free grammar.

terminals is a keyword. ALPHABET is an alphabet.
variables is a keyword. VARIABLELIST is a list of identifiers separated by commas. The first identifier in the list is the start symbol.

Each production is of the form:
variable: POSSIBLE_DERIVATIONS ;
Possible derivations are separated by |. Variables appearing on the right-hand side of derivations shall be enclosed in square brackets. For example,
S: a [S] b [S];
is a production where the variable S derives aSbS.

In the right-hand side of productions, "\e" is used for epsilon, the empty string.


 
[Lui's homepage] [CE 160 homepage]


This page has been accessed
Counter
times since July 19, 2004.

Created July 19, 2004
Last updated August 4, 2004
By Luisito L. Agustin
lui_agustin@yahoo.com

 

 

 

 

1