In
all text file formats, // starts a comment lasting up to the end of the
line. In addition, applications may treat unexpected text as starting a
comment lasting up to the end of the line. White space is not
significant in the file formats specified here.
Symbols:
In all text file formats, the following
escape sequences shall be recognized:
- \s for the space symbol
- \] for the right square bracket ']'
- \[ for the left square bracket '['
- \| for the pipe '|'
- \; for the semicolon ';'
- \: for the colon ':'
- \{ for the left curly brace '{'
- \} for the right curly brace '}'
- \\ for the back slash
- \" for double quotes
- \e for the empty string
Alphabets:
The symbols allowed in alphabets are
those on which isgraph() returns true, and the space. Alphabets are
case sensitive. Alphabets are specified by listing the symbols involved
within square
brackets:
e.g.
- [01] for the binary alphabet
- [0 1 2 3 4 a b c] for the set {0,1,2,3,4,a,b,c}
- [( ) ] for the set consisting of parentheses
- [()\s ] is a set with three symbols (a space and left and right
parentheses)
A dash may be used to indicate a range:
e.g.
- [0-9] for the set of 10 decimal digits
- [A - Wa] for the set of uppercase letters from A to W plus
lowercase a
- [0-9a-f] for the set of hexadecimal digits, but not including
uppercase A to F
If a dash is part of the alphabet, it must be specified either first or
last, e.g.
- [-+*/] for the set {-,+,*,/}
A backslash followed by a right square bracket indicates that the right
square bracket is part of the alphabet. A backslash followed by
an 's' is the space symbol.
e.g.
- [\][] for the alphabet consisting of left and right square
brackets
- [\\][] for the alphabet consisting of the backslash,
left square bracket, and right square bracket
Strings
\e is the empty string. Strings that
include spaces should be enclosed in double quotes:
"a string with spaces enclosed by
double quotes".
The same string is represented by
a\sstring\swith\sspaces\senclosed\sby\sdouble\squotes
.
\" is used for double quotes when they
are part of the string:
"\"a string starting and ending with
double quotes\"".
Identifiers
Identifiers are case sensitive.
Identifiers shall consist of sequences of one or more symbols from the
identifier alphabet [a-zA-z0-9_]. Identifiers may start with any
symbol from the identifier alphabet, including the digits.
Finite Automata
Finite automata are defined as follows:
FASPEC
fa NAME;
type FATYPE;
input_alphabet ALPHABET;
states STATELIST;
transitions
{
LIST OF TRANSITION RULES;
}
fa_end;
FASPEC is a keyword.
fa is a keyword.
NAME is an identifier serving as the name of the finite automaton.
The type statement is optional, but may be required in some
applications.
type is a keyword.
FATYPE is one of: dfa, nfa, enfa;
The actual type of the finite automaton may be deduced from the
completeness and uniqueness of the transition rules, and the presence
or absence of transitions on an empty string.
input_alphabet is a keyword. ALPHABET is an alphabet.
states is a keyword. STATELIST is a list of identifiers separated
by commas. The first identifier in the list is the start
state. Identifiers with asterisks preceding them are accepting
states.
Each transition rule is of the form:
state, symbol : statelist ;
In transition rules, "\e" is used for epsilon, the empty string; \: is
used for colon whenever the colon is part of the input alphabet; \\ is
used for the backslash; \s is used for a space;
Pushdown Automata
Pushdown automata are defined as
follows:
PDASPEC
pda NAME;
type PDATYPE;
input_alphabet ALPHABET;
stack_alphabet ALPHABET;
states STATELIST;
transitions
{
LIST OF TRANSITION RULES;
}
pda_end;
PDASPEC is a keyword.
pda is a keyword.
NAME is an identifier serving as the name of the pushdown automaton.
The type statement is optional, but may be required in some
applications.
type is a keyword.
PDATYPE is one of: nullstack, finalstate, deterministic nullstack,
deterministic finalstate.
The actual type of the pushdown automaton may be deduced from the
completeness and uniqueness of the transition rules, and the presence
or absence of final or accepting states.
input_alphabet is a keyword. stack_alphabet is a keyword.
ALPHABET is an alphabet. The first symbol in the stack alphabet
list is the initial stack symbol.
states is a keyword. STATELIST is a list of identifiers separated
by
commas. The first identifier in the list is the start
state.
Identifiers with asterisks preceding them are accepting states.
Each transition rule is of the form:
state, symbol, stack_symbol :
STATE_STACK_REPLACEMENT_LIST ;
In transition rules, "\e" is used for epsilon, the empty string; \: is
used for colon whenever the colon is part of the input alphabet; \\ is
used for the backslash; \s is used for a space;
STATE_STACK_REPLACEMENT_LIST is a list of state and stack
replacement pairs, each one in parentheses:
(state1, stackreplacement1), (state2,
stackreplacement2), ... , (stateN, stackreplacementN)
Turing Machines
Turing Machines are defined as follows:
TMSPEC
tm NAME;
input_alphabet ALPHABET;
tape_alphabet ALPHABET;
states STATELIST;
transitions
{
LIST OF TRANSITION RULES
}
tm_end;
TMSPEC is a keyword.
tm is a keyword.
NAME is an identifier serving as the name of the Turing Machine.
input_alphabet is a keyword. tape_alphabet is a keyword.
ALPHABET is
an alphabet. The first symbol in the tape alphabet list is the
blank symbol.
states is a keyword. STATELIST is a list of identifiers separated
by
commas. The first identifier in the list is the start
state.
Identifiers with asterisks preceding them are accepting states.
Each transition rule is of the form:
state, tape_symbol : state,
tape_symbol, direction ;
In transition rules, "\e" is used for epsilon, the empty string; \: is
used for colon whenever the colon is part of the input alphabet; \\ is
used for the backslash; \s is used for a space;
direction is either L or R.
Context-Free Grammars
(file format definition initiated by Mitzi Ang)
Context-free grammars are defined as
follows:
CFGSPEC
cfg NAME;
terminals ALPHABET;
variables VARIABLELIST;
productions
{
LIST OF PRODUCTIONS
}
cfg_end;
CFGSPEC is a keyword.
cfg is a keyword.
NAME is an identifier serving as the name of the context-free grammar.
terminals is a keyword. ALPHABET is an alphabet.
variables is a keyword. VARIABLELIST is a list of identifiers separated
by commas. The first identifier in the list is the start symbol.
Each production is of the form:
variable: POSSIBLE_DERIVATIONS ;
Possible derivations are separated by |. Variables appearing on the
right-hand side of derivations shall be enclosed in square brackets.
For example,
S: a [S] b [S];
is a production where the variable S
derives aSbS.
In the right-hand side of productions, "\e" is used for epsilon, the
empty string.