[antlr-interest] Grammar help
Brian Catlin
BrianC at sannas.org
Mon Mar 15 20:54:31 PDT 2010
I am trying to create a grammar for a command language, and I'm stuck. I'm
using ANTLR-3.1-2009-06-28 and libantlr3c-3.2. The language is fairly
simple: commands are of the form Verb Noun. However, some commands can take
a file name as the last item of the command, and because of the wide range
of characters allowed in a file name, ANTLR gets confused. So the question
is: how would I write a grammar that will work?
On Windows, a file name may contain any character except <,>,|,?,*,". In
the grammar, if a file name has any spaces in it, then the entire name must
be enclosed within double-quotes (" "), and I don't want the WS (white space
token) to eat the white space within the quotes. So, a file name may be a
quoted string (I'll strip off the quotes once I have the string) or an
unquoted string. It would also be nice to be able to have LINE_COMMENTs on
the same line as a command with a file name, but that is not a requirement.
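To make the intended file-name rules concrete, here is a hand-written C sketch
of them, independent of ANTLR. It is only an illustration: the names
is_forbidden and extract_file_name are made up for this example and are not
part of ANTLR or libantlr3c. It assumes the text handed to it starts at the
file-name argument and that an unquoted name ends at the first blank.

```c
#include <stddef.h>
#include <string.h>

/* Characters listed above as illegal in a file name: < > | ? * " */
static int is_forbidden(char c)
{
    /* Guard against NUL: strchr would otherwise match the terminator. */
    return c != '\0' && strchr("<>|?*\"", c) != NULL;
}

/*
 * Hypothetical helper: copy the file name found at the start of `arg`
 * into `out` (capacity `cap`), without the surrounding quotes.
 * Returns 0 on success, -1 if the name is empty, malformed or too long.
 */
static int extract_file_name(const char *arg, char *out, size_t cap)
{
    size_t n = 0;
    int quoted;

    if (cap == 0)
        return -1;

    while (*arg == ' ' || *arg == '\t')      /* skip leading blanks */
        arg++;

    quoted = (*arg == '"');
    if (quoted)
        arg++;                               /* step over opening quote */

    for (;;) {
        char c = *arg;

        if (quoted) {
            if (c == '"')
                break;                       /* closing quote ends the name */
            if (c == '\0' || c == '\r' || c == '\n')
                return -1;                   /* quote was never closed */
        } else if (c == '\0' || c == ' ' || c == '\t' ||
                   c == '\r' || c == '\n') {
            break;                           /* blank or end of line ends it */
        }

        if (is_forbidden(c))
            return -1;
        if (n + 1 >= cap)
            return -1;                       /* no room for c plus the NUL */
        out[n++] = c;
        arg++;
    }

    out[n] = '\0';
    return n > 0 ? 0 : -1;
}
```

Because scanning stops at the closing quote (or at the first blank for an
unquoted name), anything after the name, such as a "! comment", is simply left
for the caller to deal with.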
It occurred to me that, instead of trying to build a token that overlaps
with pretty much every other token, I could just grab everything from where
the file name starts to the end of the line, but I don't know how to do
that.
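Outside of ANTLR, the "grab everything to the end of the line" idea is easy to
state in plain C. The sketch below only shows the intended behaviour and does
not say how to wire it into a generated lexer; rest_of_line is a hypothetical
name, and it assumes `start` already points at the first character after the
noun.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text from `start` up to (not including)
 * the line terminator into `out`, with leading and trailing blanks
 * removed. Returns the number of characters copied, or (size_t)-1 if
 * `out` (capacity `cap`) is too small.
 */
static size_t rest_of_line(const char *start, char *out, size_t cap)
{
    size_t len;

    while (*start == ' ' || *start == '\t')  /* skip leading blanks */
        start++;

    len = strcspn(start, "\r\n");            /* stop at CR, LF or NUL */

    while (len > 0 && (start[len - 1] == ' ' || start[len - 1] == '\t'))
        len--;                               /* drop trailing blanks */

    if (len + 1 > cap)
        return (size_t)-1;

    memcpy(out, start, len);
    out[len] = '\0';
    return len;
}
```

Blanks inside the name survive, so a name such as C:\Temp\my file.txt comes
back intact, and the next command line is left untouched.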
When I compile the grammar with ANTLR, I get the following:
warning(149): Commands.g:0:0: rewrite syntax or operator with no output
option; setting output=AST
warning(200): Commands.g:146:14: Decision can match input such as
"{'\u0000'..'\f', '\u000E'..')', '+'..';', '=', '@'..'{', '}'..'\uFFFF'}"
using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Commands.g:146:14: Decision can match input such as "'\r'"
using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(208): Commands.g:151:1: The following token definitions can never be
matched because prior tokens match the same input: WS
ANTLR generates a lexer and a parser, but they don't do anything useful: any
text is accepted as a match, even if it isn't in the defined token list.
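One possible way to see the overlap, judging only from the character set in the
FILE_NAME rule of the grammar below, is to mirror that set in C and count how
much of a command line it accepts. This is a sketch of the character set only,
not of how the generated lexer chooses between rules; in_file_name_set and
file_name_prefix_len are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Mirrors the set ~('|' | '<' | '>' | '*' | '?') from the FILE_NAME rule:
 * every character except those five is accepted. NUL is rejected here
 * only because it terminates C strings.
 */
static int in_file_name_set(unsigned char c)
{
    return c != '\0' && strchr("|<>*?", c) == NULL;
}

/* Number of leading characters of `s` that the set accepts. */
static size_t file_name_prefix_len(const char *s)
{
    size_t n = 0;

    while (in_file_name_set((unsigned char)s[n]))
        n++;
    return n;
}
```

For a line such as "DUMP MBR /ALL C:" followed by CR LF, the count equals the
whole length of the line: keywords, blanks and the line ending are all inside
the set. That would be consistent with error(208) reporting that WS can never
be matched, although I have not confirmed that this is the whole story.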
The following is an abbreviated version of the grammar. The real grammar has
a lot more verbs and nouns, but this should give you the flavor of what I'm
trying to do.
//
// This grammar defines the commands available to the DiskTool (DT) program
//
grammar Commands;
options
{
language = C;
backtrack = true;
memoize = true;
}
@lexer::header
{
#define ANTLR3_INLINE_INPUT_ASCII
}
//+
// Productions
//-
commands
:
(script_command
| dump_command
! show_command
)*;
script_command
: '@'
FILE_NAME {printf ("File name [\%s]\n", $FILE_NAME.text->chars);}
;
dump_command
: DUMP
(dump_struct
| dump_block
| a_file
);
show_command
: SHOW
(structure_nouns
| storage_nouns
| a_file
);
mbr_vbr
: MBR
| VBR
;
block_nouns
: LBN
| LCN
| VBN
| VCN
;
structure_nouns
: MBR
| VBR
;
dump_block
: block_nouns
number
((',' number)
| (':' number))?
DRIVE_NAME?
;
dump_struct
: mbr_vbr
('/' qualifier)?
DRIVE_NAME?
;
storage_nouns
: DISK
| VOLUME
;
a_file
: FILE
FILE_NAME {printf ("File name [\%s]\n", $FILE_NAME.text->chars);}
;
number
: DEC_NUMBER
| HEX_NUMBER
;
qualifier
: ALL
! CODE
| TABLE
;
//+
// Tokens
//-
// Verbs
DUMP : 'DUMP';
SHOW : 'SHOW';
// Nouns
DISK : 'DISK';
FILE : 'FILE';
LBN : 'LBN';
LCN : 'LCN';
MBR : 'MBR';
PBN : 'PBN';
VBN : 'VBN';
VBR : 'VBR';
VCN : 'VCN';
VOLUME : 'VOLUME';
// Qualifiers
ALL : 'ALL';
CODE : 'CODE';
TABLE : 'TABLE';
// Miscellaneous tokens
DRIVE_NAME : LETTER ':';
fragment
LETTER : 'A'..'Z';
fragment
DIGIT : '0'..'9';
fragment
HEX_DIGIT : (DIGIT | 'A'..'F');
HEX_NUMBER : '0X' HEX_DIGIT+;
DEC_NUMBER : DIGIT+;
FILE_NAME : ~('|' | '<' | '>' | '*' | '?')+ (('\r'? '\n') | EOF);
LINE_COMMENT
: '!' ~('\n'|'\r')* (('\r'? '\n') | EOF) {$channel=HIDDEN;};
WS : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};