[antlr-interest] Grammar help

Brian Catlin BrianC at sannas.org
Mon Mar 15 20:54:31 PDT 2010


I am trying to create a grammar for a command language, and I'm stuck. I'm
using ANTLR-3.1-2009-06-28 and libantlr3c-3.2. The language is fairly
simple: commands are of the form Verb Noun. However, some commands can take a
file name (always the last item of the command), and because a file name can
contain such a wide range of characters, ANTLR gets confused. So the question
is: how would I write a grammar that works?

On Windows, a file name may contain any character except <, >, |, ?, * and ".
In the grammar, if a file name contains any spaces, the entire name must be
enclosed in double quotes (" "), and I don't want the WS (white space) token
to eat the white space inside the quotes. So a file name may be either a
quoted string (I'll strip off the quotes once I have the string) or an
unquoted string. It would also be nice to allow LINE_COMMENTs on the same
line as a command with a file name, but that is not a requirement.
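
To make the rules above concrete, here is a minimal C sketch of the check and
quote stripping that would happen once the token text is in hand. It assumes
the file name (quotes included, if any) arrives as a NUL-terminated C string;
normalize_file_name and is_forbidden_char are hypothetical helper names, not
part of libantlr3c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Characters that may not appear in a Windows file name, per the rules
 * above: < > | ? * and the double quote. */
static bool is_forbidden_char(char c)
{
    return c != '\0' && strchr("<>|?*\"", c) != NULL;
}

/* Hypothetical helper: turn the raw text of a file-name token into a bare
 * file name.
 *   "My Files\a.txt"  quoted form: quotes stripped, spaces allowed
 *   C:\temp\b.dat     unquoted form: spaces and tabs not allowed
 * Copies the name into out (out_size bytes) and returns true on success;
 * returns false if the text breaks the rules or does not fit. */
static bool normalize_file_name(const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL && out_size > 0);

    size_t len = strlen(text);
    bool quoted = len >= 2 && text[0] == '"' && text[len - 1] == '"';
    const char *name = quoted ? text + 1 : text;
    size_t n = quoted ? len - 2 : len;

    if (n == 0 || n >= out_size)
        return false;                  /* empty name, or buffer too small */

    for (size_t i = 0; i < n; i++) {
        if (is_forbidden_char(name[i]))
            return false;              /* also catches a stray '"' */
        if (!quoted && (name[i] == ' ' || name[i] == '\t'))
            return false;              /* spaces require quotes */
    }

    memcpy(out, name, n);
    out[n] = '\0';
    return true;
}
```

With this, "C:\My Files\a.txt" (quoted) comes back as C:\My Files\a.txt,
while the same name without quotes is rejected because of the space.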

It occurred to me that, instead of trying to build a token that overlaps with
pretty much every other token, I could just grab everything from where the
file name starts on the line to the end of the line, but I don't know how to
do that.
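
The "rest of the line" idea is easy to express in plain C, which may help pin
down what the lexer would have to do. This is only a sketch under stated
assumptions: the caller already knows where the file name starts (for
example, just after FILE or '@'), and a '!' outside double quotes is treated
as the start of a LINE_COMMENT. Because Windows allows '!' in file names, that
assumption means a name containing '!' would have to be quoted.
capture_rest_of_line is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy everything from `start` to the end of the line
 * (CR, LF or NUL) into out, skipping leading blanks and trimming trailing
 * ones. ASSUMPTION: a '!' outside double quotes starts a LINE_COMMENT, so
 * a file name that contains '!' must be quoted.
 * Returns the length of the captured text, or 0 if the line holds no name
 * or the name does not fit in out (out_size bytes). */
static size_t capture_rest_of_line(const char *start, char *out, size_t out_size)
{
    assert(start != NULL && out != NULL && out_size > 0);

    while (*start == ' ' || *start == '\t')
        start++;                          /* blanks between keyword and name */

    bool in_quotes = false;
    size_t n = 0;
    for (const char *p = start; *p != '\0' && *p != '\r' && *p != '\n'; p++) {
        if (*p == '"')
            in_quotes = !in_quotes;       /* spaces and '!' are literal inside */
        else if (*p == '!' && !in_quotes)
            break;                        /* rest of the line is a comment */
        n++;
    }

    while (n > 0 && (start[n - 1] == ' ' || start[n - 1] == '\t'))
        n--;                              /* blanks before a comment or EOL */

    if (n == 0 || n >= out_size) {
        out[0] = '\0';
        return 0;
    }
    memcpy(out, start, n);
    out[n] = '\0';
    return n;
}
```

Tracking the quote state is what lets a quoted name keep its embedded spaces
and any '!'; whether '!' should end an unquoted name is a design decision for
the language itself.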

When I compile the grammar with ANTLR, I get the following:

warning(149): Commands.g:0:0: rewrite syntax or operator with no output
option; setting output=AST

warning(200): Commands.g:146:14: Decision can match input such as
"{'\u0000'..'\f', '\u000E'..')', '+'..';', '=', '@'..'{', '}'..'\uFFFF'}"
using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

warning(200): Commands.g:146:14: Decision can match input such as "'\r'"
using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

error(208): Commands.g:151:1: The following token definitions can never be
matched because prior tokens match the same input: WS

ANTLR generates a lexer and a parser, but they don't do anything useful: any
text is accepted as a match, even if it isn't in the defined token list.

Following is an abbreviated version of the grammar (the real grammar has many
more verbs and nouns), but it should give you the flavor of what I'm trying
to do.

//
// This grammar defines the commands available to the DiskTool (DT) program
//

grammar Commands;

options
     {
     language = C;
     backtrack = true;
     memoize = true;
     }

@lexer::header
{
#define    ANTLR3_INLINE_INPUT_ASCII
}

//+
// Productions
//-

commands
     :
     (script_command
     | dump_command
     ! show_command
     )*;

script_command
     :  '@'
     FILE_NAME       {printf ("File name [\%s]\n", (char *)$FILE_NAME.text->chars);}
     ;

dump_command
     : DUMP
     (dump_struct
     | dump_block
     | a_file
     );

show_command
     : SHOW
     (structure_nouns
     | storage_nouns
     | a_file
     );

mbr_vbr
     : MBR
     | VBR
     ;

block_nouns
     : LBN
     | LCN
     | VBN
     | VCN
     ;

structure_nouns
     : MBR
     | VBR
     ;

dump_block
     : block_nouns
     number
     ((',' number)
     | (':' number))?
     DRIVE_NAME?
     ;

dump_struct
     : mbr_vbr
     ('/' qualifier)?
     DRIVE_NAME?
     ;

storage_nouns
     : DISK
     | VOLUME
     ;

a_file
     : FILE
     FILE_NAME       {printf ("File name [\%s]\n", (char *)$FILE_NAME.text->chars);}
     ;

number
     : DEC_NUMBER
     | HEX_NUMBER
     ;

qualifier
     : ALL
     ! CODE
     | TABLE
     ;

//+
// Tokens
//-

// Verbs

DUMP : 'DUMP';
SHOW : 'SHOW';

// Nouns

DISK : 'DISK';
FILE : 'FILE';
LBN  : 'LBN';
LCN  : 'LCN';
MBR  : 'MBR';
PBN  : 'PBN';
VBN  : 'VBN';
VBR  : 'VBR';
VCN  : 'VCN';
VOLUME     : 'VOLUME';

// Qualifiers

ALL  : 'ALL';
CODE : 'CODE';
TABLE : 'TABLE';

// Miscellaneous tokens

DRIVE_NAME : LETTER ':';

fragment
LETTER     : 'A'..'Z';

fragment
DIGIT : '0'..'9';

fragment
HEX_DIGIT  : (DIGIT | 'A'..'F');

HEX_NUMBER : '0X' HEX_DIGIT+;

DEC_NUMBER : DIGIT+;

FILE_NAME  :  ~('|' | '<' | '>' | '*' | '?')+ (('\r'? '\n') | EOF);

LINE_COMMENT
     : '!' ~('\n'|'\r')* (('\r'? '\n') | EOF) {$channel=HIDDEN;};

WS   : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
