[antlr-interest] Grammar help

Tue Mar 16 02:12:10 PDT 2010

In my excitement at not seeing any error messages, I neglected to really
test the parser :-(

I don't get the errors I was getting before, but that is because the
FILE_NAME token is matching everything.  I put a simple printf action on the
FILE_NAME token, and it gets called for all input:

DT> @abc.def
Found file name: @abc.def
DT> illegal command
Found file name: illegal command
DT> 'alj;klajjf
Found file name: 'alj;klajjf

Is there a way to make the FILE_NAME token context sensitive so that the
lexer doesn't try to match it unless we're in a rule that wants to find a
file name?  I tried making the FILE_NAME token a fragment, but then the
parser failed to recognize anything as valid.

Here's the grammar:

//
// This grammar defines the commands available to the DiskTool (DT) program
//

grammar Commands;

options 
	{
	language = C;
	backtrack = true;
	memoize = true;
	}

@lexer::header
{
#define	ANTLR3_INLINE_INPUT_ASCII
}

//+
// Productions
//-

commands
	:
	(script_command
	| dump_command
	| show_command
	)*;

script_command
	:  '@' 
	FILE_NAME
	;

dump_command
	: DUMP
	(dump_struct
	| dump_block
	| a_file
	);

show_command
	: SHOW
	(structure_nouns
	| storage_nouns
	| a_file
	);

mbr_vbr
	: MBR 
	| VBR
	;

block_nouns
	: LBN 
	| LCN 
	| VBN 
	| VCN
	;

structure_nouns
	: MBR
	| VBR
	;

dump_block

	: block_nouns
	number
	(
	(',' number
	)
	| 
	(':' number
	))?
	DRIVE_NAME?
	;

dump_struct
	: mbr_vbr
	('/' qualifier)?
	DRIVE_NAME?
	;

storage_nouns
	: DISK
	| VOLUME
	;

a_file
	: FILE
	FILE_NAME
	;

number
	: DEC_NUMBER 
	| HEX_NUMBER
	;

qualifier
	: ALL
	| CODE
	| TABLE
	;

//+
// Tokens
//-

// Verbs

DUMP	: 'DUMP';
SHOW	: 'SHOW';

// Nouns

DISK	: 'DISK';
FILE	: 'FILE';
LBN	: 'LBN';
LCN	: 'LCN';
MBR	: 'MBR';
PBN	: 'PBN';
VBN	: 'VBN';
VBR	: 'VBR';
VCN	: 'VCN';
VOLUME	: 'VOLUME';

// Qualifiers

ALL	: 'ALL';
CODE	: 'CODE';
TABLE	: 'TABLE';

// Miscellaneous tokens

DRIVE_NAME
	: LETTER ':'
	;

fragment
LETTER	: 'A'..'Z';

fragment
DIGIT	: '0'..'9';

fragment
HEX_DIGIT	: (DIGIT | 'A'..'F');

HEX_NUMBER	: '0X' HEX_DIGIT+;

DEC_NUMBER	: DIGIT+;

FILE_NAME
	:  ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | EOF)
	{printf("Found file name: \%s\n", GETTEXT()->chars);};

LINE_COMMENT
	: '!' ~('\n'|'\r')* (('\r'? '\n') | EOF) {$channel=HIDDEN;}
	{printf("Found comment: \%s\n", GETTEXT()->chars);};

WS	: (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
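
The behaviour shown at the top of this message can be reproduced outside
ANTLR.  The following is a minimal C sketch, not ANTLR-generated code: it
hand-codes the FILE_NAME rule as written in the grammar and reports how many
characters of an input the rule would accept.  The helper names
is_file_name_char and file_name_match_len are made up for this illustration.
It assumes, as the printf output suggests, that the lexer picks tokens from
the characters alone, without knowing which parser rule is active.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for any character the negated set in
 * FILE_NAME accepts, i.e. anything except | < > * ? CR LF. */
static int is_file_name_char(char c)
{
    return c != '\0' && strchr("|<>*?\r\n", c) == NULL;
}

/* Hypothetical helper: number of characters the FILE_NAME rule would
 * consume at the start of s, or 0 if the rule does not match there. */
static size_t file_name_match_len(const char *s)
{
    size_t n = 0;

    /* ~(...)+ : one or more characters outside the excluded set */
    while (is_file_name_char(s[n]))
        n++;
    if (n == 0)
        return 0;

    /* (('\r'? '\n') | EOF) : a line end or end of input must follow */
    if (s[n] == '\0')
        return n;
    if (s[n] == '\r' && s[n + 1] == '\n')
        return n + 2;
    if (s[n] == '\n')
        return n + 1;

    return 0; /* stopped on an excluded character that is not a line end */
}
```

All three sample lines are accepted in full: file_name_match_len returns 8
for "@abc.def", 15 for "illegal command" and 11 for "'alj;klajjf".  A command
line such as "DUMP MBR\n" is accepted as well (9 characters), which would
explain why the printf action fires for all input.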

-----Original Message-----
From: Brian Catlin [mailto:BrianC at sannas.org] 
Sent: Tuesday, March 16, 2010 16:18
To: 'antlr-interest at antlr.org'
Subject: RE: [antlr-interest] Grammar help

(Brian slaps head again), "Duh!"  Sigh.  Sometimes, I really wonder whether
I'm overpaid ;-}

You fixed it!

Thank you very much for your help!!

 -Brian

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bart Kiers
Sent: Tuesday, March 16, 2010 15:33
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Grammar help

On Tue, Mar 16, 2010 at 8:10 AM, Brian Catlin <BrianC at sannas.org> wrote:

> While that gets rid of those warnings (why don't the warnings print a 
> reasonable line number?  I would call that a BUG),

Note that '!' is a valid operator inside your grammar; ANTLR just
assumes that you're building trees. So, you're not doing anything wrong.
But, yes, a warning that gives the line number of the improper use of
rewrite operators would be nice.

 On Tue, Mar 16, 2010 at 8:10 AM, Brian Catlin <BrianC at sannas.org> wrote:

> the fundamental problem
> of being able to parse (or otherwise capture the file name) still exists.
>
> Any ideas?
>

The error message is telling you that your FILE_NAME rule is ambiguous. When
matching one or more characters from:

~('|' | '<' | '>' | '*' | '?')+

then line breaks will also be matched, yet after that, the following could
be matched:

('\r'? '\n')

which has already been "eaten" by the previous part of your rule. You could
fix that by adding line breaks to the negated set in that first part, like this:

FILE_NAME    :  ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | EOF);
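
The effect of the two character sets can be compared with a small C sketch.
It is not ANTLR's generated code; it only models the (...)+ part of the rule
as a plain greedy loop, which is an assumption about how the ambiguity plays
out in practice.  The function name greedy_span and the sample input are
made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the leading run of characters in s
 * that are NOT listed in `excluded`, mimicking a greedy ~(...)+ loop. */
static size_t greedy_span(const char *s, const char *excluded)
{
    size_t n = 0;

    while (s[n] != '\0' && strchr(excluded, s[n]) == NULL)
        n++;
    return n;
}
```

For the input "abc.def\nSHOW DISK\n", greedy_span(input, "|<>*?") returns 18:
the loop runs through both line breaks and swallows the next command, so
nothing is left for the trailing ('\r'? '\n').  With line breaks added to the
excluded set, greedy_span(input, "|<>*?\r\n") returns 7 and stops right
before the '\n', which the rest of the rule can then match.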

Regards,

Bart.
