[antlr-interest] Re: Getting token extents for grammar rules

Mon May 17 13:35:31 PDT 2004

> Another potential problem with this is that I want to give ALL tokens,
> even those that the lexer skips, e.g., comments.  I assume that skipped
> tokens are never seen by consume().

Well, can you define where skipped tokens belong?  By virtue of being 
skipped, they are not part of any parser rule.  Consider:

line: verbPhrase nounPhrases ;
verbPhrase: VERB (ADVERB)? ;
nounPhrases: ((ADJECTIVE)? NOUN)+ ;

VERB: "run" | "walk" | "sleep" ;
ADVERB: "fast" | "slow" | "soundly" ;
ADJECTIVE: "green" | "big" ;
NOUN: "ball" | "chicken" ;

WS: ( ' ' | '\t' )+ { $setType(SKIP); } ;
NL: ( '\r' ( options{greedy=true;}: '\n' )? | '\n' ) { newline(); $setType(SKIP); } ;

Now, when this parses "run ball", to which rule does the space between 
"run" and "ball" belong?  To 'line' or 'verbPhrase' or 'nounPhrases'?  
When parsing "run ball chicken", what about the space between "ball" 
and "chicken"?
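
To make the question concrete, here is a small Python sketch (not ANTLR; 
the tokenizer and the names `WORDS`, `tokenize` and `extent` are made up for 
illustration).  It keeps the white space token around and computes each 
rule's extent only from the tokens the parser would actually see.  The 
split of tokens between rules is hard-coded for the input "run ball".

```python
import re

# Hypothetical stand-in for the lexer above: each word maps to a token type.
WORDS = {"run": "VERB", "walk": "VERB", "sleep": "VERB",
         "fast": "ADVERB", "slow": "ADVERB", "soundly": "ADVERB",
         "green": "ADJECTIVE", "big": "ADJECTIVE",
         "ball": "NOUN", "chicken": "NOUN"}

def tokenize(text):
    """Return (type, start, end) for every token, white space included."""
    tokens = []
    for m in re.finditer(r"[ \t]+|[a-z]+", text):
        kind = "WS" if m.group().isspace() else WORDS[m.group()]
        tokens.append((kind, m.start(), m.end()))
    return tokens

def extent(tokens):
    """A rule's extent: start of its first token to end of its last."""
    return (tokens[0][1], tokens[-1][2])

all_tokens = tokenize("run ball")
visible = [t for t in all_tokens if t[0] != "WS"]  # what the parser sees

# Hard-coded split for "run ball": verbPhrase gets VERB, nounPhrases gets NOUN.
verb_phrase = extent(visible[:1])
noun_phrases = extent(visible[1:])
line = extent(visible)
space = [t for t in all_tokens if t[0] == "WS"][0]

print(verb_phrase, noun_phrases, line, space)
# prints: (0, 3) (4, 8) (0, 8) ('WS', 3, 4)
```

The space at offsets 3 to 4 falls in the gap between 'verbPhrase' and 
'nounPhrases'.  It lies inside the extent of 'line', but nothing in the 
grammar says which of the three rules should claim it.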

To me, the answers aren't that clear.  You are essentially saying that 
white space is significant: not to the parse, but to the structure 
produced by parsing.  So, I would be inclined not to use SKIP tokens at 
all and, at the expense of making the grammar more verbose, make it clear 
where I want the white space to end up:

line: verbPhrase nounPhrases w ;
verbPhrase: VERB (w ADVERB)? w ;
nounPhrases: ((ADJECTIVE w)? NOUN w)+ ;
w: (WS | NL)*;

VERB: "run" | "walk" | "sleep" ;
ADVERB: "fast" | "slow" | "soundly" ;
ADJECTIVE: "green" | "big" ;
NOUN: "ball" | "chicken" ;

WS: ( ' ' | '\t' )+ ;
NL: ( '\r' ( options{greedy=true;}: '\n' )? | '\n' ) { newline(); } ;

There are grammars in use that treat white space this way, rather than 
discarding it at the lexer level.  The grammars in many RFCs are such 
examples (IMAP in particular).
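
As a rough illustration of where the white space ends up under the 
explicit grammar, here is a hand-written recursive-descent sketch in 
Python.  It is not generated by ANTLR, and the `Parser` class, its method 
names and the `(rule, start, end)` extent records are assumptions made 
for this example.  One liberty is taken: 'verbPhrase' is parsed as 
VERB w (ADVERB w)?, which accepts the same input as VERB (w ADVERB)? w 
but does not need to look past the white space to decide.

```python
import re

WORDS = {"run": "VERB", "walk": "VERB", "sleep": "VERB",
         "fast": "ADVERB", "slow": "ADVERB", "soundly": "ADVERB",
         "green": "ADJECTIVE", "big": "ADJECTIVE",
         "ball": "NOUN", "chicken": "NOUN"}

def tokenize(text):
    """Return (type, start, end) tuples; WS and NL are kept, not skipped."""
    tokens = []
    for m in re.finditer(r"[ \t]+|\r\n?|\n|[a-z]+", text):
        s = m.group()
        if s[0] in " \t":
            kind = "WS"
        elif s[0] in "\r\n":
            kind = "NL"
        else:
            kind = WORDS[s]
        tokens.append((kind, m.start(), m.end()))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.end = len(text)
        self.extents = []          # (rule, start, end) in completion order

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def offset(self):
        """Character offset of the next unconsumed token."""
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else self.end

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.pos += 1

    def w(self):                   # w: (WS | NL)* ;
        while self.peek() in ("WS", "NL"):
            self.pos += 1

    def verb_phrase(self):         # VERB w (ADVERB w)?
        start = self.offset()
        self.match("VERB")
        self.w()
        if self.peek() == "ADVERB":
            self.match("ADVERB")
            self.w()
        self.extents.append(("verbPhrase", start, self.offset()))

    def noun_phrases(self):        # ((ADJECTIVE w)? NOUN w)+
        start = self.offset()
        while True:
            if self.peek() == "ADJECTIVE":
                self.match("ADJECTIVE")
                self.w()
            self.match("NOUN")
            self.w()
            if self.peek() not in ("ADJECTIVE", "NOUN"):
                break
        self.extents.append(("nounPhrases", start, self.offset()))

    def line(self):                # verbPhrase nounPhrases w
        start = self.offset()
        self.verb_phrase()
        self.noun_phrases()
        self.w()
        self.extents.append(("line", start, self.offset()))
        return self.extents

extents = Parser("run ball chicken").line()
for rule, start, end in extents:
    print(rule, start, end)
# prints:
# verbPhrase 0 4
# nounPhrases 4 16
# line 0 16
```

With this grammar the space after "run" is inside 'verbPhrase' and the 
space between "ball" and "chicken" is inside 'nounPhrases', because each 
`w` sits in the rule that was written to own it.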

I can think of several other options that don't require mucking with 
the parser grammar, but they all require some serious overloading of 
methods in your lexer and parser, and some form of back-door 
communication between them.  Further, being generalized, they may not 
end up associating the white space with the rule a human would expect.  
Not that I think such ideas are bad, but without knowing specifics, it 
would be hard to recommend a particular approach.
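
For what it is worth, one generalized option of this kind might look 
roughly like the Python sketch below.  It is only one possible shape, 
under the assumption that a filter sits between lexer and parser and 
hangs each skipped token on the visible token before it.  The names 
`Tok`, `lex`, `attach_hidden` and `extent_with_hidden` are hypothetical 
and do not correspond to any ANTLR API.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Tok:
    kind: str
    start: int
    end: int
    hidden_after: list = field(default_factory=list)  # skipped tokens that follow

def lex(text):
    """Hypothetical lexer: yields every token, skipped ones included."""
    for m in re.finditer(r"[ \t\r\n]+|[a-z]+", text):
        kind = "SKIP" if m.group().isspace() else "WORD"
        yield Tok(kind, m.start(), m.end())

def attach_hidden(tokens):
    """Return only the visible tokens, each carrying the skipped tokens after it."""
    visible = []
    for tok in tokens:
        if tok.kind != "SKIP":
            visible.append(tok)
        elif visible:
            visible[-1].hidden_after.append(tok)
        # Skipped tokens before the first visible token are dropped in this sketch.
    return visible

def extent_with_hidden(rule_tokens):
    """Extent of a rule, stretched to cover skipped tokens after its last token."""
    last = rule_tokens[-1]
    end = last.hidden_after[-1].end if last.hidden_after else last.end
    return (rule_tokens[0].start, end)

visible = attach_hidden(lex("run ball chicken"))
print(extent_with_hidden(visible[:1]))   # tokens of verbPhrase: prints (0, 4)
print(extent_with_hidden(visible[1:]))   # tokens of nounPhrases: prints (4, 16)
```

The attachment rule here is purely mechanical (skipped text always goes 
with the token before it), so a comment that sits just before a 
construct would be credited to whatever rule ended before it, which is 
probably not what a human reader would expect.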

	- Mark
