[antlr-interest] Getting token extents for grammar rules
Paul J. Lucas
dude at darkfigure.org
Tue May 11 10:22:41 PDT 2004
For better or worse, suppose I'm building my own parse-tree
data structure where every grammar rule is of the form:
    someRule
        {
            enterRule( SOME_RULE_ID );
        }
        : /* call of subrules, tokens, etc. */
          {
              leaveRule( SOME_RULE_ID );
          }
        ;
and each lexer rule is of the form:
    SOME_TOKEN
        : 'xxxx'
          {
              giveToken( SOME_TOKEN );
          }
        ;
where giveToken() gives the current token to a class that
accumulates all tokens parsed. The enterRule() and leaveRule()
methods carve up the sequence of tokens such that each rule has
the extent of tokens comprising it, i.e., token[i]...token[j].
This works fine... mostly.
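The accumulation scheme described above might look roughly like the minimal Java sketch below. Everything in it is an assumption for illustration: the post does not show its implementation, so the class name TokenAccumulator, representing tokens as String, and rule IDs as int are all hypothetical. The key point it shows is that each rule's extent is just "how many tokens had been given when enterRule() ran" up to "how many had been given when leaveRule() ran."

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch of the accumulator: giveToken() appends each token as the
 * lexer produces it, and enterRule()/leaveRule() record the half-open range
 * [start, end) of token indices covered by each rule.
 */
public class TokenAccumulator {
    /** One rule's extent: tokens[start] .. tokens[end - 1]. */
    public static final class Extent {
        public final int ruleId, start, end;
        Extent(int ruleId, int start, int end) {
            this.ruleId = ruleId; this.start = start; this.end = end;
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private final Deque<int[]> open = new ArrayDeque<>(); // stack of {ruleId, startIndex}
    private final List<Extent> extents = new ArrayList<>();

    /** Called from lexer actions: append the token just matched. */
    public void giveToken(String text) { tokens.add(text); }

    /** Remember how many tokens had been accumulated when the rule started. */
    public void enterRule(int ruleId) { open.push(new int[] { ruleId, tokens.size() }); }

    /** Close the innermost open rule at the current token count. */
    public void leaveRule(int ruleId) {
        int[] top = open.pop();
        if (top[0] != ruleId) {
            throw new IllegalStateException("unbalanced leaveRule(" + ruleId + ")");
        }
        extents.add(new Extent(ruleId, top[1], tokens.size()));
    }

    public List<Extent> extents() { return extents; }

    /** The token texts covered by an extent. */
    public List<String> tokensOf(Extent e) { return tokens.subList(e.start, e.end); }
}
```

Because the end of an extent is taken from tokens.size() at the moment leaveRule() runs, any token the lexer has already handed over by then, whether or not the parser has matched it yet, lands inside the extent.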
I have a case where the rule is of the form:
    prolog
        : (
              {
                  enterRule( PROLOG );
              }
              ( declare SEMICOLON )+
              {
                  leaveRule( PROLOG );
              }
          )?
        ;
where "declare" is another rule of the form:
    declare
        : DECLARE ( d1 | d2 | d3 )
        ;
where d1, etc., are various declare statements in the language.
The problem is that for some of the declarations the semicolon
is included in the extent of tokens and for others it isn't. My
guess is that this has to do with the lexer doing look-ahead in
some cases and not in others.
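If that guess is right, the effect can be reproduced without ANTLR at all. The Java simulation below is hypothetical: the class LookaheadExtents, the token names, and the shapes of d1 (ID followed by a COMMA-separated loop) and d2 (a single NUM) are invented for illustration, and the lazy "lex only when the parser peeks" behaviour is an assumption about how the real parser and lexer interact. It records the end of declare two ways: by the number of tokens the lexer has handed over (what giveToken() sees) and by the number of tokens the parser has actually consumed.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical simulation (not ANTLR code): the parser pulls tokens from the
 * lexer lazily, only when it needs look-ahead. At leaveRule() time we compare
 *  - lexedEnd:    tokens produced by the lexer so far (like giveToken())
 *  - consumedEnd: tokens actually consumed by the parser so far
 */
public class LookaheadExtents {
    private final List<String> input; // what the lexer would produce, in order
    private int lexed = 0;            // tokens the lexer has produced so far
    private int consumed = 0;         // tokens the parser has consumed so far

    LookaheadExtents(String... tokens) { input = Arrays.asList(tokens); }

    /** Peek k tokens ahead, forcing the lexer to produce them if needed. */
    String la(int k) {
        int want = consumed + k;
        while (lexed < want && lexed < input.size()) lexed++; // a "giveToken()" per step
        int i = consumed + k - 1;
        return i < input.size() ? input.get(i) : "EOF";
    }

    void match(String type) {
        if (!la(1).equals(type)) throw new IllegalStateException("expected " + type + ", got " + la(1));
        consumed++;
    }

    /** declare : DECLARE ( d1 | d2 ) ; returns {lexedEnd, consumedEnd} at leaveRule time. */
    int[] declare() {
        match("DECLARE");
        if (la(1).equals("ID")) {
            // d1 : ID ( COMMA ID )*  -- to exit the loop the parser must peek at the
            // next token, so the SEMICOLON is lexed before declare ends
            match("ID");
            while (la(1).equals("COMMA")) { match("COMMA"); match("ID"); }
        } else {
            // d2 : NUM  -- ends on a fixed token, so nothing past it is lexed yet
            match("NUM");
        }
        return new int[] { lexed, consumed };
    }

    public static int[] run(String... tokens) {
        LookaheadExtents p = new LookaheadExtents(tokens);
        int[] ends = p.declare();
        p.match("SEMICOLON"); // the enclosing ( declare SEMICOLON )+ loop
        return ends;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run("DECLARE", "ID", "SEMICOLON")));  // prints [3, 2]
        System.out.println(Arrays.toString(run("DECLARE", "NUM", "SEMICOLON"))); // prints [2, 2]
    }
}
```

Under these assumptions the lexer-based end swallows the SEMICOLON for d1 but not for d2, which matches the inconsistency described. The consumed-based end is the same for both: the SEMICOLON never falls inside declare and instead belongs to the enclosing prolog loop, which matches declare SEMICOLON explicitly.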
Is there any way to get consistent behavior (i.e., the semicolon
always being included)?
- Paul