[antlr-interest] Getting token extents for grammar rules

Tue May 11 10:22:41 PDT 2004

	For better or worse, suppose I'm building my own parse-tree
	data structure where every grammar rule is of the form:

		someRule
		{
			enterRule( SOME_RULE_ID );
		}
			: /* call of subrules, tokens, etc. */
				{
					leaveRule( SOME_RULE_ID );
				}
			;

	and each lexer rule is of the form:

		SOME_TOKEN
			: 'xxxx'
				{
					giveToken( SOME_TOKEN );
				}
			;

	where giveToken() gives the current token to a class that
	accumulates all tokens parsed.  The enterRule() and leaveRule()
	methods carve up the sequence of tokens such that each rule has
	the extent of tokens comprising it, i.e., token[i]...token[j].
	This works fine... mostly.
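
	Here is a minimal Java sketch of the kind of accumulator described
	above. It assumes open rules are kept on a stack. The class name
	TokenAccumulator, the Extent type, and the integer rule IDs are
	hypothetical names chosen for illustration. enterRule() records how
	many tokens have been given so far, and leaveRule() closes the
	extent at the most recently given token. As a result, each extent
	depends entirely on when giveToken() runs relative to leaveRule():

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical accumulator: collects every token the lexer gives and
// carves the sequence into per-rule extents token[start]..token[end].
class TokenAccumulator {
    // Inclusive extent of one rule invocation.
    static final class Extent {
        final int ruleId, start, end;

        Extent(int ruleId, int start, int end) {
            this.ruleId = ruleId;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return ruleId + ":[" + start + ".." + end + "]";
        }
    }

    private final List<String> tokens = new ArrayList<>();
    // Open rules, innermost on top: {ruleId, index of first token}.
    private final Deque<int[]> open = new ArrayDeque<>();
    private final List<Extent> extents = new ArrayList<>();

    // Called from each lexer rule's action.
    void giveToken(String text) {
        tokens.add(text);
    }

    // The rule's first token is whichever token is given next.
    void enterRule(int ruleId) {
        open.push(new int[] {ruleId, tokens.size()});
    }

    // The rule's last token is the most recently given one.
    void leaveRule(int ruleId) {
        int[] top = open.pop();
        if (top[0] != ruleId) {
            throw new IllegalStateException(
                "leaveRule(" + ruleId + ") but innermost open rule is " + top[0]);
        }
        extents.add(new Extent(ruleId, top[1], tokens.size() - 1));
    }

    List<Extent> extents() {
        return extents;
    }

    public static void main(String[] args) {
        final int PROLOG = 1, DECLARE = 2; // hypothetical rule IDs
        TokenAccumulator acc = new TokenAccumulator();
        acc.enterRule(PROLOG);
        acc.enterRule(DECLARE);
        acc.giveToken("declare");
        acc.giveToken("x");
        acc.leaveRule(DECLARE);
        acc.giveToken(";");
        acc.leaveRule(PROLOG);
        System.out.println(acc.extents()); // [2:[0..1], 1:[0..2]]
    }
}
```

	In this run, the nested declare extent closes before ";" is given,
	so it covers tokens 0..1. The enclosing prolog extent covers 0..2.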

	I have a case where the rule is of the form:

		prolog
			: (
				{
					enterRule( PROLOG );
				}
			    ( declare SEMICOLON )+
			    	{
					leaveRule( PROLOG );
				}
			  )?
			;

	where "declare" is another rule of the form:

		declare
			: DECLARE ( d1 | d2 | d3 )
			;

	where d1, etc., are the various declare statements in the language.
	The problem is that for some of the declarations the semicolon
	is included in the extent of tokens, and for others it isn't.  My
	guess is that this has to do with the lexer doing look-ahead in
	some cases and not in others.
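
	The simulation below is one hedged way to picture that guess. It is
	not ANTLR's actual code. It assumes that tokens are lexed lazily,
	only when the parser asks for LA(k), and that giveToken() fires at
	lexing time rather than when the parser consumes the token. Under
	that assumption, a declaration ending in a ( COMMA NAME )* loop has
	to peek at the next token to leave the loop, so the semicolon is
	lexed (and given) before leaveRule() runs. A declaration that ends
	on a fixed token needs no such peek. The class LookaheadDemo and its
	token names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: a look-ahead buffer that runs the "lexer"
// only when the parser asks for LA(k).
class LookaheadDemo {
    private final String[] input;
    private final List<String> buffer = new ArrayList<>(); // lexed, not yet consumed
    private int lexed = 0;    // tokens passed to giveToken() so far
    private int consumed = 0; // tokens actually matched by the parser

    LookaheadDemo(String... input) {
        this.input = input;
    }

    // Stands in for one lexer rule firing, including its giveToken() action.
    private void lexOne() {
        buffer.add(lexed < input.length ? input[lexed] : "<EOF>");
        lexed++;
    }

    // LA(k): make sure k tokens are buffered, lexing more if needed.
    String la(int k) {
        while (buffer.size() < k) {
            lexOne();
        }
        return buffer.get(k - 1);
    }

    void match(String expected) {
        String t = la(1);
        if (!t.equals(expected)) {
            throw new IllegalStateException("expected " + expected + ", got " + t);
        }
        buffer.remove(0);
        consumed++;
    }

    // declare : DECLARE NAME ( COMMA NAME )* ;  -- leaving the loop needs LA(1)
    int[] parseListDecl() {
        match("DECLARE");
        match("NAME");
        while (la(1).equals("COMMA")) {
            match("COMMA");
            match("NAME");
        }
        return new int[] {lexed - 1, consumed - 1}; // state at "leaveRule"
    }

    // declare : DECLARE NAME ;  -- ends on a fixed token, no extra look-ahead
    int[] parseSimpleDecl() {
        match("DECLARE");
        match("NAME");
        return new int[] {lexed - 1, consumed - 1};
    }

    public static void main(String[] args) {
        int[] a = new LookaheadDemo("DECLARE", "NAME", "COMMA", "NAME", "SEMI").parseListDecl();
        int[] b = new LookaheadDemo("DECLARE", "NAME", "SEMI").parseSimpleDecl();
        // prints: list decl:   last lexed=4, last consumed=3
        System.out.println("list decl:   last lexed=" + a[0] + ", last consumed=" + a[1]);
        // prints: simple decl: last lexed=1, last consumed=1
        System.out.println("simple decl: last lexed=" + b[0] + ", last consumed=" + b[1]);
    }
}
```

	For the list form, index 4 is SEMI, so an extent measured against
	the tokens given so far would include the semicolon. For the simple
	form it would not, which matches the inconsistency described above.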

	Is there any way to get consistent behavior (i.e., the semicolon
	always being included)?

	- Paul
