[antlr-interest] Whatever Until EOL

Tue Oct 12 13:43:36 PDT 2004

The problem is that the various lexer patterns all match the same leftmost
text, so the lexer cannot tell which token to produce. For example, the
string "aabbcc" can be matched by several rules (e.g., WHATEVERTILLEOL and
WHATEVERTILLWS).
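A minimal Python sketch of the problem, assuming WHATEVERTILLEOL means "everything up to the end of the line" and WHATEVERTILLWS means "everything up to the next whitespace". The actual rule bodies were not shown in the thread, so the regexes below are hypothetical stand-ins. A longest-match lexer sees both rules consume all six characters of "aabbcc", so match length alone cannot pick a winner.

```python
import re
from typing import List, Tuple

# Hypothetical stand-ins for the "whatever until X" lexer rules.
RULES = {
    "WHATEVERTILLEOL": re.compile(r"[^\r\n]*"),
    "WHATEVERTILLWS": re.compile(r"[^ \t\r\n]*"),
}


def longest_matches(text: str) -> Tuple[int, List[str]]:
    """Return the longest match length at position 0 and every rule achieving it."""
    # Both patterns end in '*', so match() always succeeds (possibly empty).
    lengths = {name: pat.match(text).end() for name, pat in RULES.items()}
    best = max(lengths.values())
    return best, [name for name, n in lengths.items() if n == best]


print(longest_matches("aabbcc"))  # (6, ['WHATEVERTILLEOL', 'WHATEVERTILLWS'])
```

Only when the input contains whitespace (e.g. "aa bb") do the two rules produce different lengths; on plain text like "aabbcc" they tie.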

One option is to move the delimiter tokens that the parser currently matches
into the lexer, so that each delimiter becomes the first character of the
token that follows it (e.g., "(" starts the next token). Take this rule:

	content:
		"[" "content" "]"
		((WS|NL)* WHATEVERTILLLPAREN "(" WHATEVERTILLCOMMA ","
		WHATEVERTILLRPAREN ")" WHATEVERTILLEOL)+
		;

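Change it by moving the delimiters into the lexer:

	protected CONTENT: "[content]";
	protected RP: ')';
	protected COMMA: ',';
	protected LP: '(';

	// Each token starts with its delimiter; '!' drops the delimiter text.
	CONTENT_LEADER : CONTENT! (~'(')* ;   // up to the '(' that starts FIRSTARG
	FIRSTARG : LP! (~',')* ;
	OTHERARGS : COMMA! (~')')* ;
	TRAILING : RP! (~('\r'|'\n'))* ;     // rest of the line
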
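The ambiguity is gone because each token now begins with its own delimiter:
'(' always starts a FIRSTARG, ',' an OTHERARGS, and ')' a TRAILING, so one
character of lookahead is enough to choose the rule.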
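A rough Python model of the rewritten lexer (not ANTLR's generated code), assuming a single input line with no newlines, since the WS|NL handling is a separate question. Each rule is chosen by its leading delimiter alone, the delimiter is dropped from the token text the way the `!` suffix drops it, and the body runs up to the delimiter that begins the next token. CONTENT_LEADER is assumed to stop at '(' because that is where FIRSTARG must begin. The names `LEADERS`, `STOPS`, and `tokenize_line` are hypothetical.

```python
from typing import List, Tuple

CONTENT = "[content]"
# The single leading character that selects each argument rule.
LEADERS = {"(": "FIRSTARG", ",": "OTHERARGS", ")": "TRAILING"}
# Characters at which each token's "whatever" body stops.
STOPS = {
    "CONTENT_LEADER": "(",
    "FIRSTARG": ",",
    "OTHERARGS": ")",
    "TRAILING": "\r\n",
}


def tokenize_line(line: str) -> List[Tuple[str, str]]:
    """Split one line into (token type, text) pairs, delimiters excluded."""
    tokens = []
    pos = 0
    while pos < len(line):
        # The rule is decided by the leading delimiter only: no ambiguity.
        if line.startswith(CONTENT, pos):
            kind, pos = "CONTENT_LEADER", pos + len(CONTENT)
        elif line[pos] in LEADERS:
            kind, pos = LEADERS[line[pos]], pos + 1
        else:
            raise ValueError(f"no rule starts with {line[pos]!r} at {pos}")
        start = pos
        # Consume the body, i.e. the (~X)* loop of the rule.
        while pos < len(line) and line[pos] not in STOPS[kind]:
            pos += 1
        tokens.append((kind, line[start:pos]))
    return tokens


print(tokenize_line("[content] foo(a, b, c) rest"))
```

On that sample line it yields `CONTENT_LEADER " foo"`, `FIRSTARG "a"`, `OTHERARGS " b, c"`, and `TRAILING " rest"`. Note that OTHERARGS swallows later commas, just as `(~')')*` does in the grammar.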

The parser rule then becomes:
	content: CONTENT_LEADER ( FIRSTARG OTHERARGS TRAILING )+ ;
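To show that the new rule needs only one token of lookahead, here is a hand-written Python sketch of the recursive-descent check the rule amounts to. It is not ANTLR's generated parser; to stay self-contained it works on a list of token-type names, and `parse_content` is a hypothetical name.

```python
from typing import List


def parse_content(types: List[str]) -> int:
    """Check a token-type sequence against
    content : CONTENT_LEADER ( FIRSTARG OTHERARGS TRAILING )+ ;
    and return how many (FIRSTARG OTHERARGS TRAILING) groups matched."""
    pos = 0

    def match(expected: str) -> None:
        nonlocal pos
        found = types[pos] if pos < len(types) else "EOF"
        if found != expected:
            raise SyntaxError(f"expected {expected}, found {found}")
        pos += 1

    match("CONTENT_LEADER")
    groups = 0
    while True:
        # One pass through ( FIRSTARG OTHERARGS TRAILING ).
        match("FIRSTARG")
        match("OTHERARGS")
        match("TRAILING")
        groups += 1
        # The ( ... )+ loop continues only while the next token is FIRSTARG,
        # so a single token of lookahead decides it.
        if pos >= len(types) or types[pos] != "FIRSTARG":
            break
    if pos != len(types):
        raise SyntaxError(f"unexpected {types[pos]} after content")
    return groups


print(parse_content(["CONTENT_LEADER", "FIRSTARG", "OTHERARGS", "TRAILING"]))
```

Because every loop decision looks at one token type, no "whatever until X" rule ever has to compete with another.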

For the (WS|NL)* whitespace and newlines, I think the lexer's ignore option
will meet your needs.

Hope this helps,
- Bryan Ewbank
