[antlr-interest] Whatever Until EOL

Tue Oct 12 11:48:49 PDT 2004

Hi all,

I have the following question: Suppose we have a set of lexer rules as
follows:

The rules WS and NL are standard whitespace and newline rules.
SL_COMMENT and ML_COMMENT are both C/C++ like comments (// and /* */
respectively).

WHATEVERTILLEOL matches all characters up to the end of the line.
WHATEVERTILLWS matches all characters up to the next whitespace.
WHATEVERTILLCOMMA matches all characters up to the next comma.
WHATEVERTILLLPAREN matches all characters up to the next left
parenthesis.
WHATEVERTILLRPAREN matches all characters up to the next right
parenthesis.
----------------------------
WS: (' '|'\t'|'\f')+ ; // standard whitespace

NL: ( options {generateAmbigWarnings=false;} : // newlines
"\r\n"|'\r'|'\n') { newline(); } ;

// C++ style single line comment
SL_COMMENT: "//" (~('\n'|'\r'))* ('\n'|'\r'('\n')?)?
{$setType(Token.SKIP); newline();};

ML_COMMENT: "/*" // C-Style multi-line comment
( options { generateAmbigWarnings=false; } :
{ LA(2)!='/' }? '*'
| '\r' '\n' {newline();}
| '\r' {newline();}
| '\n' {newline();}
| ~('*'|'\n'|'\r')
)*
"*/" {$setType(Token.SKIP);} ;

WHATEVERTILLEOL: // all characters until end of line occurs.
( (~('\n'|'\r'|'/'))+
| '/' (~('/'|'*'))*
)
('\n' {newline();} |'\r'('\n')? {newline();} |"//"|"/*") ;

WHATEVERTILLWS: // all characters until whitespace / newline occurs.
( (~(' '|'\t'|'\f'|'\n'|'\r'|'/'))+
| '/' (~('/'|'*'))*
)
(' '|'\t'|'\f'
|'\n' {newline();}
|'\r'('\n')? {newline();}
|"//"|"/*") ;

WHATEVERTILLCOMMA: // all characters until comma occurs.
( (~(','|'/'))+
| '/' (~('/'|'*'))*
)
(',') ;

WHATEVERTILLLPAREN: // all characters until left parenthesis occurs.
( (~('('|'/'))+
| '/' (~('/'|'*'))*
)
('(') ;

WHATEVERTILLRPAREN: // all characters until right parenthesis occurs.
( (~(')'|'/'))+
| '/' (~('/'|'*'))*
)
')' ;

---------------------

ANTLR reports lexer ambiguity warnings for these rules. How can I
fix this?
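
To make the overlap concrete, here is a rough Python sketch (not ANTLR
output) that lists which of the rules above could begin with a given
character. The first-character sets are read by hand off the rule
definitions; the names FIRST and candidate_rules are made up for this
illustration, and looking at only one character of lookahead is a
simplifying assumption, since the lexer's k setting is not shown here.

```python
# First-character sets read by hand off the lexer rules quoted above.
# Each predicate answers: "can this rule begin with character c?"
# Simplification: only one character of lookahead is considered.
FIRST = {
    "WS": lambda c: c in " \t\f",
    "NL": lambda c: c in "\r\n",
    "SL_COMMENT": lambda c: c == "/",
    "ML_COMMENT": lambda c: c == "/",
    # The '/' alternative lets each WHATEVER* rule start with '/' too,
    # so only the characters excluded by both alternatives are ruled out.
    "WHATEVERTILLEOL": lambda c: c not in "\r\n",
    "WHATEVERTILLWS": lambda c: c not in " \t\f\r\n",
    "WHATEVERTILLCOMMA": lambda c: c != ",",
    "WHATEVERTILLLPAREN": lambda c: c != "(",
    "WHATEVERTILLRPAREN": lambda c: c != ")",
}


def candidate_rules(c):
    """Return the names of all rules that could start with the single character c."""
    return [name for name, can_start in FIRST.items() if can_start(c)]


if __name__ == "__main__":
    for ch in ("k", " ", "/", "\n"):
        print(repr(ch), candidate_rules(ch))
```

For an ordinary letter such as 'k', all five WHATEVERTILL* rules are
candidates, and for '/' both comment rules join them. The rules only
differ at their terminating character, which can be arbitrarily far
from the start of the token.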

These lexer rules are going to be used in the following parser:

compilationUnit: (WS|NL)* header (WS|NL)* content (WS|NL)* footer (WS|NL)* ;

header: "[" "header" "]" (WS|NL)*
"keyword1" (WS|NL)* WHATEVERTILLEOL
(WS|NL)*
"keyword2" (WS|NL)* WHATEVERTILLEOL
(WS|NL)*
("keyword3" (WS|NL)* WHATEVERTILLEOL)?
(WS|NL)*
("keyword4" (WS|NL)* WHATEVERTILLEOL)? ;

content: "[" "content" "]"
((WS|NL)* WHATEVERTILLLPAREN "(" WHATEVERTILLCOMMA ","
WHATEVERTILLRPAREN ")" WHATEVERTILLEOL)+ ;

footer: "[" "footer" "]"
((WS)* WHATEVERTILLLPAREN "(" ("Y"|"N") ")"
(WHATEVERTILLWS WS)+ (NL)+ )+ ;

--------------------------

The sample file would look as follows:

[header]
   keyword1   somevalue // this is comment
keyword2 somvalue someothervalue // comment
keyword3 1 3$*.4--x27 /* comment 
*/ another value for keyword 3
keyword4 value (^$@11%&

[content]
this is content*key ( 77^q,8&1nn) AABBN@!(!)))0000

content2 (2177, LLS) pppp1782m

[footer]
footer number 1 (Y) 11 00 27 7' 8~ 9! 8821*T2 b7*
footr 2 (N) 0( 88 PP) 7!

----------------------------
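
For reference, this is roughly how I expect one [content] line to be
split, written as a small Python sketch rather than ANTLR. The pattern
follows the shape of the content rule above (text up to "(", text up to
",", text up to ")", then the rest of the line). The names CONTENT_LINE
and split_content_line are made up for this illustration, trimming the
surrounding whitespace is my own assumption, and comments are not
handled.

```python
import re

# Hypothetical pattern mirroring the shape of the `content` parser rule:
#   text up to '('  '('  text up to ','  ','  text up to ')'  ')'  rest of line
CONTENT_LINE = re.compile(r"^([^(\n]*)\(([^,\n]*),([^)\n]*)\)(.*)$")


def split_content_line(line):
    """Split one [content] line into (name, first, second, rest), or return None."""
    m = CONTENT_LINE.match(line)
    if m is None:
        return None
    # Assumption: surrounding whitespace is not significant, so it is trimmed.
    return tuple(part.strip() for part in m.groups())


if __name__ == "__main__":
    print(split_content_line("content2 (2177, LLS) pppp1782m"))
    print(split_content_line("this is content*key ( 77^q,8&1nn) AABBN@!(!)))0000"))
```

Note that the delimiters themselves are left out of the captured
fields, whereas the WHATEVERTILL* lexer rules above consume them.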

Help would be greatly appreciated. The files are from legacy
applications with no source code. I'm very tempted to build the parser
by hand or convert the files manually, but there are so many of them.
