[antlr-interest] Whatever Until EOL

Monty Zukowski monty at codetransform.com
Tue Oct 12 12:42:14 PDT 2004


The lexer doesn't know what the parser rules are; it only knows the
lexer rules.  So you will either have to build the context into the
lexer so that it knows which token is supposed to come next (use "lexer
states"), or do everything in the lexer and not have a parser at all
(this can be cumbersome, but people have done it, especially for
FORTRAN).
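
To make the "lexer states" idea concrete, here is a minimal sketch in
Python rather than ANTLR grammar syntax. It is not ANTLR's API; in
ANTLR 2 the same effect is usually obtained with several lexers behind
a TokenStreamSelector. The names tokenize, KEYWORDS, SKIPPED and the
"rest_of_line" mode are made up for illustration, and the keyword set
is simply copied from the sample file in the quoted message. The point
is that the lexer itself remembers that a keyword was just seen and
only then applies the "whatever until EOL" rule, so that rule never
competes with the ordinary ones.

```python
import re

# Hypothetical keyword set, copied from the sample file in the quoted message.
KEYWORDS = {"keyword1", "keyword2", "keyword3", "keyword4"}
# Token types that are recognized but not handed on to a parser.
SKIPPED = {"WS", "NL", "SL_COMMENT", "ML_COMMENT"}

# Rules used in the default state, tried in order at the current offset.
DEFAULT_RULES = [
    ("WS", re.compile(r"[ \t\f]+")),
    ("NL", re.compile(r"\r\n|\r|\n")),
    ("SL_COMMENT", re.compile(r"//[^\r\n]*")),
    ("ML_COMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("WORD", re.compile(r"(?:[^\s/]|/(?![/*]))+")),
]

# Rules used only in the "rest_of_line" state: skip leading blanks, then
# take everything up to a newline or the start of a comment.
LEADING_WS = re.compile(r"[ \t\f]*")
REST_OF_LINE = re.compile(r"(?:[^\r\n/]|/(?![/*]))+")


def tokenize(text):
    """Return (type, text) pairs; a keyword switches the lexer state."""
    tokens, pos, mode = [], 0, "default"
    while pos < len(text):
        if mode == "rest_of_line":
            pos = LEADING_WS.match(text, pos).end()
            match = REST_OF_LINE.match(text, pos)
            if match:
                tokens.append(("WHATEVERTILLEOL", match.group().rstrip()))
                pos = match.end()
            mode = "default"  # one token per keyword, then back to normal
            continue
        for name, pattern in DEFAULT_RULES:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError(f"no lexer rule matches at offset {pos}")
        pos = match.end()
        if name in SKIPPED:
            continue
        if match.group() in KEYWORDS:
            tokens.append(("KEYWORD", match.group()))
            mode = "rest_of_line"  # context now lives in the lexer
        else:
            tokens.append(("WORD", match.group()))
    return tokens
```

Calling tokenize("keyword1   somevalue // this is comment\n") yields a
KEYWORD token followed by a single WHATEVERTILLEOL token holding
"somevalue". The sketch stops the value at a comment start, so a value
that continues after a /* */ comment (as in the keyword3 line of the
sample) would need an extra rule.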

Monty

ANTLR & Java Consultant -- http://www.codetransform.com
ANSI C/GCC transformation toolkit -- http://www.codetransform.com/gcc.html
Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html

On Oct 12, 2004, at 11:48 AM, codeteacher wrote:

> Hi all,
>
> I have the following question: Suppose we have a set of lexer rules as
> follows:
>
> The WS and NL rules are the standard whitespace and newline rules.
> SL_COMMENT and ML_COMMENT are both C/C++ like comments (// and /* */
> respectively).
>
> WHATEVERTILLEOL is basically all characters until the end of the line.
> WHATEVERTILLWS is basically all characters until a whitespace
> character occurs.
> WHATEVERTILLCOMMA is basically all characters until a comma occurs.
> WHATEVERTILLLPAREN is basically all characters until a left
> parenthesis occurs.
> WHATEVERTILLRPAREN is basically all characters until a right
> parenthesis occurs.
> ----------------------------
> WS: (' '|'\t'|'\f')+ ; // standard whitespace
>
> NL: ( options {generateAmbigWarnings=false;} : // newlines
> "\r\n"|'\r'|'\n') { newline(); } ;
>
> // C++ style single line comment
> SL_COMMENT: "//" (~('\n'|'\r'))* ('\n'|'\r'('\n')?)?
> {$setType(Token.SKIP); newline();};
>
> ML_COMMENT: "/*" // C-Style multi-line comment
> ( options { generateAmbigWarnings=false; } :
> { LA(2)!='/' }? '*'
> | '\r' '\n' {newline();}
> | '\r' {newline();}
> | '\n' {newline();}
> | ~('*'|'\n'|'\r')
> )*
> "*/" {$setType(Token.SKIP);} ;
>
> WHATEVERTILLEOL: // all characters until the end of the line
> ( (~('\n'|'\r'|'/'))+
> | '/' (~('/'|'*'))*
> )
> ('\n' {newline();} |'\r'('\n')? {newline();} |"//"|"/*") ;
>
> WHATEVERTILLWS: // all characters until whitespace or a newline occurs
> ( (~(' '|'\t'|'\f'|'\n'|'\r'|'/'))+
> | '/' (~('/'|'*'))*
> )
> (' '|'\t'|'\f'
> |'\n' {newline();}
> |'\r'('\n')? {newline();}
> |"//"|"/*") ;
>
> WHATEVERTILLCOMMA: // all characters until a comma occurs
> ( (~(','|'/'))+
> | '/' (~('/'|'*'))*
> )
> (',') ;
>
> WHATEVERTILLLPAREN: // all characters until a left parenthesis occurs
> ( (~('('|'/'))+
> | '/' (~('/'|'*'))*
> )
> ('(') ;
>
> WHATEVERTILLRPAREN: // all characters until a right parenthesis occurs
> ( (~(')'|'/'))+
> | '/' (~('/'|'*'))*
> )
> ')' ;
>
>
> ---------------------
>
>
> ANTLR reports lexer ambiguity warnings for these rules. How can I
> fix this?
>
> These lexer rules are going to be used in the following parser:
>
> compilationUnit: (WS|NL)* header (WS|NL)* content (WS|NL)* footer
> (WS|NL)* ;
>
> header: "[" "header" "]" (WS|NL)*
> "keyword1" (WS|NL)* WHATEVERTILLEOL
> (WS|NL)*
> "keyword2" (WS|NL)* WHATEVERTILLEOL
> (WS|NL)*
> ("keyword3" (WS|NL)* WHATEVERTILLEOL)?
> (WS|NL)*
> ("keyword4" (WS|NL)* WHATEVERTILLEOL)? ;
>
> content: "[" "content" "]"
> ((WS|NL)* WHATEVERTILLLPAREN "(" WHATEVERTILLCOMMA ","
> WHATEVERTILLRPAREN ")" WHATEVERTILLEOL)+ ;
>
> footer: "[" "footer" "]"
> ((WS)* WHATEVERTILLLPAREN "(" ("Y"|"N") ")"
> (WHATEVERTILLWS WS)+ (NL)+ )+ ;
>
> --------------------------
>
> The sample file would look as follows:
>
> [header]
>    keyword1   somevalue // this is comment
> keyword2 somvalue someothervalue // comment
> keyword3 1 3$*.4--x27 /* comment
> */ another value for keyword 3
> keyword4 value (^$@11%&
>
> [content]
> this is content*key ( 77^q,8&1nn) AABBN@!(!)))0000
>
> content2 (2177, LLS) pppp1782m
>
> [footer]
> footer number 1 (Y) 11 00 27 7' 8~ 9! 8821*T2 b7*
> footr 2 (N) 0( 88 PP) 7!
>
> ----------------------------
>
> Help would be greatly appreciated. The files are from legacy
> applications with no source code. I'm very tempted to build the parser
> by hand or convert the files manually, but there are so many of them.



 
 




