[antlr-interest] Lexer issues when block ends with EOF instead of EOL

Gavin Lambert antlr at mirality.co.nz
Fri Feb 13 18:55:01 PST 2009


At 05:41 14/02/2009, Brent Yates wrote:
>My problem is that in addition to the single line comment which 
>can be handled nicely the way you specified, I also mix in 
>preprocessor style patterns (single pass parsing).  These 
>patterns can span multiple lines and therefore the EOL's are part 
>of them.
>
>fragment PREPROSSOR_BLOCK
>     :   (options {greedy = false;} : ('\\\r\n' | '\\\n' | ~'\n'))*
>         (
>             '\n'
>         )
>     ;

Using the same principle, the rule can simply not match the final 
newline (it shouldn't need to):

fragment PREPROCESSOR_BLOCK
   : ('\\' '\r'? '\n'? | ~('\\' | '\r' | '\n'))*
   ;

If a newline is preceded by a backslash, this consumes it and 
keeps going; any other newline exits the loop. Backslashes can 
also occur without being followed by a newline. Presumably you're 
calling this at the end of another lexer rule (hopefully one that 
matches at least one character, since this fragment can match 
zero characters and that would lead to an infinite loop). After 
the fragment exits, the newline (if any) can be matched by the 
regular newline rule.
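
As a rough illustration of that loop, here is a hand-written Python 
simulation of the fragment. It is not ANTLR-generated code, and the 
function name match_preprocessor_block is made up for this sketch; 
it only assumes the default greedy matching of the optional '\r' 
and '\n'.

```python
def match_preprocessor_block(text, pos=0):
    """Return the index just past what the fragment would consume.

    Hypothetical helper: mimics
        ('\\' '\r'? '\n'? | ~('\\' | '\r' | '\n'))*
    by hand, starting at index pos.
    """
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch == '\\':
            # First alternative: a backslash plus an optional line ending.
            pos += 1
            if pos < n and text[pos] == '\r':
                pos += 1
            if pos < n and text[pos] == '\n':
                pos += 1
        elif ch == '\r' or ch == '\n':
            # An unescaped line ending matches neither alternative,
            # so the loop exits and leaves it for the newline rule.
            break
        else:
            # Second alternative: any other single character.
            pos += 1
    return pos
```

Given a line that ends in a backslash followed by a newline, the 
sketch carries on into the next line and stops in front of the 
first unescaped newline. Given input with no trailing newline at 
all, it simply runs to the end of the text, which is the EOF case 
from the subject line.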

One possible downside is that it will still accept a backslash at 
end of file (not followed by a newline), but that's probably 
something you can report an error for at the parser level rather 
than the lexer level.
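
One way such a later check might look, sketched in Python rather 
than as a grammar action. The function name is hypothetical, and 
the sketch assumes the text it is given is exactly what the 
fragment consumed, with nothing appended by the calling rule. 
Under that assumption the matched text can only end in a 
backslash when the input ran out right after it, because a 
backslash followed by a line ending always takes the line ending 
along.

```python
def has_dangling_continuation(block_text):
    """Return True if the matched block ends in a bare backslash.

    Hypothetical helper: block_text is assumed to be the text matched
    by the PREPROCESSOR_BLOCK fragment and nothing else.
    """
    # A trailing backslash means the continuation had no line to
    # continue onto, i.e. the backslash sat at end of file.
    return block_text.endswith('\\')
```

A caller could run this on the token text and report the error 
itself, leaving the lexer rule as simple as it is above.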

>I also have places where I have to explicitly check for white 
>space (including EOL).
>
>NUMBER_WITH_EXPLICIT_BASE
>     :   (DECIMAL_BASE | BINARY_BASE | OCTAL_BASE | HEX_BASE)
>         WS_OR_NEWLINE?
>         (POST_BASE_NUMBER | MACRO_INSTANTIATION)
>     ;

I don't see why that one would cause any EOF/missing-newline 
related problems.


