[antlr-interest] Lexer issues when block ends with EOF instead of EOL
Gavin Lambert
antlr at mirality.co.nz
Fri Feb 13 18:55:01 PST 2009
At 05:41 14/02/2009, Brent Yates wrote:
>My problem is that in addition to the single line comment which
>can be handled nicely the way you specified, I also mix in
>preprocessor style patterns (single pass parsing). These
>patterns can span multiple lines and therefore the EOL's are part
>of them.
>
>fragment PREPROCESSOR_BLOCK
> : (options {greedy = false;} : ('\\\r\n' | '\\\n' | ~'\n'
> ) )*
> (
> '\n'
> )
> ;
Using the same principles, that rule could simply choose not to
match the final newline (it shouldn't need to):
fragment PREPROCESSOR_BLOCK
: ('\\' '\r'? '\n'? | ~('\\' | '\r' | '\n'))*
;
This consumes a newline and keeps going if the newline is preceded
by a backslash; otherwise a newline exits the loop. Backslashes can
also occur without being followed by a newline. Presumably you're
calling this at the end of another lexer rule (hopefully one that
matches at least one character, since this fragment can match the
empty string and would otherwise risk an infinite loop). After that
rule exits, the newline (if any) can be matched by the regular
newline rule.
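
To see that behaviour outside ANTLR, here is a rough Python sketch
that emulates the loop with an equivalent regular expression. The
regex, the helper name match_block and the sample inputs are my own
illustration, not part of the grammar, and a regex only approximates
what the generated lexer does:

```python
import re

# Regex emulation (an approximation) of the loop in PREPROCESSOR_BLOCK:
#   ('\\' '\r'? '\n'? | ~('\\' | '\r' | '\n'))*
BLOCK = re.compile(r'(?:\\\r?\n?|[^\\\r\n])*')

def match_block(text, start=0):
    """Return the text the block loop would consume, beginning at start."""
    return BLOCK.match(text, start).group(0)

# Hypothetical inputs: the same directive body, once followed by a
# newline and more input, once ending at EOF with no newline at all.
with_newline = "define X \\\n  1\nnext line"
at_eof = "define X \\\n  1"

print(repr(match_block(with_newline)))
print(repr(match_block(at_eof)))
```

Both calls consume the same text: the continued line is swallowed,
and the unescaped newline (or EOF) simply ends the loop, leaving any
newline for the regular newline rule.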
One possible downside is that it still accepts a backslash at end
of file (not followed by a newline), but that's probably something
you can report as an error at the parser level rather than the
lexer level.
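
As a rough sketch of what such a check might look for, the Python
below flags a block that runs to the end of the input and ends in a
bare backslash. The regex emulation and the helper name
dangling_continuation are hypothetical, only meant to illustrate the
condition; in a real grammar the check would live wherever you
inspect the token text:

```python
import re

# Regex emulation (an approximation) of the PREPROCESSOR_BLOCK loop.
BLOCK = re.compile(r'(?:\\\r?\n?|[^\\\r\n])*')

def dangling_continuation(text, start=0):
    """True if the block reaches end of input and ends in a bare backslash."""
    m = BLOCK.match(text, start)
    return m.end() == len(text) and m.group(0).endswith("\\")

print(dangling_continuation("define X \\"))        # backslash, then EOF
print(dangling_continuation("define X \\\n  1"))   # normal continuation
```

Only the first input is flagged; a backslash that is followed by a
newline, or a block that stops at an ordinary newline, is not.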
>I also have places where I have to explicitly check for white
>space (including EOL).
>
>NUMBER_WITH_EXPLICIT_BASE
> : (DECIMAL_BASE | BINARY_BASE | OCTAL_BASE | HEX_BASE)
> WS_OR_NEWLINE?
> (POST_BASE_NUMBER | MACRO_INSTANTIATION)
> ;
I don't see why that one would cause any EOF/missing-newline
related problems.