[antlr-interest] Lexer issues when block ends with EOF instead of EOL

Brent Yates brent.yates at gmail.com
Fri Feb 13 08:41:02 PST 2009


My problem is that, in addition to the single-line comment, which can be
handled nicely the way you specified, I also mix in preprocessor-style
patterns (single-pass parsing).  These patterns can span multiple lines, and
therefore the EOLs are part of them.

fragment PREPROSSOR_BLOCK
    :   (options {greedy = false;} : ('\\\r\n' | '\\\n' | ~'\n' ) )*
        (
            '\n'
        )
    ;

I also have places where I have to explicitly check for white space
(including EOL).

NUMBER_WITH_EXPLICIT_BASE
    :   (DECIMAL_BASE | BINARY_BASE | OCTAL_BASE | HEX_BASE)
        WS_OR_NEWLINE?
        (POST_BASE_NUMBER | MACRO_INSTANTIATION)
    ;

I could fight with the lexer patterns to see if I could find a better
solution, but I'm lazy ;-) .  For the most part my lexer and parser are
stable and working and I have moved on to coding the interesting parts of
the application - actually using the AST data.  I am loath to go back to
fighting the lexer.

I do have issues in the lexer, and if I write another (or rewrite this one
again), what I learned this time around (including your suggestions) will
result in a much cleaner process.  Right now I am OK with hacking in an
extra EOL to the input stream just to handle the odd malformed file.
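
A minimal sketch of that hack, assuming the whole file is read into a
String before it is handed to the lexer.  The class and method names are
made up for illustration and are not part of ANTLR:

```java
// Hypothetical helper (not part of ANTLR): pads the input text so that
// lexer rules which end in '\n' can still match on the last line of a file.
final class InputFixup {
    private InputFixup() {}

    /**
     * Returns the text unchanged if it already ends with '\n';
     * otherwise returns the text with a single '\n' appended.
     * An empty input therefore becomes "\n".
     */
    static String ensureTrailingNewline(String text) {
        if (text.endsWith("\n")) {
            return text;
        }
        return text + "\n";
    }
}
```

The padded string can then be wrapped in an ANTLRStringStream and given to
the generated lexer as usual.  The grammar itself stays untouched, at the
cost of the last token's position possibly pointing one character past the
real end of the file.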

Thanks!

Brent Yates

On Fri, Feb 13, 2009 at 11:23 AM, Loring Craymer <lgcraymer at yahoo.com> wrote:

> No--this is a factoring issue.  Unless your parser needs to see EOL tokens,
> they should be in a separate rule and marked as "hidden".  If you take out
> the '\r'? '\n' from this rule, I expect that your grammar will work
> (provided that you separately recognize newlines).
>
> --Loring
>
> ------------------------------
> *From:* Brent Yates <brent.yates at gmail.com>
> *To:* "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> *Sent:* Wednesday, February 11, 2009 9:52:23 AM
> *Subject:* [antlr-interest] Lexer issues when block ends with EOF instead
> of EOL
>
> Assuming a standard LINE comment form such as:
> SL_COMMENT
>     : '//'  ( options {greedy=false;} : . )*  '\r'? '\n' {$channel=HIDDEN;}
>     ;
>
> What is the best way to handle a file which ends with a single line comment
> but no EOL?
>
> If I add the EOF to the rule I get the following error:
>
> SL_COMMENT
>     : '//'  ( options {greedy=false;} : . )*  '\r'? ('\n'|EOF) {$channel=HIDDEN;}
>     ;
>
> ANTLR Parser Generator  Version 3.1.1
> error(201): SystemVerilogLexer.g:592:43: The following alternatives can never be matched: 1
>
> This problem occurs with other rules as well.  Is it expected that files
> which end with no EOL are bad or should the lexer handle it?
>
> Thanks.
>
> Brent Yates
>
>