[antlr-interest] Request for Change regarding Lexer (?)

Terence Parr parrt at cs.usfca.edu
Tue Feb 16 10:32:19 PST 2010


http://www.antlr.org/jira/browse/ANTLR-189
Ter
On Feb 16, 2010, at 10:29 AM, Marcin Rzeźnicki wrote:

> Hi all,
> Recently I came upon the following problem in ANTLR Lexer:
> 
> My initial grammar:
> fragment
> VerbatimString
>  :
>  (
>    '[' LineSequence ']'
>  )
>  |
>  (
>    '{' LineSequence '}'
>  )
>  ;
> 
> fragment
> LineSequence
>   :
> NewLine
>    (
>      ~(
>        '\r'
>        | '\n'
>       )+
>      NewLine
>    )*
> ;
> 
> The intention was to match strings that start with an opening separator
> ('[' or '{') followed by a new line, and that extend to the closing separator
> preceded by a new line. Of course, it isn't exactly correct, as you have
> probably spotted. The problem was that LineSequence would gladly consume the
> closing separator after the new line, because ~('\r' | '\n') also matches
> ']' and '}'. So it seemed logical to make the loop in LineSequence
> non-greedy, so that it examines what follows and exits as soon as the
> closing separator appears in the lookahead. I tried:
> fragment
> LineSequence
>   :
> NewLine
>    ( options {greedy=false;}:
>      ~(
>        '\r'
>        | '\n'
>       )+
>      NewLine
>    )*
> ;
> 
> and I realized that ANTLR does not really inspect the follow set of the
> rule; it seems to inspect only what is left in the rule itself. Therefore,
> for this scheme to work I had to write the following ugliness:
> fragment
> VerbatimString
>  :
>  (
>    '[' NewLine
>    ( options {greedy=false;}:
>      ~(
>        '\r'
>        | '\n'
>       )+
>      NewLine
>    )*
>    ']'
>  )
>  |
>  (
>    '{' NewLine
>    ( options {greedy=false;}:
>      ~(
>        '\r'
>        | '\n'
>       )+
>      NewLine
>    )*
>    '}'
>  )
>  ;
> 
> which seems to be working more or less as expected. Now, if anyone knows of
> a better way, please let me know. But, assuming that I did not screw up
> anything here, I'd really like to see a way of giving ANTLR such hints
> without writing messy grammars. I thought of this syntax:
> fragment
> VerbatimString
>  :
>  (
>    '[' < LineSequence; ']' > ']'
>  )
>  |
>  (
>    '{' < LineSequence; '}' > '}'
>  )
>  ;
> where the part after the semicolon specifies what is expected to follow
> LineSequence. Is that feasible? Thanks in advance for your
> comments/thoughts.
> 
> -- 
> Greetings
> Marcin Rzeźnicki
> 

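The difference between the two loops in the quoted message can be sketched outside ANTLR. The Python below is a hand-written illustration only, not ANTLR's prediction algorithm. The function names and the `follow_aware` flag are hypothetical, and NewLine is assumed to be "\r\n", "\n" or "\r" (the original grammar does not show its definition). With `follow_aware=False` the loop looks only at what is inside LineSequence; with `follow_aware=True` it exits as soon as the closing separator is next in the input.

```python
# Assumption: NewLine is one of these sequences; the original rule is not shown.
NEWLINES = ("\r\n", "\n", "\r")


def match_newline(text, pos):
    """Return the position just after a newline at pos, or None if there is none."""
    for nl in NEWLINES:
        if text.startswith(nl, pos):
            return pos + len(nl)
    return None


def match_line_sequence(text, pos, closer=None):
    """Match NewLine (non-newline-chars+ NewLine)* starting at pos.

    closer=None: the loop only looks inside the rule (greedy behaviour).
    closer set:  the loop exits when the closer is the next character,
                 which is the follow-set check the message asks for.
    """
    pos = match_newline(text, pos)
    if pos is None:
        return None
    while pos < len(text):
        if closer is not None and text.startswith(closer, pos):
            break  # closing separator in lookahead: leave the loop
        end = pos
        while end < len(text) and text[end] not in "\r\n":
            end += 1
        after = match_newline(text, end)
        if end == pos or after is None:
            break  # not a complete, non-empty line: stop without consuming it
        pos = after
    return pos


def match_verbatim(text, follow_aware):
    """Return the verbatim string matched at the start of text, or None."""
    pairs = {"[": "]", "{": "}"}
    if not text or text[0] not in pairs:
        return None
    closer = pairs[text[0]]
    pos = match_line_sequence(text, 1, closer if follow_aware else None)
    if pos is None or not text.startswith(closer, pos):
        return None
    return text[: pos + 1]
```

Calling `match_verbatim("[\nfoo\nbar\n]\nrest", follow_aware=False)` fails because the loop swallows the line holding ']' as ordinary content, while the same call with `follow_aware=True` stops in front of it. As with the non-greedy grammar, a content line that itself begins with the closing separator ends the string early.
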