[antlr-interest] Request for Change regarding Lexer (?)

Marcin Rzeźnicki marcin.rzeznicki at gmail.com
Tue Feb 16 10:29:44 PST 2010


Hi all,
Recently I ran into the following problem with the ANTLR lexer:

My initial grammar:
fragment
VerbatimString
  :
  (
    '[' LineSequence ']'
  )
  |
  (
    '{' LineSequence '}'
  )
  ;

fragment
LineSequence
   :
NewLine
    (
      ~(
        '\r'
        | '\n'
       )+
      NewLine
    )*
;

The intention was to match strings that start with a separator ('[' or '{')
followed by a new line and extend to the closing separator preceded by a new
line. Of course, this isn't quite correct, as you have probably spotted. The
problem is that LineSequence will gladly consume the closing separator after
the new line, because ']' and '}' are also matched by ~('\r' | '\n'). So it
seemed logical to make the loop in LineSequence non-greedy, so that it
examines what follows and exits as soon as the closing separator appears in
the lookahead. I tried:
fragment
LineSequence
   :
NewLine
    ( options {greedy=false;}:
      ~(
        '\r'
        | '\n'
       )+
      NewLine
    )*
;

and it turned out that ANTLR does not really inspect the follow set; it
seems to inspect only what is left in the rule itself. Therefore, for this
scheme to work I had to inline LineSequence and write the following ugliness:
fragment
VerbatimString
  :
  (
    '[' NewLine
    ( options {greedy=false;}:
      ~(
        '\r'
        | '\n'
       )+
      NewLine
    )*
    ']'
  )
  |
  (
    '{' NewLine
    ( options {greedy=false;}:
      ~(
        '\r'
        | '\n'
       )+
      NewLine
    )*
    '}'
  )
  ;

which seems to work more or less as expected. If anyone knows of a better
way, please let me know. But, assuming I did not screw anything up here, I'd
really like to have a way of giving such hints to ANTLR without writing
messy grammars. I thought of a syntax like:
fragment
VerbatimString
  :
  (
    '[' < LineSequence; ']' > ']'
  )
  |
  (
    '{' < LineSequence; '}' > '}'
  )
  ;
where one could specify what is expected to follow the rule reference. Is
that feasible? Thanks in advance for your comments and thoughts.
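
To make the behaviour concrete, here is a rough sketch in Python that
imitates the two variants by hand. It is not ANTLR-generated code and not
how ANTLR makes its decisions internally; it only mimics the effect described
above. The names match_newline, line_sequence, verbatim_string and the
follow parameter are hypothetical, and NewLine (not shown in the grammar) is
assumed to match '\r\n', '\r' or '\n'.

```python
NEWLINE_CHARS = "\r\n"


def match_newline(text, pos):
    # Assumed NewLine rule: '\r\n', '\r' or '\n'.
    # Returns the position after the newline, or None if there is none.
    if text.startswith("\r\n", pos):
        return pos + 2
    if pos < len(text) and text[pos] in NEWLINE_CHARS:
        return pos + 1
    return None


def line_sequence(text, pos, follow=None):
    # Imitates: NewLine ( ~('\r'|'\n')+ NewLine )*
    # follow=None : the loop is entered whenever the next char is not a
    #               newline, so it also swallows the closing separator.
    # follow=char : the loop exits as soon as that char is the lookahead
    #               (the hypothetical "< LineSequence; ']' >" hint).
    # Returns the position after the match, or None on failure.
    pos = match_newline(text, pos)
    if pos is None:
        return None
    while pos < len(text) and text[pos] not in NEWLINE_CHARS:
        if follow is not None and text[pos] == follow:
            break  # closing separator in lookahead: leave the loop
        end = pos
        while end < len(text) and text[end] not in NEWLINE_CHARS:
            end += 1
        after = match_newline(text, end)
        if after is None:
            return None  # committed to a line that never ends in NewLine
        pos = after
    return pos


def verbatim_string(text, use_hint):
    # Imitates: '[' LineSequence ']' | '{' LineSequence '}'
    closers = {"[": "]", "{": "}"}
    if not text or text[0] not in closers:
        return None
    close = closers[text[0]]
    pos = line_sequence(text, 1, follow=close if use_hint else None)
    if pos is None or not text.startswith(close, pos):
        return None
    return text[:pos + 1]


sample = "[\nfoo\nbar\n]"
print(verbatim_string(sample, use_hint=False))        # no match
print(repr(verbatim_string(sample, use_hint=True)))   # whole string matches
```

Without the hint the loop takes ']' as the start of another line and the
match fails; with the hint it stops in front of the closing separator. Note
that in this sketch, as with the inlined non-greedy loop, a content line that
begins with the closing separator ends the string.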

-- 
Greetings
Marcin Rzeźnicki

