[antlr-interest] Trouble parsing a language where '{' has too many meanings

Richard Clark rdclark at gmail.com
Fri Jul 6 15:42:38 PDT 2007


Try changing the definition of ML_TEXT so that the closing '}' and '.'
are a single string literal.

ML_TEXT
   :    '{'
       ( options {greedy=false;} : . )*
       '}.'
   ;

The lexer doesn't do backtracking, so under the old definition it
would see {...} and match it before seeing the ".". Automatic error
recovery would then throw away the dot as unrecognized (and report an
error).

Pulling the closing brace and dot together as '}.' means they'll be
recognized as a unit.
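
As a rough analogy only, the effect can be sketched outside ANTLR with
Python regular expressions. Everything here is an assumption made for
illustration: the regexes merely approximate the two lexer rules, the
tokenize() helper is hypothetical, and its "try the rules in order"
loop is a simplification, not ANTLR's actual prediction algorithm. The
second run is not the old grammar (which isn't shown here); it only
shows what happens when nothing can consume the dot.

```python
import re

# Hypothetical stand-ins for the two lexer rules (approximations only).
ML_TEXT = re.compile(r"\{.*?\}\.", re.DOTALL)  # '{', non-greedy body, then '}.'
BLOCK = re.compile(r"\{[A-Za-z ]*\}")          # '{', letters or spaces, '}'


def tokenize(text, rules):
    """Try each (name, pattern) in order at the current position."""
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched here: record the character and skip it,
            # loosely imitating the lexer's error recovery.
            tokens.append(("ERROR", text[pos]))
            pos += 1
    return tokens


source = "{a note}.{abc}"

# With '}.' matched as a unit, the comment keeps its trailing dot.
with_unit = tokenize(source, [("ML_TEXT", ML_TEXT), ("BLOCK", BLOCK)])

# With only a '{...}' rule, the braces match first and the dot is stray.
without_unit = tokenize(source, [("BLOCK", BLOCK)])

print(with_unit)     # [('ML_TEXT', '{a note}.'), ('BLOCK', '{abc}')]
print(without_unit)  # [('BLOCK', '{a note}'), ('ERROR', '.'), ('BLOCK', '{abc}')]
```

In the first run the dot is consumed as part of the comment token; in
the second it is left over and reported as an error, which is the
symptom described above.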

Run the following in ANTLRWorks' debugger to see it working:

grammar multiBlock;

top	: (block | comment)* ;

comment	: ML_TEXT;

block	: BLOCK ;

ML_TEXT
   :    '{'
       ( options {greedy=false;} : . )*
       '}.'
   ;

 BLOCK	: '{' ('A'..'Z'|'a'..'z'|' ')* '}' ;


 ...Richard

P.S. Remember that when more than one lexer rule can match the same
input, the rule listed first in the grammar wins.

