[antlr-interest] Similar lexer rules

Tue Apr 29 03:41:56 PDT 2008

This simple grammar (a reduction of an actual grammar)

  grammar bug;

  quote: name S? '=' S? '<' text '>';
  name: ID;
  text: (LETTER|S)+;

  //RAW_TEXT: (LETTER|S)+;
  ID: LETTER+;
  fragment LETTER: ('a'..'z'|'A'..'Z');
  S: (' '|'\n'|'\t')+;

will not parse "xx=< yy >": the lexer matches "yy" as an ID token (LETTER is
a fragment, so it never reaches the parser as a token of its own), and the
parser rule 'text' therefore fails. Is there a way to solve this?
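
To make the tokenization concrete, here is a rough Python sketch of what the
lexer appears to do with that input. It is not ANTLR: it assumes a plain
longest-match scan in which ties go to the first listed rule, whereas ANTLR's
generated lexer uses lookahead prediction, so treat it only as an
approximation. The names TOKEN_RULES and tokenize are made up for this
illustration.

```python
import re

# Hypothetical stand-in for the lexer rules of the grammar above.
# LETTER is a fragment, so it has no entry: it never produces a token.
TOKEN_RULES = [
    ("ID", re.compile(r"[a-zA-Z]+")),
    ("S", re.compile(r"[ \n\t]+")),
    ("'='", re.compile(r"=")),
    ("'<'", re.compile(r"<")),
    ("'>'", re.compile(r">")),
]

def tokenize(text):
    """Return (token_type, lexeme) pairs; longest match, first rule wins ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_match = None, None
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            raise ValueError(f"no lexer rule matches at position {pos}")
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens

for token in tokenize("xx=< yy >"):
    print(token)
```

Under these assumptions the parser sees ID '=' '<' S ID S '>', so between
'<' and '>' there is no LETTER token for 'text' to consume.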

I am working on an XML-related grammar where many tokens differ only in
whether they include or exclude a certain range of characters. For example,
in the original grammar the RAW_TEXT rule shadowed the S rule, so all the
characters that should have matched "S?" were tokenized as RAW_TEXT instead.
This forced me to turn RAW_TEXT into a parser rule. Is converting lexer rules
into parser rules the only feasible solution?
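
The shadowing can be sketched the same way in Python. This again assumes
longest match with ties resolved in favour of the first listed rule, and it
assumes RAW_TEXT was listed before ID and S as in the commented-out line of
the reduced grammar; the PUNCT rule and the function name token_types are
invented for the illustration.

```python
import re

# Hypothetical rule order: RAW_TEXT first, then ID and S.
# PUNCT stands in for the literals '=', '<' and '>'.
RULES = [
    ("RAW_TEXT", re.compile(r"[a-zA-Z \n\t]+")),
    ("ID", re.compile(r"[a-zA-Z]+")),
    ("S", re.compile(r"[ \n\t]+")),
    ("PUNCT", re.compile(r"[=<>]")),
]

def token_types(text):
    """Return the token type chosen for each lexeme, scanning left to right."""
    types = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces an earlier rule's match.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no lexer rule matches at position {pos}")
        types.append(best_name)
        pos = best_end
    return types

print(token_types("xx = <yy>"))
```

With this ordering every run of letters or whitespace comes out as RAW_TEXT,
so neither ID nor S is ever produced.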

-- 
Gioele Barabucci <barabucc at cs.unibo.it>