[antlr-interest] Recovering white space in V3.0

Matthew Ford matthew.ford at forward.com.au
Sat Jun 4 13:59:20 PDT 2005


This is what I have so far.
WS is ignored by sending it to channel 99,
but between WORDs I want to get it back.
So I have used:
    (
    w=WORD
      { if (wordsStarted) {
        // output all ignored tokens between lastIndex and this index
         for (int i=lastIndex+1; i<w.getTokenIndex(); i++) {
          System.out.print(input.get(i).getText());
         }
        } else {
          wordsStarted = true;
        }
        System.out.print(w.getText());
        lastIndex = w.getTokenIndex();
      }
  )*
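
For anyone who wants to try the idea outside a grammar, here is a minimal standalone sketch of the same loop. It assumes nothing about the ANTLR runtime: Tok, joinWithGaps and demo are hypothetical names made up for illustration, and a plain List stands in for the token buffer. Using lastIndex = -1 as the "no word seen yet" marker replaces the separate wordsStarted flag.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenTokenGap {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99; // same number the WS rule uses

    // Hypothetical stand-in for a token: just its text and its channel.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Joins the on-channel tokens, restoring off-channel text only
    // where it sits between two on-channel tokens.
    static String joinWithGaps(List<Tok> buffer) {
        StringBuilder out = new StringBuilder();
        int lastIndex = -1; // index of the previous on-channel token, -1 if none yet
        for (int i = 0; i < buffer.size(); i++) {
            Tok t = buffer.get(i);
            if (t.channel != DEFAULT_CHANNEL) {
                continue; // the parser would never see this token
            }
            if (lastIndex >= 0) {
                // output everything that was skipped since the previous word
                for (int j = lastIndex + 1; j < i; j++) {
                    out.append(buffer.get(j).text);
                }
            }
            out.append(t.text);
            lastIndex = i;
        }
        return out.toString();
    }

    // Small made-up buffer: leading and trailing whitespace should be dropped.
    static String demo() {
        List<Tok> buffer = new ArrayList<>();
        buffer.add(new Tok(" ", HIDDEN_CHANNEL));
        buffer.add(new Tok("Lorem", DEFAULT_CHANNEL));
        buffer.add(new Tok("  ", HIDDEN_CHANNEL));
        buffer.add(new Tok("ipsum", DEFAULT_CHANNEL));
        buffer.add(new Tok(" ", HIDDEN_CHANNEL));
        return joinWithGaps(buffer);
    }

    public static void main(String[] args) {
        System.out.println("[" + demo() + "]");
    }
}
```

Whitespace before the first word and after the last one is dropped, which matches what the action in the grammar does.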


Is there a better way?
matthew

========================================================

grammar Lists;

start
  : (paraOrList)*
  ;

paraOrList
  : para
  | para {System.out.println("<ol>");} (list)+ {System.out.println("</ol>");}
  ;

list
init {
 boolean wordsStarted = false;
 int lastIndex = 0;
 }
 :
   {System.out.print("<li>");}
   MINUS
    (
    w=WORD
      { if (wordsStarted) {
        // output all ignored tokens between lastIndex and this index
         for (int i=lastIndex+1; i<w.getTokenIndex(); i++) {
          System.out.print(input.get(i).getText());
         }
        } else {
          wordsStarted = true;
        }
        System.out.print(w.getText());
        lastIndex = w.getTokenIndex();
      }
  )*
    {System.out.println("</li>");}
  NL
  ;

para
  : NL NL
  ;


WORD  :   ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+   // one or more, so WORD never matches the empty string
    ;


MINUS :
    '-'
    ;

NL  : '\n'
    ;

WS  :   (   ' '
        |   '\t'
        |   '\r'
        )+
        { channel=99; }
    ;
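
The reason input.get(i) can still reach the WS tokens is that they stay in the token buffer with their own index; they are only on a different channel. The sketch below is a hand-written stand-in for the four lexer rules above, not ANTLR code: ChannelSketch, Tok, tokenize and parserVisibleIndexes are hypothetical names, and the filtering is my assumption about what the parser's view amounts to.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99;

    // Hypothetical token: type name, matched text, channel, position in the buffer.
    static final class Tok {
        final String type;
        final String text;
        final int channel;
        final int index;

        Tok(String type, String text, int channel, int index) {
            this.type = type;
            this.text = text;
            this.channel = channel;
            this.index = index;
        }
    }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r';
    }

    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Mimics WORD, MINUS, NL and WS; every token is buffered, WS on channel 99.
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            String type;
            int channel = DEFAULT_CHANNEL;
            if (c == '-') {
                type = "MINUS";
                i++;
            } else if (c == '\n') {
                type = "NL";
                i++;
            } else if (isWs(c)) {
                type = "WS";
                channel = HIDDEN_CHANNEL;
                while (i < s.length() && isWs(s.charAt(i))) {
                    i++;
                }
            } else if (isWordChar(c)) {
                type = "WORD";
                while (i < s.length() && isWordChar(s.charAt(i))) {
                    i++;
                }
            } else {
                throw new IllegalArgumentException("unexpected character at " + i);
            }
            out.add(new Tok(type, s.substring(start, i), channel, out.size()));
        }
        return out;
    }

    // Buffer indexes of the tokens a parser on the default channel would see.
    static List<Integer> parserVisibleIndexes(String s) {
        List<Integer> visible = new ArrayList<>();
        for (Tok t : tokenize(s)) {
            if (t.channel == DEFAULT_CHANNEL) {
                visible.add(t.index);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(parserVisibleIndexes("- Foo bar\n"));
    }
}
```

For the line "- Foo bar" the visible indexes skip 1 and 3, which are exactly the WS tokens the action in the list rule walks over to print the gap between two WORDs.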


Input ================================================


- Lorem ipsum
- Dolor sit
- Amet


- Foo bar
- Bar foo
- Foo


Output =====================================
<ol>
<li>Lorem ipsum</li>
<li>Dolor sit</li>
<li>Amet</li>
</ol>
<ol>
<li>Foo bar</li>
<li>Bar foo</li>
<li>Foo</li>
</ol>


