[antlr-interest] Recovering white space in V3.0
Matthew Ford
matthew.ford at forward.com.au
Sat Jun 4 13:59:20 PDT 2005
This is what I have so far.
WS is ignored (sent to channel 99), but between WORDs I want to get it back,
so I have used:
(
  w=WORD
  { if (wordsStarted) {
      // output all ignored (channel 99) tokens between lastIndex and this index
      for (int i=lastIndex+1; i<w.getTokenIndex(); i++) {
        System.out.print(input.get(i).getText());
      }
    } else {
      wordsStarted = true;
    }
    System.out.print(w.getText());
    lastIndex = w.getTokenIndex();
  }
)*
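The loop above works because the token stream keeps every token in its buffer, including the WS tokens sent to channel 99; the parser just never sees them, but they can still be fetched by index. A minimal plain-Java sketch of the same gap-filling logic, assuming a hypothetical `Tok` class in place of ANTLR's Token and a plain `List` in place of the token stream (no ANTLR classes are used):

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenGapDemo {
    // Hypothetical stand-in for an ANTLR token: text, channel and buffer index.
    static final class Tok {
        final String text;
        final int channel;
        final int index;
        Tok(String text, int channel, int index) {
            this.text = text;
            this.channel = channel;
            this.index = index;
        }
    }

    static final int DEFAULT = 0;
    static final int HIDDEN = 99; // the grammar sends WS to channel 99

    // Builds a buffer in which every token knows its own index.
    static List<Tok> buffer(String[] texts, int[] channels) {
        List<Tok> buf = new ArrayList<>();
        for (int i = 0; i < texts.length; i++) {
            buf.add(new Tok(texts[i], channels[i], i));
        }
        return buf;
    }

    // Same logic as the grammar action: before each WORD after the first,
    // copy every buffered token strictly between the previous WORD and this one.
    static String joinWords(List<Tok> buf, List<Tok> words) {
        StringBuilder out = new StringBuilder();
        boolean wordsStarted = false;
        int lastIndex = 0;
        for (Tok w : words) {
            if (wordsStarted) {
                for (int i = lastIndex + 1; i < w.index; i++) {
                    out.append(buf.get(i).text);
                }
            } else {
                wordsStarted = true;
            }
            out.append(w.text);
            lastIndex = w.index;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Buffer for "- Lorem ipsum\n": MINUS WS WORD WS WORD NL
        List<Tok> buf = buffer(
            new String[] {"-", " ", "Lorem", " ", "ipsum", "\n"},
            new int[] {DEFAULT, HIDDEN, DEFAULT, HIDDEN, DEFAULT, DEFAULT});
        // The WORDs the parser matched sit at indices 2 and 4.
        List<Tok> words = List.of(buf.get(2), buf.get(4));
        System.out.println("<li>" + joinWords(buf, words) + "</li>"); // prints <li>Lorem ipsum</li>
    }
}
```

The WS between MINUS and the first WORD is not copied, because the loop only fills gaps once the first WORD has been seen.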
Is there a better way?
matthew
========================================================
grammar Lists;
start
    : (paraOrList)*
    ;
paraOrList
    : para
    | para {System.out.println("<ol>");} (list)+
      {System.out.println("</ol>");}
    ;
list
init {
    boolean wordsStarted = false;
    int lastIndex = 0;
}
    : {System.out.print("<li>");}
      MINUS
      (
        w=WORD
        { if (wordsStarted) {
            // output all ignored (channel 99) tokens between lastIndex and this index
            for (int i=lastIndex+1; i<w.getTokenIndex(); i++) {
              System.out.print(input.get(i).getText());
            }
          } else {
            wordsStarted = true;
          }
          System.out.print(w.getText());
          lastIndex = w.getTokenIndex();
        }
      )*
      {System.out.println("</li>");}
      NL
    ;
para
    : NL NL
    ;
// WORD uses + rather than *: a lexer rule that can match the empty string never consumes input
WORD : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+
    ;
MINUS : '-'
    ;
NL : '\n'
    ;
WS : ( ' '
     | '\t'
     | '\r'
     )+
     { channel=99; }
    ;
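One possible simplification, offered as a sketch rather than a known ANTLR idiom: instead of filling each gap, remember the buffer index of the first and last WORD in the item and, once the loop ends, emit every buffered token in that inclusive range. Between the first and last WORD of an item the only other buffered tokens are hidden WS, so the result matches the per-gap loop. The Java below runs without ANTLR; `lex`, `rangeText` and `listItem` are hypothetical names, and `lex` is a hand-written imitation of the WORD, MINUS, NL and WS rules above (WS on channel 99).

```java
import java.util.ArrayList;
import java.util.List;

public class RangeTextDemo {
    static final int DEFAULT = 0;
    static final int HIDDEN = 99;

    // Hypothetical token: type name, text and channel.
    static final class Tok {
        final String type;
        final String text;
        final int channel;
        Tok(String type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }
    }

    // Same character set as the WORD rule.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // Imitates the lexer rules: WORD and WS match one or more characters, WS is hidden.
    static List<Tok> lex(String s) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            if (isWordChar(c)) {
                while (i < s.length() && isWordChar(s.charAt(i))) i++;
                toks.add(new Tok("WORD", s.substring(start, i), DEFAULT));
            } else if (c == ' ' || c == '\t' || c == '\r') {
                while (i < s.length() && " \t\r".indexOf(s.charAt(i)) >= 0) i++;
                toks.add(new Tok("WS", s.substring(start, i), HIDDEN));
            } else if (c == '-') {
                i++;
                toks.add(new Tok("MINUS", "-", DEFAULT));
            } else if (c == '\n') {
                i++;
                toks.add(new Tok("NL", "\n", DEFAULT));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return toks;
    }

    // Text of every buffered token from start to stop inclusive, hidden ones included.
    static String rangeText(List<Tok> toks, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) sb.append(toks.get(i).text);
        return sb.toString();
    }

    // Assumes line is one list item such as "- Lorem ipsum\n"; only the
    // first and last WORD indices are tracked, then the range is copied once.
    static String listItem(String line) {
        List<Tok> toks = lex(line);
        int first = -1, last = -1;
        for (int i = 0; i < toks.size(); i++) {
            if (toks.get(i).type.equals("WORD")) {
                if (first < 0) first = i;
                last = i;
            }
        }
        String body = first < 0 ? "" : rangeText(toks, first, last);
        return "<li>" + body + "</li>";
    }

    public static void main(String[] args) {
        System.out.println(listItem("- Lorem ipsum\n")); // prints <li>Lorem ipsum</li>
        System.out.println(listItem("- Dolor \t sit\n")); // the space-tab-space run is kept as-is
    }
}
```

Tracking two indices replaces the wordsStarted/lastIndex bookkeeping with a single copy after the loop; whitespace inside the item is reproduced exactly, while WS before the first WORD and after the last is left out.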
Input ================================================
- Lorem ipsum
- Dolor sit
- Amet
- Foo bar
- Bar foo
- Foo
output =====================================
<ol>
<li>Lorem ipsum</li>
<li>Dolor sit</li>
<li>Amet</li>
</ol>
<ol>
<li>Foo bar</li>
<li>Bar foo</li>
<li>Foo</li>
</ol>