[antlr-interest] Lists. Lexer or Parser?
Gavin Lambert
antlr at mirality.co.nz
Sat Sep 13 15:03:27 PDT 2008
At 01:00 14/09/2008, Dave Pawson wrote:
>CONTENT: ~(NEWLINE)+;
[...]
>line: (c=CONTENT NEWLINE ) {
>        System.out.println("<para>"+ $c.text +"</para>\n");}|
>      STAR c=CONTENT+ NEWLINE+ {
>        System.out.println("<list>"+ $c.text );} ;
[...]
>The output is
><para>content only</para>
>
><para>* LIST list content</para>
>
><para>* LIST list content more</para>
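You'll note that "<list>" doesn't appear in the output -- that's a
sign that you're never hitting the second alt, which suggests that
the STAR is getting absorbed by the CONTENT rule: ~(NEWLINE)+ also
matches '*', so the lexer can consume the whole line as a single
CONTENT token and never emit a STAR. Try changing CONTENT to this: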
CONTENT: ~(STAR | NEWLINE) (~NEWLINE)*;
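To see the difference between the two rules outside ANTLR, here is a rough sketch that uses java.util.regex as a stand-in for the lexer rules. This is only an approximation: an ANTLR lexer is not a regex engine, NEWLINE is assumed to be '\r' or '\n', and the class and method names are made up for the illustration.

```java
import java.util.regex.Pattern;

public class ContentRuleDemo {
    // Stand-in for the original rule:  CONTENT: ~(NEWLINE)+;
    // (assumes NEWLINE is '\r' or '\n')
    static final Pattern ORIGINAL = Pattern.compile("[^\\r\\n]+");

    // Stand-in for the revised rule:  CONTENT: ~(STAR | NEWLINE) (~NEWLINE)*;
    static final Pattern REVISED = Pattern.compile("[^*\\r\\n][^\\r\\n]*");

    // True if the whole line would be accepted as one CONTENT token
    // under the original rule.
    static boolean originalMatches(String line) {
        return ORIGINAL.matcher(line).matches();
    }

    // True if the whole line would be accepted as one CONTENT token
    // under the revised rule, which refuses a leading '*'.
    static boolean revisedMatches(String line) {
        return REVISED.matcher(line).matches();
    }

    public static void main(String[] args) {
        String listLine = "* LIST list content";
        System.out.println("original rule takes list line: " + originalMatches(listLine));
        System.out.println("revised rule takes list line:  " + revisedMatches(listLine));
        System.out.println("revised rule takes plain line: " + revisedMatches("content only"));
    }
}
```

The revised pattern only differs in its first character class, which is what leaves the '*' available for the STAR token.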
Another option would be to do all the matching in the lexer:
NEWLINE  : ('\r' | '\n') { $channel = HIDDEN; };
LISTITEM : '*' (~NEWLINE)* { setText(getText().substring(1)); };
TEXT     : ~('*' | NEWLINE) (~NEWLINE)*;
line : TEXT     { System.out.println("<para>" + $TEXT.text + "</para>"); }
     | LISTITEM { System.out.println("<item>" + $LISTITEM.text + "</item>"); }
     ;
It wouldn't be hard from there to generate a surrounding "<list>"
element for groupings of LISTITEMs:
line : TEXT { System.out.println("<para>" + $TEXT.text + "</para>"); }
     | list
     ;
list : (LISTITEM) => { System.out.println("<list>"); }
       (LISTITEM { System.out.println("<item>" + $LISTITEM.text + "</item>"); })+
       { System.out.println("</list>"); }
     ;
(You probably don't even need the predicate there, since ANTLR
shouldn't try to enter the list rule unless there's a LISTITEM
present anyway. But it never hurts to be paranoid.)
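If it helps to see the grouping on its own, here is a minimal plain-Java sketch of what the line/list rules above are meant to produce. It is not ANTLR-generated code: the class name ListGrouping and the method render are hypothetical, and it works on already-split lines rather than a token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListGrouping {
    // Turns input lines into markup, wrapping each run of consecutive
    // '*' lines in <list>...</list>. Empty lines are skipped, mirroring
    // the lexer putting NEWLINE on the hidden channel.
    static List<String> render(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inList = false;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // nothing reaches the parser for a blank line
            }
            boolean isItem = line.startsWith("*");
            if (isItem && !inList) {
                out.add("<list>");   // first LISTITEM of a group
            }
            if (!isItem && inList) {
                out.add("</list>");  // group ended by a TEXT line
            }
            inList = isItem;
            if (isItem) {
                // Drop the leading '*', as the LISTITEM rule's setText does.
                out.add("<item>" + line.substring(1) + "</item>");
            } else {
                out.add("<para>" + line + "</para>");
            }
        }
        if (inList) {
            out.add("</list>");      // input ended inside a group
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "content only", "* LIST list content", "* LIST list content more");
        for (String s : render(input)) {
            System.out.println(s);
        }
    }
}
```

Note that the text after the '*' keeps its leading space, just as it would with the lexer action above; trim it in either place if that matters.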