[antlr-interest] Lists. Lexer or Parser?

Gavin Lambert antlr at mirality.co.nz
Sat Sep 13 15:03:27 PDT 2008


At 01:00 14/09/2008, Dave Pawson wrote:
 >CONTENT: ~(NEWLINE)+;
[...]
 >line:  (c=CONTENT NEWLINE ) {
 >            System.out.println("<para>"+ $c.text +"</para>\n" );}|
 >     STAR c=CONTENT+ NEWLINE+ {
 >            System.out.println("<list>"+ $c.text );}   ;
[...]
 >The output is
 ><para>content only</para>
 >
 ><para>* LIST list content</para>
 >
 ><para>* LIST list content more</para>

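You'll note that "<list>" never appears in the output -- that's a 
sign that you're never hitting the second alt.  CONTENT matches any 
run of non-newline characters, '*' included, so the STAR is getting 
absorbed into the CONTENT token and the parser never sees a STAR.  
Try changing CONTENT so that it cannot start with a star: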

CONTENT: ~(STAR | NEWLINE) (~NEWLINE)*;
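
As a rough analogy only, the difference between the two CONTENT definitions can be checked with ordinary Java regexes. This is not how ANTLR's lexer decides between rules internally; the class name ContentRuleDemo and the pattern names are made up for this sketch, and the regexes are hand-written approximations of the two rules.

```java
import java.util.regex.Pattern;

public class ContentRuleDemo {
    // Approximates the original rule: CONTENT: ~(NEWLINE)+
    static final Pattern OLD_CONTENT = Pattern.compile("[^\\r\\n]+");

    // Approximates the revised rule: CONTENT: ~(STAR | NEWLINE) (~NEWLINE)*
    static final Pattern NEW_CONTENT = Pattern.compile("[^*\\r\\n][^\\r\\n]*");

    // True if the pattern consumes the entire line as a single match.
    static boolean matchesWholeLine(Pattern p, String line) {
        return p.matcher(line).matches();
    }

    public static void main(String[] args) {
        String listLine = "* LIST list content";
        // The original rule happily swallows the star with the rest of the line.
        System.out.println(matchesWholeLine(OLD_CONTENT, listLine));       // true
        // The revised rule cannot start on '*', leaving it for STAR.
        System.out.println(matchesWholeLine(NEW_CONTENT, listLine));       // false
        // Ordinary paragraph text is still matched by the revised rule.
        System.out.println(matchesWholeLine(NEW_CONTENT, "content only")); // true
    }
}
```

Once the star has been taken as its own token, the remainder " LIST list content" starts with a space, so it still qualifies as CONTENT under the revised rule.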


Another option would be to do all the matching in the lexer:

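// Parenthesized so the action applies to both alternatives.
NEWLINE : ('\r' | '\n') { $channel = HIDDEN; };
// Strip the leading '*' from the token text.
LISTITEM : '*' (~NEWLINE)* { setText(getText().substring(1)); };
TEXT : ~('*' | NEWLINE) (~NEWLINE)*;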

line : TEXT     { System.out.println("<para>" + $TEXT.text + "</para>"); }
     | LISTITEM { System.out.println("<item>" + $LISTITEM.text + "</item>"); }
     ;

It wouldn't be hard from there to generate a surrounding "<list>" 
element for groupings of LISTITEMs:

line : TEXT { System.out.println("<para>" + $TEXT.text + "</para>"); }
     | list
     ;

list : (LISTITEM) => { System.out.println("<list>"); }
       (LISTITEM { System.out.println("<item>" + $LISTITEM.text + "</item>"); })+
       { System.out.println("</list>"); }
     ;

(You probably don't even need the predicate there, since ANTLR 
shouldn't try to enter the list rule unless there's a LISTITEM 
present anyway.  But it never hurts to be paranoid.)
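
For comparison, here is a minimal sketch of the same grouping logic in plain Java, without ANTLR, in case it helps to see what output the list rule is aiming for. The class name ListGrouper and the method render are invented for this example. It assumes the input has already been split into lines, treats any line starting with '*' as a list item, and skips empty lines in the same spirit as putting NEWLINE on the hidden channel.

```java
import java.util.Arrays;
import java.util.List;

public class ListGrouper {
    // Wraps each run of consecutive '*' lines in <list>...</list>
    // and every other non-empty line in <para>...</para>.
    static String render(List<String> lines) {
        StringBuilder out = new StringBuilder();
        boolean inList = false;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // blank lines are ignored, like hidden NEWLINE tokens
            }
            boolean isItem = line.startsWith("*");
            if (isItem && !inList) {
                out.append("<list>\n");   // first item of a new run
                inList = true;
            } else if (!isItem && inList) {
                out.append("</list>\n");  // run ended by a paragraph line
                inList = false;
            }
            if (isItem) {
                // substring(1) drops only the star, as in the LISTITEM action
                out.append("<item>").append(line.substring(1)).append("</item>\n");
            } else {
                out.append("<para>").append(line).append("</para>\n");
            }
        }
        if (inList) {
            out.append("</list>\n");      // close a list that runs to end of input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList(
            "content only", "* first", "* second", "more content")));
    }
}
```

Note that only the star is stripped, so the space after it stays in the item text; trimming it would be a separate decision.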


