[antlr-interest] Newbie! how can I convert a list of bullets to an HTML list

Thu Jun 2 08:19:45 PDT 2005

I'd like to convert a list of bullets to an HTML list, i.e.:

From:

- bullet
- bullet
- bullet

To:

<ul><li>bullet</li><li>bullet</li><li>bullet</li></ul>
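
To make the target concrete, here is a minimal sketch of the transformation in plain Java, without ANTLR. The class name `BulletsToHtml` and the method `convert` are made up for illustration, and the handling of non-bullet lines (wrapped in `<p>`) and blank lines (skipped) is an assumption on my part:

```java
import java.util.List;

public class BulletsToHtml {
    // Hypothetical helper: wraps each run of consecutive bullet lines in one <ul>.
    static String convert(List<String> lines) {
        StringBuilder html = new StringBuilder();
        boolean inList = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines neither open nor close a list
            }
            boolean bullet = line.startsWith("-") || line.startsWith("*");
            if (bullet && !inList) {
                html.append("<ul>");   // first bullet of a run opens the list
                inList = true;
            } else if (!bullet && inList) {
                html.append("</ul>");  // first non-bullet after a run closes it
                inList = false;
            }
            if (bullet) {
                html.append("<li>").append(line.substring(1).trim()).append("</li>");
            } else {
                html.append("<p>").append(line).append("</p>");
            }
        }
        if (inList) {
            html.append("</ul>");      // close a list that runs to the end of input
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert(List.of("- bullet", "- bullet", "- bullet")));
    }
}
```

The point of the sketch is that `<ul>` belongs to the run of bullets as a whole, while `<li>` belongs to each bullet. That is the distinction I am having trouble expressing in the grammar.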

I thought over a few different options:

1. Have the lexer produce a LIST token when it matches:

 - bullet

But then I don't know how to get the parser to emit the <ul> tags around
the whole group, because I cannot add a special case

2. Have the lexer produce a LIST token when it matches:

- bullet
- bullet
- bullet

But then I don't know how to get the parser to insert the <li> tags,
because the individual bullets have not been tokenized

3. Have the parser match a list rule such as:

list:       LIST^  PARA (LIST! PARA)+

This would give me an AST like the following, which could also support nested lists:

    LIST ----+---- PARA
             +---- PARA
             +---- LIST ----+---- PARA
                            +---- PARA

But this gives me non-determinism between matching a plain paragraph
(PARA) and a bulleted line (LIST PARA).
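
What I imagine is something like the hand-written Java sketch below, where a run of bullets is its own rule and the next token alone decides between a plain paragraph and a list. Everything here (`BulletGrouping`, `Token`, `Node`, the `BODY` label) is hypothetical and only stands in for the real ANTLR classes. It does not handle nested lists, and I have not checked whether the equivalent ANTLR grammar would be free of non-determinism warnings:

```java
import java.util.ArrayList;
import java.util.List;

public class BulletGrouping {
    enum Type { TTL, PARA, LIST }

    // Hypothetical token: a type plus the matched text.
    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical tree node: a label plus children.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        @Override public String toString() {
            return children.isEmpty() ? label : label + children;
        }
    }

    private final List<Token> tokens;
    private int pos = 0;

    BulletGrouping(List<Token> tokens) { this.tokens = tokens; }

    // One token of lookahead.
    private boolean at(Type t) {
        return pos < tokens.size() && tokens.get(pos).type == t;
    }

    // body : ( list | PARA )*   -- LIST vs. PARA lookahead picks the alternative
    Node body() {
        Node root = new Node("BODY");
        while (at(Type.LIST) || at(Type.PARA)) {
            if (at(Type.LIST)) {
                root.children.add(list());
            } else {
                root.children.add(new Node("PARA:" + tokens.get(pos++).text));
            }
        }
        return root;
    }

    // list : ( LIST PARA )+     -- one LIST node for the whole run of bullets
    Node list() {
        Node list = new Node("LIST");
        while (at(Type.LIST)) {
            pos++; // drop the bullet marker itself
            if (!at(Type.PARA)) {
                throw new IllegalStateException("expected PARA after bullet marker");
            }
            list.children.add(new Node("PARA:" + tokens.get(pos++).text));
        }
        return list;
    }

    public static void main(String[] args) {
        List<Token> input = List.of(
            new Token(Type.PARA, "intro"),
            new Token(Type.LIST, "-"), new Token(Type.PARA, "a"),
            new Token(Type.LIST, "-"), new Token(Type.PARA, "b"),
            new Token(Type.PARA, "outro"));
        System.out.println(new BulletGrouping(input).body());
    }
}
```

For the input in `main`, the two bullets end up under a single LIST node, with the plain paragraphs as its siblings.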

Can anyone suggest an approach? My current grammar follows.

class CourseTreeWalker extends TreeParser;

tree2html returns [String s]
{ s = ""; }
    :
      (#(t:TTL (p:PARA | l:list)+ {
            s+="<h4>" +t+ "</h4>\n";
            s+= "<p>" +p+ "</p>\n";
            s+= "<ul>"+l+"</ul>"; } ))+   // this doesn't do what I want
    ;

list        // this doesn't do what I want
{ String l = ""; }
    :
      (#(LIST (p2:PARA) {
            l+="<ul><li>" +p2+ "</li></ul>\n";
      } ))
    ;
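
For comparison, this is roughly the output I would like the walker to produce once bullets are grouped under a single LIST node. It is a plain-Java sketch with a made-up `Node` class standing in for the ANTLR AST, not working tree-parser code. Wrapping a nested list in its own `<li>` is my assumption about what the HTML should look like:

```java
import java.util.ArrayList;
import java.util.List;

public class ListRenderer {
    // Hypothetical stand-in for an AST node: a type, optional text, and children.
    static final class Node {
        final String type;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String type, String text) { this.type = type; this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Emits <ul> once per LIST node and <li> once per child of that node.
    static String render(Node n) {
        if (n.type.equals("PARA")) {
            return n.text;
        }
        if (!n.type.equals("LIST")) {
            throw new IllegalArgumentException("unexpected node type: " + n.type);
        }
        StringBuilder sb = new StringBuilder("<ul>");
        for (Node child : n.children) {
            // A nested LIST child recurses and ends up inside its own <li>.
            sb.append("<li>").append(render(child)).append("</li>");
        }
        return sb.append("</ul>").toString();
    }

    public static void main(String[] args) {
        Node list = new Node("LIST", null)
            .add(new Node("PARA", "bullet"))
            .add(new Node("PARA", "bullet"))
            .add(new Node("PARA", "bullet"));
        System.out.println(render(list));
    }
}
```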

class CourseParser extends Parser;

options {
    buildAST = true;
}

file : (section)+ EOF! ;

section : TTL^ (listexpr)+ ;

// this just matches each bullet, instead of treating bullets as a group
listexpr : (LIST^)? paraexpr ;

paraexpr : (PARA) ;

class CourseLexer extends Lexer;

options {
    k = 3;
    charVocabulary = '\3'..'\377';
}

PARA  : ("LZU") =>
        ("LZU" (LETTER | DIGIT | ' ' | '/')+)  { $setType(TTL); }
      | ("Des") =>
        ("Description:")   { $setType(TTL); }
      | ("Lea") =>
        ("Learning objectives:")   { $setType(TTL); }
      | ("Tar") =>
        ("Target audience:")   { $setType(TTL); }
      | ("Pre") =>
        ("Prerequisites:")   { $setType(TTL); }
      | (CHAR | ' ' )+
      ;

LIST  : ('-' | '*') ;

NEWLINE : ( ('\r''\n')=> '\r''\n' // DOS
          | '\r'                  // MAC
          | '\n'                  // UNIX
          )
          { $setType(Token.SKIP); newline(); }
        ;

protected
DIGIT
      : '0'..'9'
      ;

protected
LETTER
      : ('a'..'z' | 'A'..'Z')
      ;

protected
CHAR
      : ~( '\n' | '\r' | ' ' | '\t' | '\f' | '-' | '*' )
      ;
