[antlr-interest] Newbie! how can I convert a list of bullets to an HTML list

Fri Jun 3 06:54:04 PDT 2005

Matthew,

Thanks for your reply. I'll try adding a predicate, as you suggest. I
don't actually have any problem finding a list in the lexer. In the
parser, though, I somehow have to know whether a given list token is the
first or last of a sequence, which, from the docs, sounded like it needs
a context-sensitive grammar, like:

para list -> list_begin list_item

list list -> list_item

list para -> list_item list_end

Does that make sense to you?
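
One way to read the three productions above: the begin and end markers fall out of grouping adjacent list items, which an ordinary loop such as (LIST PARA)+ can do without context sensitivity. A minimal Python sketch of that grouping, assuming a hypothetical token stream of (kind, text) pairs in which every LIST token is immediately followed by the PARA holding its text; the function name group_lists and the tuple shapes are illustrative and not part of the grammar in this thread:

```python
def group_lists(tokens):
    """Group runs of (LIST, PARA) pairs into single list nodes.

    tokens: sequence of (kind, text) pairs, kind is "LIST" or "PARA".
    Returns nodes shaped ("PARA", text) or ("LIST", [("PARA", text), ...]).
    """
    nodes = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "LIST":
            items = []
            # Same idea as the loop (LIST PARA)+ : the list "begins" when the
            # loop is entered and "ends" when the next token is not a LIST.
            while i < len(tokens) and tokens[i][0] == "LIST":
                if i + 1 >= len(tokens) or tokens[i + 1][0] != "PARA":
                    raise ValueError("bullet without text at token %d" % i)
                items.append(("PARA", tokens[i + 1][1]))
                i += 2
            nodes.append(("LIST", items))
        else:
            nodes.append(("PARA", text))
            i += 1
    return nodes
```

The first and last item of each run are never marked explicitly; they are simply the first and last entries of the grouped node, so the `<ul>` and `</ul>` can be emitted once per node.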

A list is actually the character sequence:

\n

-\tLorem ipsum\n

-\tDolor sit\n

-\tAmet\n

\n

-\sFoo bar\n

-\sBar foo\n

-\sFoo\n
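
As a rough illustration of the character sequences above, here is a small Python sketch that splits raw text into separate bullet lists. It assumes that \s stands for a single space, that a bullet is '-' or '*' followed by one tab or space, and that any non-bullet line (including a blank one) ends the current list; the name split_bullet_lists is made up for this example:

```python
import re

# A bullet line: '-' or '*', then one tab or space, then the item text.
BULLET = re.compile(r"^[-*][\t ](.*)$")


def split_bullet_lists(text):
    """Return a list of lists, one inner list of item texts per bullet run."""
    lists = []
    current = None  # the list being filled, or None when outside a list
    for line in text.split("\n"):
        m = BULLET.match(line)
        if m:
            if current is None:
                # First bullet of a new run: this is where a list begins.
                current = []
                lists.append(current)
            current.append(m.group(1))
        else:
            # A blank or ordinary line ends the current run.
            current = None
    return lists
```

With the two runs shown above as input, this would yield two inner lists of three items each, because the blank line between them closes the first list.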

I haven't attempted it yet, but I also need to support a char sequence
like

\n

1.\tLorem ipsum\n

2.\tDolor sit\n

2.1.\tAmet\n

2.2.\tConsectetuer Amet\n

making a nested HTML ordered list:
<ol><li>Lorem ipsum</li><li>Dolor sit<ol><li>Amet</li><li>Consectetuer Amet</li></ol></li></ol>.

Hence my earlier point about nested lists.
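
To make the nesting concrete, here is a hedged Python sketch that turns numbered lines like the ones above into nested `<ol>` markup. It assumes the nesting depth is simply the number of dotted components in the prefix ("2.1." is depth 2), that a line never jumps more than one level deeper than the previous one, and it ignores the actual numbers; numbered_to_html is a hypothetical name:

```python
import re
from html import escape

# Prefix such as "1." or "2.1.", then one tab or space, then the item text.
NUMBERED = re.compile(r"^((?:\d+\.)+)[\t ](.*)$")


def numbered_to_html(lines):
    """Render numbered lines as nested <ol> lists (depth = number of dots)."""
    out = []
    depth = 0  # number of currently open <ol> elements
    for line in lines:
        m = NUMBERED.match(line)
        if not m:
            raise ValueError("not a numbered item: %r" % line)
        level = m.group(1).count(".")
        if level > depth + 1:
            raise ValueError("numbering skips a level: %r" % line)
        if level > depth:
            # Go one level deeper: open a list inside the still-open <li>.
            out.append("<ol>")
            depth += 1
        else:
            # Close any deeper lists, then the previous item at this level.
            while depth > level:
                out.append("</li></ol>")
                depth -= 1
            out.append("</li>")
        out.append("<li>" + escape(m.group(2)))
    while depth > 0:
        out.append("</li></ol>")
        depth -= 1
    return "".join(out)
```

The key design point is that each `<li>` is left open until the next line shows whether a nested list belongs inside it.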

________________________________

From: Matthew Ford [mailto:matthew.ford at forward.com.au] 
Sent: 02 June 2005 23:02
To: Matthew Pearce; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Newbie! how can I convert a list of
bullets to an HTML list

Is the list actually the character sequence

\n

\t-\tbullet\n

\t-\tbullet\n

\t-\tbullet\n

\t-\tbullet\n

What makes a list different from other text like \t-\t?

matthew

You may need to do infinite lookahead to decide that you are processing a
list

 like 

(list) => list

see Syntactic Predicates in the docs
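
For readers unfamiliar with the idea, a syntactic predicate such as (list) => list amounts to "try to match list speculatively, rewind, and only then commit to that alternative". The Python sketch below imitates that behaviour by hand; it is not the code ANTLR generates, and TokenStream, match_list, predicate and parse_block are all hypothetical names invented for the illustration:

```python
class TokenStream:
    """A minimal token stream over a list of token kinds."""

    def __init__(self, kinds):
        self.kinds = list(kinds)
        self.pos = 0

    def la(self):
        """Kind of the next token, or "EOF" at the end of input."""
        return self.kinds[self.pos] if self.pos < len(self.kinds) else "EOF"

    def match(self, kind):
        if self.la() != kind:
            raise SyntaxError("expected %s, found %s" % (kind, self.la()))
        self.pos += 1


def match_list(ts):
    """list : (LIST PARA)+ ;"""
    ts.match("LIST")
    ts.match("PARA")
    while ts.la() == "LIST":
        ts.match("LIST")
        ts.match("PARA")


def predicate(ts, rule):
    """Speculatively run `rule`; report success and always rewind."""
    start = ts.pos
    try:
        rule(ts)
        return True
    except SyntaxError:
        return False
    finally:
        ts.pos = start  # lookahead consumes nothing, however far it looked


def parse_block(ts):
    """block : (list) => list | PARA ;"""
    if predicate(ts, match_list):
        match_list(ts)  # second pass: match for real
        return "list"
    ts.match("PARA")
    return "para"
```

The lookahead is "infinite" in the sense that predicate may scan arbitrarily many tokens before the parser commits to an alternative.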

matthew

	----- Original Message ----- 

	From: Matthew Pearce <mailto:mpearce at digitas.com>  

	To: antlr-interest at antlr.org 

	Sent: Friday, June 03, 2005 1:19 AM

	Subject: [antlr-interest] Newbie! how can I convert a list of
bullets to an HTML list

	I'd like to convert a list of bullets to an HTML list, i.e.:

	From:

	-          bullet

	-          bullet

	-          bullet

	To:

	<ul><li>bullet</li><li>bullet</li><li>bullet</li></ul>

	I thought over a few different options:

	1. Have the lexer produce a LIST token when it matches:

	 - bullet

	But I don't know how to get the parser to find the <ul> tags,
because I cannot add a special case

	2. Have the lexer produce a LIST token when it matches:

	-          bullet

	-          bullet

	-          bullet

	But I don't know how to get the parser to insert the <li> tags,
because it hasn't tokenized each bullet

	3. Have the parser match a rule for list that matches like:

	list:       LIST^  PARA (LIST! PARA)+

	Which would give me an AST like the following, which could support
nested lists:

	                        LIST ----+----PARA

	                                    +----PARA

	                                    +----LIST--------+-PARA

	                                     +---PARA         

	But this gives me nondeterminism between matching a plain
paragraph (PARA) and a bulleted line (LIST PARA).

	Can anyone suggest an approach?  
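
	As an illustration of what a tree like the one in option 3 could be rendered to, here is a small Python sketch of the walk, independent of ANTLR. It assumes a hypothetical tree shape in which a list is a sequence of ("PARA", text) and ("LIST", children) pairs, and it places a nested list inside the `<li>` of the item that precedes it; render_list is an invented name:

```python
from html import escape


def render_list(children):
    """Render the children of one LIST node as a <ul>.

    children: sequence of ("PARA", text) or ("LIST", nested_children).
    """
    items = []  # inner HTML of each <li>, in order
    for kind, value in children:
        if kind == "PARA":
            items.append(escape(value))
        elif kind == "LIST":
            if not items:
                raise ValueError("nested list needs a preceding item")
            # Attach the nested list to the item just before it.
            items[-1] += render_list(value)
        else:
            raise ValueError("unexpected node kind: %r" % kind)
    return "<ul>" + "".join("<li>" + item + "</li>" for item in items) + "</ul>"
```

	The point of the sketch is that the `<ul>` wrapper is emitted once per LIST node and the `<li>` wrapper once per child, so neither needs a special case for the first or last bullet.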

	class CourseTreeWalker extends TreeParser;

	tree2html returns [String s]

	{ s = ""; }

	    :

	      (#(t:TTL (p:PARA | l:list)+ { 

	            s+="<h4>" +t+ "</h4>\n";

	            s+= "<p>" +p+ "</p>\n";

	            s+= "<ul>"+l+"</ul>"; } ))+   // this doesn't do what I want

	    ;

	list        // this doesn't do what I want

	{ String l = ""; }

	 :

	      (#(LIST (p2:PARA) { 

	            l+="<ul><li>" +p2+ "</li></ul>\n";

	             } ))

	;

	class CourseParser extends Parser;

	options {

	    buildAST = true;

	}

	file :  (section)+ EOF! ;

	section : TTL^ (listexpr)+;

	listexpr : (LIST^)? paraexpr;   // this just matches each bullet, instead of treating bullets as a group

	paraexpr: (PARA);

	class CourseLexer extends Lexer;

	options {

	    k = 3; 

	    charVocabulary = '\3'..'\377';

	}

	PARA  : ("LZU") =>

	        ("LZU" (LETTER | DIGIT | ' ' | '/')+)  { $setType(TTL);
}

	        |

	        ("Des") =>

	        ("Description:")   { $setType(TTL); }

	        |

	        ("Lea") =>

	        ("Learning objectives:")   { $setType(TTL); }

	        |

	        ("Tar") =>

	        ("Target audience:")   { $setType(TTL); }

	        |

	        ("Pre") =>

	        ("Prerequisites:")   { $setType(TTL); }

	        |

	         (CHAR | ' ' )+ 

	      ;

	LIST   : ('-' | '*') ;

	NEWLINE : (

	                  ('\r''\n')=> '\r''\n' //DOS

	                  | '\r' //MAC 

	                  | '\n' //UNIX

	                  )

	                  { $setType(Token.SKIP); newline();  }

	            ;

	protected

	DIGIT

	      : '0'..'9'

	      ;

	protected

	LETTER

	      : ('a'..'z' | 'A'..'Z')

	      ;

	protected

	CHAR

	      : ~( '\n' | '\r' | ' ' | '\t' | '\f' | '-' | '*' )

	      ;
