[antlr-interest] Newbie! how can I convert a list of bullets to an HTML list
    Matthew Pearce 
    mpearce at digitas.com
       
    Fri Jun  3 06:54:04 PDT 2005
    
    
  
Matthew,
 
Thanks for your reply. I'll try adding a predicate, as you suggest.
Finding a list item in the lexer isn't actually the problem. The problem,
I think, is in the parser: it somehow has to know that a given list token
is the first or last of a sequence. From the docs, that sounded like it
needs a context-sensitive grammar, something like:
 
para list -> list_begin list_item
list list -> list_item
list para -> list_item list_end
 
Does that make sense to you?
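
As a rough illustration of the first/last question, the sketch below groups a run of bullets with an ordinary loop, so the list begins at the first LIST token and ends at the first token that is not one, without any context-sensitive rule. It is plain Python rather than ANTLR; the function name `render` and the (type, text) tuples are hypothetical, and it assumes each bullet arrives as a LIST token followed by a PARA token, as in the lexer quoted further down.

```python
from html import escape


def render(tokens):
    """Render a flat stream of (type, text) tokens as HTML.

    Assumption: token types are "TTL", "PARA" and "LIST", and a bullet
    is a LIST token immediately followed by a PARA token.
    """
    out = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "LIST":
            items = []
            # Consume the whole run of (LIST PARA) pairs in one go.
            while (i + 1 < len(tokens)
                   and tokens[i][0] == "LIST"
                   and tokens[i + 1][0] == "PARA"):
                items.append("<li>" + escape(tokens[i + 1][1]) + "</li>")
                i += 2
            if not items:
                raise ValueError("LIST token not followed by PARA")
            # The <ul> wrapper is emitted once per run, not once per bullet.
            out.append("<ul>" + "".join(items) + "</ul>")
        elif kind == "TTL":
            out.append("<h4>" + escape(text) + "</h4>")
            i += 1
        else:
            out.append("<p>" + escape(text) + "</p>")
            i += 1
    return "\n".join(out)
```

In grammar terms the inner loop corresponds roughly to something like (LIST PARA)+. Whether ANTLR accepts such a rule without a nondeterminism warning is not something this sketch settles.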
 
A list is actually the character sequence:
 
\n
-\tLorem ipsum\n
-\tDolor sit\n
-\tAmet\n
 
\n
-\sFoo bar\n
-\sBar foo\n
-\sFoo\n
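
To make the two character sequences above concrete, here is a minimal sketch of a line-level test for a bullet. It is Python rather than ANTLR, the names `BULLET` and `bullet_text` are hypothetical, and it assumes that `\s` above means a single space and that `*` is also a bullet marker, as in the LIST lexer rule quoted further down.

```python
import re

# A bullet line: '-' or '*', then one or more tabs/spaces, then the text.
BULLET = re.compile(r"^[-*][ \t]+(.*)$")


def bullet_text(line):
    """Return the text of a bullet line, or None if it is not a bullet."""
    m = BULLET.match(line.rstrip("\r\n"))
    return m.group(1) if m else None
```

A line only counts as a bullet when the marker is the first character and is followed by whitespace, which keeps a hyphen in the middle of ordinary text from being treated as a list item.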
 
I haven't attempted it yet, but I also need to support a char sequence
like
 
\n
1.\tLorem ipsum\n
2.\tDolor sit\n
2.1.\tAmet\n
2.2.\tConsectetuer Amet\n
 
making a nested HTML ordered list
<ol><li><ol><li>Consectetuer Amet</li></ol></li></ol>.
 
Hence my earlier point about nested lists.
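
A minimal sketch of how the nesting could be derived from the numbering alone, assuming the depth of an item is simply the number of dotted components in its label ("2.1." is depth 2). It is Python rather than ANTLR, the names `NUMBERED` and `numbered_to_html` are hypothetical, and it does not check that the numbers themselves are consecutive.

```python
import re
from html import escape

# Label such as "1." or "2.1.", then tabs/spaces, then the item text.
NUMBERED = re.compile(r"^((?:\d+\.)+)[ \t]+(.*)$")


def numbered_to_html(lines):
    """Convert numbered lines into nested <ol> markup."""
    out = []
    depth = 0  # number of <ol> elements currently open
    for line in lines:
        m = NUMBERED.match(line.rstrip("\r\n"))
        if m is None:
            raise ValueError("not a numbered item: %r" % line)
        level = m.group(1).count(".")
        if level > depth + 1:
            raise ValueError("item skips a nesting level: %r" % line)
        if level == depth + 1:
            # Deeper: open a child list inside the still-open parent <li>.
            out.append("<ol>")
            depth += 1
        else:
            # Same level or shallower: close deeper lists first...
            while depth > level:
                out.append("</li></ol>")
                depth -= 1
            # ...then close the previous sibling at this level.
            out.append("</li>")
        out.append("<li>" + escape(m.group(2)))
    # Close whatever is still open at the end of the input.
    while depth > 0:
        out.append("</li></ol>")
        depth -= 1
    return "".join(out)
```

Each `<li>` is left open until the next item shows whether it has children, which is why the closing tags are written when the following item (or the end of input) is seen.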
 
 
________________________________
From: Matthew Ford [mailto:matthew.ford at forward.com.au] 
Sent: 02 June 2005 23:02
To: Matthew Pearce; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Newbie! how can I convert a list of bullets to an HTML list
 
Is the list actually the character sequence
\n
\t-\tbullet\n
\t-\tbullet\n
\t-\tbullet\n
\t-\tbullet\n
 
What makes a list different from other text like \t-\t?
matthew
 
You may need to use infinite lookahead to decide whether you are
processing a list, like
(list) => list
See Syntactic Predicates in the docs.
matthew
	----- Original Message ----- 
	From: Matthew Pearce <mailto:mpearce at digitas.com>  
	To: antlr-interest at antlr.org 
	Sent: Friday, June 03, 2005 1:19 AM
	Subject: [antlr-interest] Newbie! how can I convert a list of bullets to an HTML list
	 
	I'd like to convert a list of bullets to an HTML list, i.e.:
	 
	From:
	-          bullet
	-          bullet
	-          bullet
	 
	To:
	<ul><li>bullet</li><li>bullet</li><li>bullet</li></ul>
	 
	I thought over a few different options:
	 
	1. Have the lexer produce a LIST token when it matches:
	 - bullet
	But I don't know how to get the parser to find the <ul> tags,
	because I cannot add a special case.
	 
	2. Have the lexer produce a LIST token when it matches:
	-          bullet
	-          bullet
	-          bullet
	But I don't know how to get the parser to insert the <li> tags,
	because it hasn't tokenized each bullet.
	 
	3. Have the parser match a rule for list that matches like:
	 
	list:       LIST^  PARA (LIST! PARA)+
	 
	This would give me an AST like the one below, which could support
	nested lists.
	 
	                        LIST ----+----PARA
	                                    +----PARA
	                                    +----LIST--------+-PARA
	                                     +---PARA         
	 
	But this gives me nondeterminism between matching a plain
	paragraph (PARA) and a bulleted line (LIST PARA).
	 
	 
	Can anyone suggest an approach?  
	 
	 
	class CourseTreeWalker extends TreeParser;
	 
	tree2html returns [String s]
	{ s = ""; }
	    :
	      (#(t:TTL (p:PARA | l:list)+ { 
	            s+="<h4>" +t+ "</h4>\n";
	            s+= "<p>" +p+ "</p>\n";
	            s+= "<ul>"+l+"</ul>"; } ))+   // this doesn't do what I want
	      
	    ;
	 
	list        // this doesn't do what I want
	{ String l = ""; }
	 :
	      (#(LIST (p2:PARA) { 
	            l+="<ul><li>" +p2+ "</li></ul>\n";
	             } ))
	;
	 
	class CourseParser extends Parser;
	 
	options {
	    buildAST = true;
	}
	 
	file :  (section)+ EOF! ;
	 
	section : TTL^ (listexpr)+;
	 
	listexpr : (LIST^)? paraexpr;   // this just matches each bullet, instead of treating bullets as a group
	 
	paraexpr: (PARA);
	 
	 
	class CourseLexer extends Lexer;
	 
	options {
	    k = 3; 
	    charVocabulary = '\3'..'\377';
	}
	 
	 
	PARA  : ("LZU") =>
	        ("LZU" (LETTER | DIGIT | ' ' | '/')+)  { $setType(TTL); }
	        |
	        ("Des") =>
	        ("Description:")   { $setType(TTL); }
	        |
	        ("Lea") =>
	        ("Learning objectives:")   { $setType(TTL); }
	        |
	        ("Tar") =>
	        ("Target audience:")   { $setType(TTL); }
	        |
	        ("Pre") =>
	        ("Prerequisites:")   { $setType(TTL); }
	        |
	         (CHAR | ' ' )+ 
	      ;
	 
	 
	LIST   : ('-' | '*') ;
	 
	 
	 
	NEWLINE : (
	                  ('\r''\n')=> '\r''\n' //DOS
	                  
	                  | '\r' //MAC 
	                  
	                  | '\n' //UNIX
	                  )
	                  { $setType(Token.SKIP); newline();  }
	            ;
	protected
	DIGIT
	      : '0'..'9'
	      ;
	 
	protected
	LETTER
	      : ('a'..'z' | 'A'..'Z')
	      ;
	 
	            
	protected
	CHAR
	      : ~( '\n' | '\r' | ' ' | '\t' | '\f' | '-' | '*' )
	      ;
	      
	    
	    