[antlr-interest] Newbie! how can I convert a list of bullets to an HTML list
Matthew Pearce
mpearce at digitas.com
Fri Jun 3 06:54:04 PDT 2005
Matthew,
Thanks for your reply. I'll try adding a predicate, as you suggest. I
actually don't have any problem finding a list in the lexer. But, I
guess, in the parser, I somehow have to know that one list token is the
first or last of a sequence, which, from the docs, sounded like a
context-sensitive grammar, like:
para list -> list_begin list_item
list list -> list_item
list para -> list_item list_end
Does that make sense to you?
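To make the first/last idea concrete, here is a minimal sketch in plain Java, outside ANTLR. It assumes the lexer already delivers a flat stream of token types and shows that remembering only the previous token is enough to find where a run of list tokens begins and ends. The class name, the method name, and the LIST_BEGIN / LIST_ITEM / LIST_END labels are hypothetical and are not part of the grammar in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the grammar: marks where a run of
// consecutive LIST tokens begins and ends in a flat token-type stream.
class ListBoundarySketch {

    // Token type names ("PARA", "LIST") are assumptions mirroring the grammar.
    static List<String> markBoundaries(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        boolean inList = false; // was the previous token a list token?
        for (String t : tokens) {
            boolean isList = t.equals("LIST");
            if (isList && !inList) {
                out.add("LIST_BEGIN");   // para -> list transition
            } else if (!isList && inList) {
                out.add("LIST_END");     // list -> para transition
            }
            out.add(isList ? "LIST_ITEM" : t);
            inList = isList;
        }
        if (inList) {
            out.add("LIST_END");         // input ended while inside a list
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("PARA", "LIST", "LIST", "LIST", "PARA");
        System.out.println(markBoundaries(in));
    }
}
```

In a grammar, the same effect would normally come from grouping the repeated items under one rule rather than from a separate pass; the sketch only illustrates that one token of memory is sufficient.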
A list is actually the character sequence:
\n
-\tLorem ipsum\n
-\tDolor sit\n
-\tAmet\n
\n
-\sFoo bar\n
-\sBar foo\n
-\sFoo\n
I haven't attempted it yet, but I also need to support a char sequence
like
\n
1.\tLorem ipsum\n
2.\tDolor sit\n
2.1.\tAmet\n
2.2.\tConsectetuer Amet\n
making a nested HTML ordered list such as
<ol><li><ol><li>Consectetuer Amet</li></ol></li></ol>.
Hence my earlier point about nested lists.
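As a rough illustration of the nesting, the following plain-Java sketch (again outside ANTLR) derives the depth of each item from its numbering, so "2.1." is one level deeper than "2.", and emits nested <ol> elements. The class name, the method name, and the regular expression are assumptions made for this example; it does not escape HTML special characters and simply skips lines that do not look like numbered items.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turns lines like "2.1.<TAB>Amet" into nested <ol> lists.
class NestedListSketch {

    // One or more "digits." groups, a tab, then the item text.
    private static final Pattern ITEM = Pattern.compile("^((?:\\d+\\.)+)\\t(.*)$");

    static String toHtml(String text) {
        StringBuilder sb = new StringBuilder();
        int depth = 0; // number of currently open <ol> elements
        for (String line : text.split("\n")) {
            Matcher m = ITEM.matcher(line);
            if (!m.matches()) {
                continue; // not a numbered item; ignored in this sketch
            }
            // "2.1." -> ["2", "1"] -> depth 2
            int d = m.group(1).split("\\.").length;
            if (d > depth) {
                // Open new levels inside the still-open parent <li>.
                while (depth < d) {
                    sb.append("<ol>");
                    depth++;
                    if (depth < d) {
                        sb.append("<li>"); // placeholder item for a skipped level
                    }
                }
            } else {
                // Close deeper levels, then the previous sibling at this level.
                while (depth > d) {
                    sb.append("</li></ol>");
                    depth--;
                }
                sb.append("</li>");
            }
            sb.append("<li>").append(m.group(2));
        }
        while (depth > 0) {
            sb.append("</li></ol>");
            depth--;
        }
        return sb.toString();
    }
}
```

Each <li> is left open until the next item shows whether a child list follows, which is what keeps the nested <ol> inside its parent item.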
________________________________
From: Matthew Ford [mailto:matthew.ford at forward.com.au]
Sent: 02 June 2005 23:02
To: Matthew Pearce; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Newbie! how can I convert a list of bullets to an HTML list
Is the list actually the character sequence
/n
/t-/tbullet/n
/t-/tbullet/n
/t-/tbullet/n
/t-/tbullet/n
What makes a list different from other text like /t-/t?
matthew
You may need to use infinite lookahead to decide whether you are
processing a list, like
(list) => list
See Syntactic Predicates in the docs.
matthew
----- Original Message -----
From: Matthew Pearce <mailto:mpearce at digitas.com>
To: antlr-interest at antlr.org
Sent: Friday, June 03, 2005 1:19 AM
Subject: [antlr-interest] Newbie! how can I convert a list of bullets to an HTML list
I'd like to convert a list of bullets to an HTML list, i.e.:
From:
- bullet
- bullet
- bullet
To:
<ul><li>bullet</li><li>bullet</li><li>bullet</li></ul>
I thought over a few different options:
1. Have the lexer produce a LIST token when it matches:
- bullet
But I don't know how to get the parser to find the <ul> tags,
because I cannot add a special case
2. Have the lexer produce a LIST token when it matches:
- bullet
- bullet
- bullet
But I don't know how to get the parser to insert the <li> tags,
because it hasn't tokenized each bullet
3. Have the parser match a list with a rule like:
list: LIST^ PARA (LIST! PARA)+
Which would give me an AST like this, which could support nested
lists:
LIST ----+----PARA
         +----PARA
         +----LIST--------+-PARA
                          +---PARA
But this gives me non-determinism between matching a plain
paragraph (PARA) and a bulleted line (LIST PARA).
Can anyone suggest an approach?
class CourseTreeWalker extends TreeParser;

tree2html returns [String s]
{ s = ""; }
    :
    (#(t:TTL (p:PARA | l:list)+ {
        s+="<h4>" +t+ "</h4>\n";
        s+= "<p>" +p+ "</p>\n";
        s+= "<ul>"+l+"</ul>"; } ))+ // this doesn't do what I want
    ;

list // this doesn't do what I want
{ String l = ""; }
    :
    (#(LIST (p2:PARA) {
        l+="<ul><li>" +p2+ "</li></ul>\n";
    } ))
    ;

class CourseParser extends Parser;
options {
    buildAST = true;
}

file : (section)+ EOF! ;
section : TTL^ (listexpr)+;
listexpr : (LIST^)? paraexpr; // this just matches each bullet, instead of treating bullets as a group
paraexpr: (PARA);

class CourseLexer extends Lexer;
options {
    k = 3;
    charVocabulary = '\3'..'\377';
}

PARA : ("LZU") =>
       ("LZU" (LETTER | DIGIT | ' ' | '/')+) { $setType(TTL); }
     |
       ("Des") =>
       ("Description:") { $setType(TTL); }
     |
       ("Lea") =>
       ("Learning objectives:") { $setType(TTL); }
     |
       ("Tar") =>
       ("Target audience:") { $setType(TTL); }
     |
       ("Pre") =>
       ("Prerequisites:") { $setType(TTL); }
     |
       (CHAR | ' ' )+
     ;

LIST : ('-' | '*') ;

NEWLINE : (
          ('\r''\n')=> '\r''\n' //DOS
          | '\r' //MAC
          | '\n' //UNIX
          )
          { $setType(Token.SKIP); newline(); }
        ;

protected
DIGIT
    : '0'..'9'
    ;

protected
LETTER
    : ('a'..'z' | 'A'..'Z')
    ;

protected
CHAR
    : ~( '\n' | '\r' | ' ' | '\t' | '\f' | '-' | '*' )
    ;