[antlr-interest] Parsing Nested Multi-line text
Kurt Rayner
Kurt at AlphaSoftware.com
Tue Feb 28 09:36:55 PST 2006
While re-implementing a parser for a variant of BASIC, I got stuck on this one.
The existing hand-coded parser supports nested multi-line string literals
(mainly for dynamic code generation) using the following form:
<<%identifier%
...
%identifier%
A lame example:
MyCodeSegment = <<%code%
if a < 12 then
evaluate_template(<<%code%
a = 12
%code%
else
evaluate_template(<<%code%
a = 14
%code%
end if
%code%
One would expect the names of nested blocks to be different, but the existing
parser doesn't seem to care, and I have found code where they aren't.
Also note that the embedded text does NOT have to be parseable.
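To make the intended semantics concrete, here is a minimal hand-written scanner sketch in Python. It is an illustration only, not ANTLR output and not the existing parser's code. It assumes that identifiers look like `[A-Za-z_][A-Za-z0-9_]*`, that any nested `<<%name%` opens a new block, and that `%name%` closes a block only when it matches the innermost open name; everything else, including a non-matching `%other%`, is kept verbatim because the body does not have to be parseable. The names `scan_multiline_literal`, `OPENER` and `CLOSER` are hypothetical.

```python
import re

# Assumption: identifiers look like [A-Za-z_][A-Za-z0-9_]*; the dialect's
# real Identifier rule may differ.
OPENER = re.compile(r"<<%([A-Za-z_][A-Za-z0-9_]*)%")
CLOSER = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def scan_multiline_literal(text, pos):
    """Hypothetical helper: scan a <<%name% ... %name% literal at text[pos].

    Returns (body, end): body is the raw text between the outer opener and
    its terminator (nested literals included verbatim), and end is the index
    just past the terminator. Raises ValueError if there is no opener at pos
    or the literal is never closed.
    """
    opener = OPENER.match(text, pos)
    if opener is None:
        raise ValueError("no multi-line literal at position %d" % pos)
    stack = [opener.group(1)]  # names of open blocks, innermost last
    body_start = i = opener.end()
    while i < len(text):
        nested = OPENER.match(text, i)
        if nested:
            # A nested <<%name% opens a new block, whatever its name.
            stack.append(nested.group(1))
            i = nested.end()
            continue
        closer = CLOSER.match(text, i)
        if closer and closer.group(1) == stack[-1]:
            # Assumption: %name% closes only the innermost open block;
            # a non-matching %other% is treated as ordinary text.
            stack.pop()
            if not stack:
                return text[body_start:i], closer.end()
            i = closer.end()
            continue
        i += 1  # any other character is body text; it need not be parseable
    raise ValueError("unterminated multi-line literal")
```

Called on the example above with `pos` set to the first `<<%`, the sketch returns everything between the first `<<%code%` and the final `%code%`. Keeping a stack of names, rather than a plain depth counter, lets the same code accept both the same-name nesting in the example and nested blocks whose names differ.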
Here's what I've tried most recently to make the lexer handle this syntax.
ANTLR complains that it doesn't have sufficient lookahead.
protected
MultiLineLiteralIdentifier
    : '%' Identifier '%'
    ;

protected
MultiLineLiteralInitiator
    : "<<" MultiLineLiteralIdentifier
    ;

protected
EmbeddedMultiLineLiteral
    : MultiLineLiteralInitiator
      (options { greedy=false; } :
          (EmbeddedMultiLineLiteral | .) )*
      MultiLineLiteralIdentifier
    ;

ASCIIStringLiteral
    : ('"'! ( ('\\'! '\\') | ('\\'! '"') | ('"'! '"') | (~'"') )* '"'!)
    | MultiLineLiteralInitiator!
      (options { greedy=false; } :
          (EmbeddedMultiLineLiteral | .) )*
      MultiLineLiteralIdentifier!
    ;
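For comparison, the first (double-quoted) alternative of `ASCIIStringLiteral` can be read as the following Python sketch. Again this is an illustration, not generated lexer code, and `scan_quoted_literal` is a hypothetical name. It assumes ANTLR 2's `!` suffix drops the marked character from the token text, and it tries the alternatives in the order they are written, so a backslash not followed by `\` or `"` falls through to `~'"'` and is kept as ordinary text.

```python
def scan_quoted_literal(text, pos):
    """Hypothetical helper mirroring the quoted-string alternative.

    Returns (value, end) where value is the token text with the quotes and
    escape characters dropped, and end is the index just past the closing
    quote. Raises ValueError if there is no opening quote or no closing one.
    """
    if pos >= len(text) or text[pos] != '"':
        raise ValueError("no quoted literal at position %d" % pos)
    out = []
    i = pos + 1  # skip the opening '"'!
    while i < len(text):
        c = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if c == "\\" and nxt in ("\\", '"'):
            out.append(nxt)  # ('\\'! '\\') or ('\\'! '"'): drop the backslash
            i += 2
        elif c == '"' and nxt == '"':
            out.append('"')  # ('"'! '"'): a doubled quote yields one quote
            i += 2
        elif c == '"':
            return "".join(out), i + 1  # closing '"'! ends the literal
        else:
            out.append(c)  # (~'"'): any other character is kept
            i += 1
    raise ValueError("unterminated quoted literal")
```

This alternative needs at most two characters of lookahead to decide each step, which is what separates it from the multi-line alternative, where the terminator can only be recognized by comparing it with an identifier seen earlier.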
If I were using Flex, I would just take control of the input stream, but I
would prefer to use something a little more elegant.
Thanks in advance for any ideas.
Kurt Rayner
Development
Alpha Software, Inc.
83 Cambridge Street, Suite 3B
Burlington, MA 01803-4483
kurt at AlphaSoftware.com
(781) 229-4500 X 27