[antlr-interest] Lexing problem
mzukowski at yci.com
Thu Jun 5 08:29:55 PDT 2003
Try this:
STRING: '"' ( ~('"' | '#') | CODESCAPE )* '"';
CODESCAPE: '#' ( ~('"' | '#') | STRING )* '#';
You might need to alter it to handle escape characters, if the language
has them, such as C's \".
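To show what the two mutually recursive rules do, here is a rough hand-written Python equivalent. It is a sketch, not the code ANTLR would generate. Each function consumes one construct and hands off to the other when it meets the opposite delimiter. The names match_string, match_codescape and LexError are made up for this illustration, and escape sequences are not handled.

```python
class LexError(Exception):
    """Raised when the input ends before a closing '"' or '#' is found."""


def match_string(text: str, i: int) -> int:
    # Mirrors STRING: '"' ( ~('"' | '#') | CODESCAPE )* '"'
    # Returns the index just past the closing quote.
    if text[i] != '"':
        raise LexError(f"expected opening quote at {i}")
    i += 1
    while i < len(text):
        c = text[i]
        if c == '"':
            return i + 1                 # closing quote ends the string
        if c == '#':
            i = match_codescape(text, i)  # embedded #...# expression
        else:
            i += 1                       # ordinary character
    raise LexError("unterminated string")


def match_codescape(text: str, i: int) -> int:
    # Mirrors CODESCAPE: '#' ( ~('"' | '#') | STRING )* '#'
    # Returns the index just past the closing hash.
    if text[i] != '#':
        raise LexError(f"expected '#' at {i}")
    i += 1
    while i < len(text):
        c = text[i]
        if c == '#':
            return i + 1                 # closing hash ends the expression
        if c == '"':
            i = match_string(text, i)    # string nested inside the expression
        else:
            i += 1
    raise LexError("unterminated #...# expression")


src = '"...#iif("a" gt "#b#", "cat", "dog")#..." rest'
end = match_string(src, 0)
print(src[:end])  # prints "...#iif("a" gt "#b#", "cat", "dog")#..."
```

The recursion depth follows the nesting depth of the source, which is what lets a single token span the whole construct.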
Monty
-----Original Message-----
From: Jim Irwin [mailto:jimirwin at acm.org]
Sent: Wednesday, June 04, 2003 4:58 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Lexing problem
Hi, I'm new to ANTLR, and I have a problem for which I would welcome
suggestions. I'm trying to parse ColdFusion code, and the language
allows strings to contain expressions. The syntax is roughly
varname = "... #expression_1# ...", where the hash marks enclose a
ColdFusion expression that is evaluated and substituted into the
string at runtime.
The real problem is that the embedded expression is itself allowed
to contain strings, so that a single source-code string may look
like the following:
"...#iif("a" gt "#b#", "cat", "dog")#..."
My problem is that I cannot think of a way to define a lexical rule
that would recognize such a complex string. In principle, the
string's contents need to be parsed. I can conceive of the lexer returning a
token representing the entire string to the parser, and the parser
then recursively lexing and parsing the string value until there are
no more embedded hash-expressions.
I have no clue as to how I should proceed. In order to lex the
string, I seem to need a specialized routine that looks ahead, keeps
track of nested expressions and their strings, and terminates only
when the matching end quote outside of all expressions is
encountered.
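The look-ahead routine described above may need less bookkeeping than it seems. Because the contexts strictly alternate (string, expression, string, ...), a single depth counter can stand in for a full stack: odd depth means inside a string, even depth means inside a #...# expression. The Python sketch below assumes double-quoted strings only and no escape sequences; scan_cf_string is a hypothetical name, and such a function would still have to be wired into a hand-written lexer.

```python
def scan_cf_string(text: str, start: int = 0) -> int:
    """Return the index just past the closing quote of the string at `start`.

    depth 1 = outermost string; odd depth = inside a string,
    even depth = inside a #...# expression.
    """
    if start >= len(text) or text[start] != '"':
        raise ValueError("expected a double quote at start")
    depth = 1
    for i in range(start + 1, len(text)):
        c = text[i]
        in_string = depth % 2 == 1
        if c == '"':
            # In a string a quote closes it; in an expression it opens a nested string.
            depth += -1 if in_string else 1
        elif c == '#':
            # In a string a hash opens an expression; in an expression it closes it.
            depth += 1 if in_string else -1
        if depth == 0:
            return i + 1  # matching end quote outside all expressions
    raise ValueError("unterminated string")


src = '"...#iif("a" gt "#b#", "cat", "dog")#..." rest'
print(src[:scan_cf_string(src)])  # prints "...#iif("a" gt "#b#", "cat", "dog")#..."
```

This gives the same token boundary as the recursive grammar; the counter works only because each context can contain just the other kind, so a language with more nesting kinds would need a real stack.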
Any suggestions?
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/