[antlr-interest] Lexing problem

mzukowski at yci.com mzukowski at yci.com
Thu Jun 5 08:29:55 PDT 2003


Try this:

// Sketch: a string may contain #...# expressions, which may contain strings.
STRING    : '"' ( ~('"' | '#') | CODESCAPE )* '"' ;
CODESCAPE : '#' ( ~('"' | '#') | STRING )* '#' ;

You might need to alter it to handle escape characters, if ColdFusion has
them (like C's \").
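To make the intended behaviour of the two rules concrete, here is a minimal Python sketch that hand-codes them as two mutually recursive functions. A string is a run of ordinary characters and #...# expressions; an expression is a run of ordinary characters and strings. The names match_string, match_codescape and LexError are hypothetical, and the sketch assumes no escape sequences.

```python
# Hypothetical hand-coded version of the two lexer rules, read as:
#   STRING    : '"' ( ordinary char | CODESCAPE )* '"'
#   CODESCAPE : '#' ( ordinary char | STRING    )* '#'
# Each function receives the index of its opening delimiter and returns
# the index just past the matching closing delimiter.
# Assumption: no escape sequences (such as doubled quotes) are handled.


class LexError(Exception):
    pass


def match_string(src: str, i: int) -> int:
    """Match a STRING whose opening quote is at src[i]."""
    if i >= len(src) or src[i] != '"':
        raise LexError(f"expected '\"' at index {i}")
    i += 1
    while i < len(src):
        c = src[i]
        if c == '"':
            return i + 1                   # closing quote of this string
        if c == '#':
            i = match_codescape(src, i)    # nested #...# expression
        else:
            i += 1                         # ordinary character
    raise LexError("unterminated string")


def match_codescape(src: str, i: int) -> int:
    """Match a CODESCAPE whose opening hash is at src[i]."""
    if i >= len(src) or src[i] != '#':
        raise LexError(f"expected '#' at index {i}")
    i += 1
    while i < len(src):
        c = src[i]
        if c == '#':
            return i + 1                   # closing hash of this expression
        if c == '"':
            i = match_string(src, i)       # string nested inside the expression
        else:
            i += 1
    raise LexError("unterminated #...# expression")


if __name__ == "__main__":
    src = '"...#iif("a" gt "#b#", "cat", "dog")#..." rest'
    print(src[:match_string(src, 0)])
    # prints "...#iif("a" gt "#b#", "cat", "dog")#..."
```

Because each function only decides between "close", "recurse into the other rule" and "consume one character", the recursion depth follows the nesting depth of the literal. This is the same shape an ANTLR-generated lexer would have for rules of this form.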

Monty

-----Original Message-----
From: Jim Irwin [mailto:jimirwin at acm.org] 
Sent: Wednesday, June 04, 2003 4:58 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Lexing problem


Hi, I'm new to Antlr, and I have a problem for which I would welcome 
suggestions.  I'm trying to parse ColdFusion code, and the language 
allows strings to contain expressions.  The syntax is roughly the 
following: varname = "... #expression_1# ..." where the hash marks 
enclose a ColdFusion expression that is evaluated and substituted 
into the string at runtime.

The real problem is that the embedded expression is itself allowed 
to contain strings, so that a single source-code string may look 
like the following:

"...#iif("a" gt "#b#", "cat", "dog")#..."

My problem is that I cannot think of a way to define a lexical rule 
that would recognize such a complex string.  In principle, the 
string should be parsed.  I can conceive of the lexer returning a 
token representing the entire string to the parser, and the parser 
then recursively lexing and parsing the string value until there are 
no more embedded hash-expressions.
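The recursive idea can be sketched in Python. This hypothetical parse_string function turns one literal into a tree, recursing once per level of nesting until no hash-expressions remain. A later parser pass could then handle the raw text of each "expr" node. Because '"' and '#' play mirror-image roles, one helper serves both levels by swapping them. It is an illustration that ignores escape sequences, not an ANTLR API.

```python
from typing import List, Tuple, Union

# A node is either raw text or (kind, children), where kind is
# "string" or "expr" (hypothetical representation for illustration).
Node = Union[str, Tuple[str, list]]


def _parse(src: str, i: int, open_ch: str, nested_ch: str,
           kind: str) -> Tuple[Tuple[str, list], int]:
    """Parse from the delimiter at src[i] up to its matching close."""
    if src[i] != open_ch:
        raise ValueError(f"expected {open_ch!r} at index {i}")
    children: List[Node] = []
    start = i = i + 1
    while i < len(src):
        c = src[i]
        if c == open_ch or c == nested_ch:
            if i > start:
                children.append(src[start:i])   # run of plain text
            if c == open_ch:
                return (kind, children), i + 1  # matching close
            # Inside a string, '#' opens an expression; inside an
            # expression, '"' opens a string: swap the two roles.
            child_kind = "expr" if kind == "string" else "string"
            child, i = _parse(src, i, nested_ch, open_ch, child_kind)
            children.append(child)
            start = i
        else:
            i += 1
    raise ValueError(f"unterminated {kind}")


def parse_string(src: str, i: int = 0) -> Tuple[Tuple[str, list], int]:
    """src[i] must be '"'. Returns (tree, index after closing quote)."""
    return _parse(src, i, '"', '#', "string")


if __name__ == "__main__":
    tree, end = parse_string('"...#iif("a" gt "#b#", "cat", "dog")#..."')
    print(tree)
```

For the example literal, the outer node holds '...', one "expr" node and '...'; the "expr" node in turn holds the code text of the iif(...) call interleaved with its four nested "string" nodes, one of which contains the innermost #b# expression.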

I have no clue as to how I should proceed.  In order to lex the 
string, I seem to need a specialized routine that looks ahead, keeps 
track of nested expressions and their strings, and terminates only 
when the matching end quote outside of all expressions is 
encountered.
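The routine described here can also be written without recursion by keeping an explicit stack of open delimiters. A minimal Python sketch follows; scan_string_literal is a hypothetical name, and escape sequences are again ignored.

```python
def scan_string_literal(src: str, start: int = 0) -> int:
    """Return the index just past the string literal starting at src[start].

    An explicit stack records whether we are inside a string ('"') or a
    #...# expression ('#'); stack[-1] is the innermost context.
    Assumption: escape sequences are not handled.
    """
    if src[start:start + 1] != '"':
        raise ValueError("literal must start with a double quote")
    stack = ['"']
    i = start + 1
    while i < len(src):
        c = src[i]
        if c in '"#':
            if c == stack[-1]:
                stack.pop()          # closes the innermost string/expression
                if not stack:
                    return i + 1     # end quote outside all expressions
            else:
                stack.append(c)      # opens a nested expression or string
        i += 1
    raise ValueError(f"unterminated literal; open contexts: {''.join(stack)!r}")


if __name__ == "__main__":
    src = '"...#iif("a" gt "#b#", "cat", "dog")#..." rest'
    end = scan_string_literal(src)
    print(src[:end])
```

Since only two delimiters alternate, any delimiter that does not close the innermost context must open a new one, so a single comparison with stack[-1] decides between push and pop.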

Any suggestions?


 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 


 




