[antlr-interest] Trying to parsing CFML (ColdFusion)

Thu Feb 7 07:43:29 PST 2008

Hi
Thanks for the feedback.  I wonder if I should handle this in multiple
passes (with different parsers)... A first, simple pass could extract
the relatively easy-to-spot tag-based elements and just grab all of the
text between two tags, storing it as a property (or subelement) of the
parent element.

Sample input:

Html here 1
<cfif ...>
	html here 2
<cfelse>
	html here 3
</cfif>
Html here 4

The goal would be to tokenize everything between the tags into a single
"content" (aka html) token.  This limits the harder problem of
tokenizing expressions to the "attributes" in the cftags (the "..." in
the cfif).  Parsing the html can be done by a separate parser later.

This makes the tokenizer stateful, but there are only two states:
looking for a cftag start or looking for a cftag end.
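
A minimal sketch of such a two-state tokenizer, written in plain Python
rather than as an ANTLR lexer.  The function name `tokenize` and the
token names CONTENT, TAG_OPEN and TAG_CLOSE are made up for
illustration.  It assumes that the first ">" after a tag name ends the
tag, which real CFML attribute expressions (quoted strings, comparisons)
can violate, so a real lexer would need more care in the second state.

```python
import re

# Assumption: a cftag starts with "<cf" or "</cf" followed by a name.
TAG_START = re.compile(r"<(/?)(cf\w+)", re.IGNORECASE)


def tokenize(text):
    """Split CFML-like text into CONTENT, TAG_OPEN and TAG_CLOSE tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        # State 1: looking for a cftag start.
        # Everything before the next tag becomes one CONTENT token.
        match = TAG_START.search(text, pos)
        if match is None:
            tokens.append(("CONTENT", text[pos:]))
            break
        if match.start() > pos:
            tokens.append(("CONTENT", text[pos:match.start()]))

        # State 2: looking for the cftag end.
        # Assumption: the first ">" closes the tag (no ">" inside attributes).
        end = text.find(">", match.end())
        if end == -1:
            raise ValueError(f"unterminated tag at offset {match.start()}")

        kind = "TAG_CLOSE" if match.group(1) else "TAG_OPEN"
        name = match.group(2).lower()
        attributes = text[match.end():end].strip()  # left unparsed for a later pass
        tokens.append((kind, name, attributes))
        pos = end + 1
    return tokens
```

Each tag token carries its raw attribute text, so the harder expression
tokenizing can be left to a later pass over that string alone.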

Does this make sense?  

Thanks
	Mark

-----Original Message-----
From: Johannes Luber [mailto:jaluber at gmx.de] 
Sent: Wednesday, February 06, 2008 5:30 PM
To: Gaulin, Mark
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Trying to parsing CFML (ColdFusion)

Gaulin, Mark wrote:
> Is this grammar (& lexer) possible with ANTLR?  For someone new to it?

> It started to feel like this is a case of an "island grammar", but 
> even attempting to parse a "simple" quoted string (that can contain 
> hashes) all by itself is blowing my mind... the lexer is very very 
> context dependent.
>  
> Can anyone offer some advice?  I'm almost tempted to just write the 
> thing in straight java, but that feels lame (and tedious).

It looks like you will at least need to control the lexer from the
parser. ANTLR doesn't support that yet, but you could add it yourself.

I suppose that would be less work than writing a parser yourself.

Johannes