[antlr-interest] Comment rule matches links

Tue Aug 26 01:34:20 PDT 2008

>> What kind of "further issues" is it causing?
>
> My problem is with regular expressions that contain quotes, like this one:
>
> replace(/"/, "&quot;");
>
> In this case, the STRING rule matches everything from the first quote to
> the second, which is "/, ", and then takes everything from the last quote
> to any later one.
> I already found the article about island grammars
> (http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control),
> but I have no idea how to apply that solution to my problem, since the
> workaround is for parser grammars and my STRING / COMMENT rules are still
> part of the lexer.

For this regular expression literal problem I think you will
absolutely have to implement something like the island grammars.
Otherwise everything else will only get more and more fragile and
hackish.

I've done something similar in my XQuery parser, where I exchange the
lexer when I enter a particular grammar rule. In your case, you would add
actions to your regular expression literal rule:

regExp: '/' { enterRegexp(); } regExpTokens '/' { leaveRegexp(); };

And then in enterRegexp() you need to exchange the lexer for a  
specific lexer that handles the regular expressions (or, if you just  
want them as an opaque string, one that simply eats everything up to  
the next unescaped slash).
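
To make the exchange a bit more concrete, here is a rough sketch in plain
Java of a token source that delegates to whichever lexer is currently
active. SimpleTokenSource and SwitchingTokenSource are names I made up for
illustration; they are not the ANTLR runtime types, and a real version
would have to implement the runtime's own token source interface. Note
also that a buffering token stream may already have pulled tokens past the
opening slash, so the switch probably needs to account for lookahead.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a token source; NOT the ANTLR runtime interface.
interface SimpleTokenSource {
    String nextToken();
}

// Delegates to whichever source is currently on top of the stack.
class SwitchingTokenSource implements SimpleTokenSource {
    private final Deque<SimpleTokenSource> stack = new ArrayDeque<>();

    SwitchingTokenSource(SimpleTokenSource outer) {
        stack.push(outer);
    }

    // Would be called from enterRegexp(): the island lexer takes over.
    void enter(SimpleTokenSource island) {
        stack.push(island);
    }

    // Would be called from leaveRegexp(): the previous lexer resumes.
    // The outermost lexer is never popped.
    void leave() {
        if (stack.size() > 1) {
            stack.pop();
        }
    }

    @Override
    public String nextToken() {
        return stack.peek().nextToken();
    }
}
```

The parser would then read from the SwitchingTokenSource, and the two
actions in the regExp rule would call enter() and leave() on it.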

Basically you have two languages: your regular outer language and,
within the slashes, the regexp language, and the latter is
syntactically totally different. So IMHO the easiest way around that
is to have a separate lexer/token source that takes over in that region.
This can be as simple as implementing a Lexer on your own in Java and
just returning the full string up to the next non-escaped slash.
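
The "eat everything up to the next non-escaped slash" part could look
roughly like the sketch below. The class and method names are my own
invention, and it works on a plain String rather than on an ANTLR
character stream, so treat it as an illustration of the scanning logic
only. It also assumes a backslash is the only way to escape a slash; an
unescaped slash inside a character class such as [/] is not handled.

```java
final class RegexBodyScanner {
    private RegexBodyScanner() {}

    /**
     * Returns the text from index start up to (not including) the next
     * unescaped '/', or null if there is no closing slash.
     * start is assumed to point just past the opening slash.
     */
    static String scan(String input, int start) {
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                i += 2; // skip the backslash and the character it escapes
            } else if (c == '/') {
                return input.substring(start, i);
            } else {
                i++;
            }
        }
        return null; // ran off the end without finding a closing slash
    }
}
```

For the example from the question, scanning replace(/"/, "&quot;");
starting right after the first slash gives back the single quote character
as the regexp body, so the STRING rule never gets to see it.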

HTH,
Martin