[antlr-interest] Comment rule matches links

Jenny Balfer ai06087 at Lehre.BA-Stuttgart.De
Tue Aug 26 01:24:10 PDT 2008


>  >I am using the "standard" rule for single line comments:
>  >
>  >COMMENT : '//' (options {greedy=false;}: .)* ('\n'|'\r')
>  >          { skip(); }
>  >        ;
>  >
>  >This works pretty well, until I have things like that in my code:
>  >
>  >aString = "http://someUrl.com";
>  >
>  >Because the url contains two slashes, the lexer treats everything
>  >from then on as a comment and skips the rest of the line; only
>  >aString = "http: remains.
>  >
>  >I tried to fight this problem by adding a rule that matches every
>  >string before the comment rule:
>  >
>  >STRING : '"' (options {greedy=false;}: .)* '"'
>  >       ;
>  >
>  >This temporarily solved the problem, but brought up further issues,
>  >so I would really like to get along without it. Does anyone
>  >have a better solution to prevent my lexer from skipping URLs just
>  >because they contain slashes?
> 
> Using a STRING rule is probably the best way to do this.  (And not 
> just for this sort of problem -- generally you want strings to be 
> recognised as single entities anyway, instead of random sequences 
> of other tokens, and you need to preserve whitespace.)
> 
> While it might be possible to ignore //s within quotes via other 
> means (eg. semantic predicates), it'd be quite painful and would 
> still give you malformed strings in other cases.
> 
> What kind of "further issues" is it causing?

My problem is with regular expressions that contain quotes, like this one:

replace(/"/, "&quot;");

In this case, the STRING rule matches everything from the first quote to
the second, which is "/, ", and then treats the last quote as the start of
a new string that runs up to the next quote it finds, wherever that is.

I already found the article about island grammars
(http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control),
but I have no idea how to apply that solution to my problem, because the
workaround is for parser grammars and my STRING and COMMENT rules are
part of the lexer.
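
To make the interaction between the three kinds of token concrete, here is
a rough sketch of the scanning logic in plain Python rather than as an
ANTLR grammar. The names scan, scan_delimited, strip_line_comments and
REGEX_MAY_FOLLOW are made up for illustration. The rule for when a '/'
starts a regular expression (only at the start of input or after an
operator or opening bracket) is an assumed heuristic, not something any
ANTLR runtime provides.

```python
# Assumed heuristic: a '/' begins a regex literal only when the previous
# significant character is one of these (or there is none yet).
REGEX_MAY_FOLLOW = set("(,=:[!&|?{};")


def scan_delimited(src, start, delim):
    """Return the index just past the literal opened at src[start].

    Backslash escapes are skipped; an unterminated literal ends at the
    end of the line or of the input.
    """
    i = start + 1
    while i < len(src) and src[i] != "\n":
        if src[i] == "\\":
            i += 2              # skip the escaped character
        elif src[i] == delim:
            return i + 1        # include the closing delimiter
        else:
            i += 1
    return min(i, len(src))


def scan(src):
    """Split src into (kind, text) pairs: STRING, REGEX, COMMENT, OTHER."""
    tokens = []
    i = 0
    last = ""                   # last non-blank character outside comments
    while i < len(src):
        ch = src[i]
        if ch == '"':
            end, kind = scan_delimited(src, i, '"'), "STRING"
        elif src.startswith("//", i):
            end = src.find("\n", i)
            end = len(src) if end == -1 else end
            kind = "COMMENT"
        elif ch == "/" and (last == "" or last in REGEX_MAY_FOLLOW):
            end, kind = scan_delimited(src, i, "/"), "REGEX"
        else:
            end, kind = i + 1, "OTHER"
        text = src[i:end]
        if kind == "OTHER" and tokens and tokens[-1][0] == "OTHER":
            tokens[-1] = ("OTHER", tokens[-1][1] + text)  # merge plain runs
        else:
            tokens.append((kind, text))
        if kind != "COMMENT" and text.strip():
            last = text.strip()[-1]
        i = end
    return tokens


def strip_line_comments(src):
    """Drop COMMENT tokens, leaving strings and regex literals intact."""
    return "".join(text for kind, text in scan(src) if kind != "COMMENT")


if __name__ == "__main__":
    print(scan('aString = "http://someUrl.com"; // note'))
    print(scan('replace(/"/, "x");'))   # "x" is a stand-in replacement
```

Because strings and regex literals are each consumed as a single token,
the slashes inside the URL and the quote inside /"/ never reach the
comment or string branches. In a lexer grammar the same idea would
presumably need a flag that tracks the previous token, which is the part
this sketch glosses over.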




