[antlr-interest] Comment rule matches links

Mon Aug 25 12:01:43 PDT 2008

At 04:15 26/08/2008, Jenny Balfer wrote:
 >I am using the "standard" rule for single line comments:
 >
 >COMMENT : '//' (options {greedy=false;}: .)* ('\n'|'\r')
 >          { skip(); }
 >        ;
 >
 >This works pretty well, until I have things like this in my code:
 >
 >aString = "http://someUrl.com";
 >
 >Because the URL contains two slashes, the lexer treats everything
 >from then on as a comment and skips the rest of the line; only
 >aString = "http: remains.
 >
 >I tried to fight this problem by adding a rule that matches every
 >string before the comment rule:
 >
 >STRING : '"' (options {greedy=false;}: .)* '"'
 >       ;
 >
 >This temporarily solved the problem, but brought up further issues,
 >so I would really like to get along without it. Does anyone
 >have a better solution to prevent my lexer from skipping URLs just
 >because they contain slashes?

Using a STRING rule is probably the best way to do this.  (And not 
just for this sort of problem -- generally you want strings to be 
recognised as single entities anyway, instead of random sequences 
of other tokens, and you need to preserve whitespace.)
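
As a rough sketch only (assuming ANTLR 3 syntax and the Java target; 
the backslash-escape handling and the WS rule are my guesses about 
your language, not something from your grammar), the rules might 
look something like this:

```antlr
// Assumption: strings may contain backslash escapes such as \" and
// may not span lines. Adjust to whatever your language really allows.
STRING
    : '"' ( '\\' . | ~('\\' | '"' | '\r' | '\n') )* '"'
    ;

// Consume everything up to (but not including) the end of the line.
// The newline itself is left for the WS rule below, so a comment on
// the last line of a file without a trailing newline still matches.
COMMENT
    : '//' ~('\r' | '\n')* { skip(); }
    ;

// Assumption: whitespace is not significant outside of strings.
WS
    : (' ' | '\t' | '\r' | '\n')+ { skip(); }
    ;
```

With rules along these lines, the lexer enters STRING as soon as it 
sees the opening quote and consumes through the closing quote, so 
the // inside "http://someUrl.com" is never looked at as the 
possible start of a COMMENT token.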

While it might be possible to ignore //s within quotes via other 
means (eg. semantic predicates), it'd be quite painful and would 
still give you malformed strings in other cases.

What kind of "further issues" is it causing?