[antlr-interest] Two more lexer bugs in antlr-03-16-2007.10

Sun Mar 18 14:00:55 PDT 2007

Howdy. I think I made it work.  Two false starts wasted a day; the
fix took about 30 minutes once I reverted. ;)  Try the new build:

antlr-03-18-2007.14.tar.gz

Ter

On Mar 16, 2007, at 7:30 PM, Gavin Lambert wrote:

> At 13:49 17/03/2007, you wrote:
> >OK, I figured out how to refactor/clean up, but it will take some
> >work. ;)  Might get it done tomorrow.
>
> Ok, now I'm a little more puzzled.  I thought it was the reference  
> to WS that it was objecting to, especially given your earlier  
> comment about the order of input.  But the following grammar fails  
> in the same way:
>
> lexer grammar Test;
>
> NormalChar
>   : ~('"' | '\\' | '\r' | '\n' | ' ' | '\t')
>   ;
>
> QSTRING
>   : '"' (NormalChar | ' ' | '\t')* '"'
>   ;
>
> ... even if I make NormalChar a fragment.
>
> ....
>
> Ok, a little more fiddling around reveals that it's the (NormalChar  
> | anything) bit that it's really objecting to.  If I change it to  
> just NormalChar* then it compiles.
>
> I tried declaring a fragment rule in between the two (shown below),  
> but it wouldn't compile that either.
>
> fragment ExtendedChar: NormalChar | ' ' | '\t';
> QSTRING: '"' ExtendedChar* '"';
>
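
For reference, the set that the QSTRING loop is meant to accept can be worked out by hand: NormalChar excludes six characters, and the alternation adds space and tab back, so the loop body is any character except '"', '\', CR, and LF. A minimal Python sketch of that set arithmetic follows. The names are hypothetical, the 7-bit ASCII universe is an assumption made only for illustration, and none of this reflects how ANTLR builds its sets internally.

```python
# Sketch only: models the grammar's character sets with Python sets.
# UNIVERSE is an assumption (7-bit ASCII); ANTLR's real vocabulary is larger.
import re

UNIVERSE = {chr(c) for c in range(128)}

# NormalChar : ~('"' | '\\' | '\r' | '\n' | ' ' | '\t')
normal_char = UNIVERSE - {'"', '\\', '\r', '\n', ' ', '\t'}

# (NormalChar | ' ' | '\t') is a plain set union.
extended_char = normal_char | {' ', '\t'}

# The union collapses to a single negated set of four characters.
assert extended_char == UNIVERSE - {'"', '\\', '\r', '\n'}

# Equivalent matcher for QSTRING: '"' ExtendedChar* '"'
QSTRING = re.compile(r'"[^"\\\r\n]*"')


def is_qstring(text: str) -> bool:
    """Return True if text is exactly one quoted string under this rule."""
    return QSTRING.fullmatch(text) is not None
```

In grammar terms this would correspond to writing the loop body as a single negated set, ~('"' | '\\' | '\r' | '\n'), although whether the affected build accepts that form has not been checked here.
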
>
> Anyway, if you're reworking sets, one idea that's crossed my mind
> is that it'd be nice (read: completely optional, ignore me if it's
> too much work) to be able to exclude characters from an existing
> set rule as well.  For example, you could take a WS rule
> containing the twenty different characters that are normally
> considered whitespace, and in one particular lexer rule say you want
> anything that's whitespace except this one character.  Or take the
> NormalChar rule above and exclude an additional character (say,
> single quote) when referring to it from another rule.  (Another
> example might be handling things like octal digits, where you want
> a digit but only in a smaller range than normal.)
>
> This is not really a big deal, since you can fairly easily (once
> set combining works, anyway) factor the existing rule out to a
> smaller set (or larger exclusion set), but it could come in handy
> sometimes.  I have no idea what a reasonable syntax for that would
> be, though (except maybe something like 'set1 & ~set2', which is a
> bit bizarre), so maybe it's not worth worrying about.
>
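
The 'set1 & ~set2' idea above is ordinary set difference. A small Python sketch of the semantics, using the whitespace and octal-digit examples from the message, might look like the following. The helper name `exclude`, the set names, and the exact WS membership are all hypothetical, chosen for illustration; this is not ANTLR syntax or behaviour.

```python
# Sketch only: 'set1 & ~set2' modelled as set difference.
# All names here are hypothetical and the WS membership is illustrative.
import string


def exclude(base: frozenset, removed: str) -> frozenset:
    """Return base with the given characters removed (base & ~removed)."""
    return base - frozenset(removed)


WS = frozenset(' \t\r\n\f\v')
DIGIT = frozenset(string.digits)

# Whitespace except line terminators, e.g. for a rule that stays on one line.
INLINE_WS = exclude(WS, '\r\n')

# Octal digits expressed as "a digit, but not 8 or 9".
OCTAL_DIGIT = exclude(DIGIT, '89')
```

Excluding a character that is not in the base set leaves it unchanged, so a derived rule stays valid even if the base rule is later narrowed.
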