[antlr-interest] Missing characters in partial matches

Fri Aug 22 18:11:43 PDT 2008

Hi Jim,

thanks - that clears up why the characters were missing.  I'm afraid your
code hasn't cleared up my problem though.  I still get missing characters.

At the heart of my problem, I guess I'm not sure why, when the start comment
didn't match, the lexer didn't proceed to match a Lsqb, followed by Text.  I
can make it parse the text as given (albeit awkwardly) by specifying all the
intermediate prefixes as other tokens, using this grammar:

grammar T;

all     :    ( text | comment | nc1 | nc2 | lsqb )*;
text    :    Text;
comment :    Comment;
nc1     :    NotCom1;
nc2     :    NotCom2;
lsqb    :    Lsqb;

Comment :    '[!--'  (options {greedy=false;} : . )* '--]' ;
NotCom1 :    '[!-' ;
NotCom2 :    '[!';
Lsqb    :    '[' ;
Text    :    (~Lsqb)+ ;
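
To make concrete what I expected the lexer to produce, here is a rough sketch
in Python rather than ANTLR. It is only an illustration of the token stream I
was hoping for, not a description of how the ANTLR lexer works internally; the
names TOKEN_RE and tokenize are made up for this example. It tries a full
comment first and, if that fails, falls back to a lone Lsqb followed by Text,
so no characters are dropped.

```python
import re

# Alternatives are tried in order at each position: a complete comment first,
# then a lone '[', then a run of non-'[' characters.
TOKEN_RE = re.compile(
    r"""
    (?P<Comment>\[!--.*?--\])   # '[!--', non-greedy body, '--]'
  | (?P<Lsqb>\[)                # a '[' that does not start a complete comment
  | (?P<Text>[^\[]+)            # one or more characters other than '['
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(text):
    """Return a list of (token_name, matched_text) pairs covering all of text."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Every character is either '[' or not '[', so a match always exists.
        m = TOKEN_RE.match(text, pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


sample = (
    "A test [!-- comment --] of text [!that looks like the start [!-of a\n"
    "[!comment, but [isn't one."
)

if __name__ == "__main__":
    for name, value in tokenize(sample):
        print(name, repr(value))
```

Because the regex engine backtracks when the comment alternative fails, the
partial prefixes '[!' and '[!-' come out as an Lsqb followed by Text, which is
the behaviour I was hoping to get without the extra NotCom1 and NotCom2 tokens.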

I think I need to investigate the lexer behaviour in some more detail.  Any
pointers welcome!

cheers,

MattP.

On Sat, Aug 23, 2008 at 1:29 AM, Jim Idle <jimi at temporal-wave.com> wrote:

>  On Sat, 2008-08-23 at 01:20 +0100, Matt Palmer wrote:
>
> Hi,
>
> I'm scratching my head about a problem with multi-line comments, where
> characters that only partially matched the comment header are removed from
> the character stream. I've boiled the problem down to the simple grammar
> below:
>
> grammar T;
>
> all     :    ( Text | Lsqb | Comment )* ;
>
> Comment :    '[!--'  (options {greedy=false;} : . )* '--]' ;
> Lsqb    :    '[' ;
> Text    :    ( ~Lsqb )+ ;
>
> If this text is run through the antlrworks debugger (1.1.7 and 1.2b5):
>
> A test [!-- comment --] of text [!that looks like the start [!-of a
> [!comment, but [isn't one.
>
> then the parse tree displays this:
>
>   root
>     |
>    all
>     |
>     |-- A test
>     |-- [!-- comment --]
>     |-- of text
>     |-- hat looks like the start
>     |-- f a
>     |-- omment, but
>     |-- [
>     |-- isn't one.
>
>
> The real comment itself matches fine, and the solitary square bracket is
> also OK, but the other characters that are partial prefixes of a comment are
> simply stripped out of the rest of the text.  Note that this problem only
> surfaces if the comment header is greater than 2 characters in length.   Can
> anyone shed some light on this behaviour?
>
>
> If you look at the console output you will see that the lexer is telling you
> about invalid characters and then syncing up to something it can do
> something with. You need:
>
> Comment :    '['
>                  (   ('!--')=> '!--'  (options {greedy=false;} : . )* '--]'
>                    | { $type = Lsqb; }
>                  )
>         ;
>
> fragment
> Lsqb    :    '[' ;
>
>
> Thanks,
>
> MattP.
>