[antlr-interest] Missing characters in partial matches
Matt Palmer
mattpalms at gmail.com
Fri Aug 22 18:11:43 PDT 2008
Hi Jim,
thanks - that clears up why the characters were missing. I'm afraid your
code hasn't cleared up my problem though. I still get missing characters.
At the heart of my problem, I'm not sure why, when the start of the comment
didn't match, the lexer didn't fall back to matching an Lsqb followed by Text. I
can make it parse the text as given (albeit awkwardly) by specifying all the
intermediate prefixes as separate tokens, using this grammar:
grammar T;
all : ( text | comment | nc1 | nc2 | lsqb )*;
text : Text;
comment : Comment;
nc1 : NotCom1;
nc2 : NotCom2;
lsqb : Lsqb;
Comment : '[!--' (options {greedy=false;} : . )* '--]' ;
NotCom1 : '[!-' ;
NotCom2 : '[!';
Lsqb : '[' ;
Text : (~Lsqb)+ ;
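As a rough illustration of the fallback behaviour this grammar encodes (not of
how the ANTLR-generated lexer works internally), the five lexer rules can be
mimicked with ordered regular-expression alternatives in Python. The regex
translation and the tokenize helper are assumptions made for this sketch; only
the token names come from the grammar above.

```python
import re

# Token patterns in priority order, mirroring the lexer rules above.
# The regex translation is an assumption made for illustration only.
TOKEN_SPEC = [
    ("Comment", r"\[!--.*?--\]"),  # non-greedy body, like greedy=false
    ("NotCom1", r"\[!-"),          # prefix that failed to become a comment
    ("NotCom2", r"\[!"),           # shorter failed prefix
    ("Lsqb",    r"\["),            # lone square bracket
    ("Text",    r"[^\[]+"),        # anything up to the next '['
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # let a comment body span line breaks
)

def tokenize(text):
    """Split text into (token_name, lexeme) pairs without dropping characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        # A match always exists: the next character is either '[' or it is not.
        m = MASTER.match(text, pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

if __name__ == "__main__":
    sample = ("A test [!-- comment --] of text [!that looks like the start "
              "[!-of a\n[!comment, but [isn't one.")
    for name, lexeme in tokenize(sample):
        print(name, repr(lexeme))
```

Because the alternatives are tried in order, a '[' that does not start a full
comment falls through to the shorter prefixes instead of being discarded, which
is the behaviour the extra NotCom1 and NotCom2 tokens are meant to force.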
I think I need to investigate the lexer behaviour in some more detail. Any
pointers welcome!
cheers,
MattP.
On Sat, Aug 23, 2008 at 1:29 AM, Jim Idle <jimi at temporal-wave.com> wrote:
> On Sat, 2008-08-23 at 01:20 +0100, Matt Palmer wrote:
>
> Hi,
>
> I'm scratching my head about a problem with multi-line comments, where
> characters that only partially matched the comment header are removed from
> the character stream. I've boiled the problem down to the simple grammar
> below:
>
> grammar T;
>
> all : ( Text | Lsqb | Comment )* ;
>
> Comment : '[!--' (options {greedy=false;} : . )* '--]' ;
> Lsqb : '[' ;
> Text : ( ~Lsqb )+ ;
>
> If this text is run through the antlrworks debugger (1.1.7 and 1.2b5):
>
> A test [!-- comment --] of text [!that looks like the start [!-of a
> [!comment, but [isn't one.
>
> then the parse tree displays this:
>
> root
> |
> all
> |
> A test *[!-- comment --]* of text *hat looks like the start* *f a* *
> omment*, but *[* isn't one.
>
>
> The real comment itself matches fine, and the solitary square bracket is
> also OK, but the other characters that are partial prefixes of a comment are
> simply stripped out of the rest of the text. Note that this problem only
> surfaces if the comment header is greater than 2 characters in length. Can
> anyone shed some light on this behaviour?
>
>
> If you look at the console output you will see that the lexer is telling you
> about invalid characters and then syncing up to something it can do
> something with. You need:
>
> Comment : '['
> ( '!--'=> '!--' (options {greedy=false;} : . )* '--]'
> | { $type = Lsqb; }
> )
> ;
>
> fragment
> Lsqb : '[' ;
>
>
> Thanks,
>
> MattP.
>