[antlr-interest] Antlr grammar for xml like grammar

Matt Palmer mattpalms at gmail.com
Tue Aug 26 11:56:08 PDT 2008


Hi Ymo,

Again, I'm not an expert at this, but this grammar parses your input text:

grammar T;

tokens {
    LG='\u00ab';
    RG='\u00bb';
}

// parser

all        :    ( pi | code | text | comment )*;
pi         :    TOK_PI;
comment    :    TOK_COMMENT;
code       :    TOK_CODE;
text       :    TOK_TEXT;

// LEXER

TOK_PI     :    LG '@' RG;

TOK_COMMENT
           :    TOK_LCOMMENT ( options {greedy=false;} : . )* TOK_RCOMMENT;

TOK_TEXT   :    ( ~(LG|RG) )+;

TOK_CODE   :    LG ~'@' ( options {greedy=false;} : . )*  RG;

fragment TOK_LCOMMENT
           :    LG '%--';

fragment TOK_RCOMMENT
           :    '--%' RG;

It's not quite right: the ~'@' in TOK_CODE is only the first (hacky) way I
could make the lexer distinguish between TOK_PI and TOK_CODE.  If you take
it out, the grammar still works, but it will recognise TOK_PI as TOK_CODE.
Adding a syntactic predicate ( LG '@' RG )=> to TOK_PI did not help.  So
this isn't a solution, but I hope it moves you towards one.
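
As a rough illustration of the token boundaries the grammar above is aiming
for, here is a small Python regex model. It is not ANTLR and says nothing
about how ANTLR's lexer prediction works; it only mirrors the four token
shapes, trying the more specific ones (comment, then PI) before the generic
code block. The names TOKEN_RE and tokenize are made up for this sketch.

```python
import re

LG, RG = "\u00ab", "\u00bb"  # the « and » delimiters from the tokens block

# Ordered alternation: the first alternative that matches wins, so the
# specific forms (comment, PI) are listed before the generic code block.
TOKEN_RE = re.compile(
    "|".join([
        f"(?P<TOK_COMMENT>{LG}%--.*?--%{RG})",  # LG '%--' ... '--%' RG
        f"(?P<TOK_PI>{LG}@{RG})",               # LG '@' RG
        f"(?P<TOK_CODE>{LG}[^@].*?{RG})",       # LG ~'@' ... RG (non-greedy)
        f"(?P<TOK_TEXT>[^{LG}{RG}]+)",          # anything without LG or RG
    ]),
    re.DOTALL,  # let '.' cross newlines, like '.' in the lexer rules
)


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs covering all of text."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    sample = f"{LG}@{RG}\n{LG}fgdsfgs{RG}\n{LG}%-- comment --%{RG}"
    for token_type, lexeme in tokenize(sample):
        print(token_type, repr(lexeme))
```

On the three-line sample input this yields TOK_PI, TOK_TEXT, TOK_CODE,
TOK_TEXT, TOK_COMMENT, with the two TOK_TEXT tokens being the newlines
between the blocks.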

I've added some parser rules so you can see the parse tree in antlr.

Matt

On Tue, Aug 26, 2008 at 5:40 PM, Ymo <ymo.mail at gmail.com> wrote:

> Hi Matt, I appreciate you taking a look at this.
>
> I pasted the reduced input & grammar:
>
> The first line is never recognized as TOK_PI. It is always seen as
> TOK_CODE.
>
> Input is :
> «@»
> «fgdsfgs»
> «%-- comment --%»
>
> Then I reduced the grammar to this:
>
> tokens {
>     LG='\u00ab';
>     RG='\u00bb';
> }
>
>
> //LEXER
> TOK_PI : LG '@';
> TOK_LCOMMENT : '%-';
> TOK_RCOMMENT : '-%';
>
> TOK_BLOCK : { tagMode==false }? =>
>    (LG TOK_LCOMMENT) => TOK_COMMENT { $type=TOK_COMMENT; } |
>    (TOK_PI) => TOK_PI { $type=TOK_PI; } |
>    (LG ) => TOK_CODE { $type=TOK_CODE; } |
>    TOK_TEXT { $type=TOK_TEXT; }  {
>    };
>
> fragment
> TOK_TEXT :
>    ( ~(LG|RG) )+ {
>     };
>
> fragment
> TOK_CODE :
>    LG  ( options {k=2;greedy=false;} : . )*  RG {
>     };
>
> fragment
> TOK_COMMENT :
>    LG TOK_LCOMMENT ( options {k=3;greedy=false;} : . )* TOK_RCOMMENT RG {
>       $channel=HIDDEN;
>     };