[antlr-interest] ANTLR grammar for an XML-like grammar

Tue Aug 26 15:03:07 PDT 2008

Hi Ymo,

Try this one:

grammar T;

tokens {
    LG='\u00ab';
    RG='\u00bb';
}

all     :    ( pi | code | text | comment )* ;
pi      :    TOK_PI;
code    :    TOK_CODE;
text    :    TOK_TEXT;
comment :    TOK_COMMENT;

//LEXER

TOK_TEXT:    ( ~(LG|RG) )+;

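// Every token that starts with LG goes through this one rule. The
// syntactic predicates let the lexer check the comment and «@» forms
// before falling back to generic code; $type then relabels the token.
TOK_LG_START
         :   // comment: «%-- ... --%»
             ( LG '%--' )=>
             ( LG '%--' ( options {greedy=false;} : . )* '--%' RG {$type=TOK_COMMENT;} )
         |   // processing instruction: exactly «@»
             ( LG '@' RG )=>
             ( LG '@' RG {$type=TOK_PI;} )
         |   // anything else between « and »
             ( LG ( options {greedy=false;} : . )* RG {$type=TOK_CODE;} )
         ;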

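// These fragments are never matched on their own (TOK_LG_START does the
// matching); they only define the token types that $type assigns above.
fragment TOK_CODE
    :    LG ( options {greedy=false;} : . )* RG
    ;

fragment TOK_PI
    :    LG '@' RG
    ;

fragment TOK_COMMENT
    :    LG '%--' ( options {greedy=false;} : . )* '--%' RG
    ;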

Not very elegant, but it seems to do the job.
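To check the intended token boundaries without running ANTLR, here is a rough hand-written Python sketch of the same decision order that TOK_LG_START encodes: the comment prefix is tried first, then the exact «@» form, then generic «...» code, and anything else is plain text. It is only an illustration under assumptions, not ANTLR-generated code: the function name tokenize is hypothetical, LG/RG are assumed to be « and » as in the tokens{} section, and errors are reported with ValueError.

```python
from typing import List, Tuple

LG, RG = "\u00ab", "\u00bb"  # « and », matching the tokens{} section


def tokenize(src: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split src into (token_type, text) pairs,
    checking the alternatives in the same order as TOK_LG_START."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(src):
        if src.startswith(LG + "%--", i):
            # Comment: non-greedy, stops at the first "--%»".
            end = src.find("--%" + RG, i + 4)
            if end < 0:
                raise ValueError(f"unterminated comment at offset {i}")
            j = end + 4
            tokens.append(("TOK_COMMENT", src[i:j]))
        elif src.startswith(LG + "@" + RG, i):
            # Processing instruction: exactly «@».
            j = i + 3
            tokens.append(("TOK_PI", src[i:j]))
        elif src[i] == LG:
            # Code: non-greedy, stops at the first ».
            end = src.find(RG, i + 1)
            if end < 0:
                raise ValueError(f"unterminated code block at offset {i}")
            j = end + 1
            tokens.append(("TOK_CODE", src[i:j]))
        elif src[i] == RG:
            # TOK_TEXT excludes RG, so a stray » matches no rule.
            raise ValueError(f"unexpected {RG} at offset {i}")
        else:
            # Text: the longest run containing neither « nor ».
            j = i
            while j < len(src) and src[j] not in (LG, RG):
                j += 1
            tokens.append(("TOK_TEXT", src[i:j]))
        i = j
    return tokens


if __name__ == "__main__":
    # Ymo's reduced input, one token per line separated by newlines.
    sample = f"{LG}@{RG}\n{LG}fgdsfgs{RG}\n{LG}%-- comment --%{RG}"
    for kind, text in tokenize(sample):
        print(kind, repr(text))
```

Because the comment and «@» checks run before the generic code check, «@» comes out as TOK_PI while something like «@x» still falls through to TOK_CODE, which is the same ordering idea the syntactic predicates express in the grammar.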

Matt.

On Tue, Aug 26, 2008 at 7:56 PM, Matt Palmer <mattpalms at gmail.com> wrote:

> Hi Ymo,
>
> Again, I'm not an expert at this, but this grammar parses your input text:
>
> grammar T;
>
> tokens {
>     LG='\u00ab';
>     RG='\u00bb';
> }
>
> // parser
>
> all        :    ( pi | code | text | comment )*;
> pi         :    TOK_PI;
> comment    :    TOK_COMMENT;
> code       :    TOK_CODE;
> text       :    TOK_TEXT;
>
> // LEXER
>
> TOK_PI     :    LG '@' RG;
>
> TOK_COMMENT
>            :    TOK_LCOMMENT ( options {greedy=false;} : . )* TOK_RCOMMENT;
>
> TOK_TEXT   :    ( ~(LG|RG) )+;
>
> TOK_CODE   :    LG ~'@' ( options {greedy=false;} : . )*  RG;
>
> fragment TOK_LCOMMENT
>            :    LG '%--';
>
> fragment TOK_RCOMMENT
>            :    '--%' RG;
>
> It's not quite right: the ~'@' in TOK_CODE is just the first (hacky) way I
> could make the lexer distinguish between TOK_PI and TOK_CODE. If you take
> it out, the grammar still works, but «@» is recognised as TOK_CODE instead
> of TOK_PI. Adding a syntactic predicate ( LG '@' RG )=> to TOK_PI did not
> help. So this isn't a solution, but I hope it moves you towards one.
>
> I've added some parser rules so you can see the parse tree in ANTLR.
>
> Matt
>
> On Tue, Aug 26, 2008 at 5:40 PM, Ymo <ymo.mail at gmail.com> wrote:
>
>> Hi Matt, I appreciate you taking a look at this.
>>
>> I've pasted the reduced input and grammar below.
>>
>> The first line is never recognized as TOK_PI. It is always seen as
>> TOK_CODE.
>>
>> The input is:
>> «@»
>> «fgdsfgs»
>> «%-- comment --%»
>>
>> Then I reduced the grammar to this:
>>
>> tokens {
>>     LG='\u00ab';
>>     RG='\u00bb';
>> }
>>
>>
>> //LEXER
>> TOK_PI : LG '@';
>> TOK_LCOMMENT : '%-';
>> TOK_RCOMMENT : '-%';
>>
>> TOK_BLOCK : { tagMode==false }? =>
>>    (LG TOK_LCOMMENT) => TOK_COMMENT { $type=TOK_COMMENT; } |
>>    (TOK_PI) => TOK_PI { $type=TOK_PI; } |
>>    (LG ) => TOK_CODE { $type=TOK_CODE; } |
>>    TOK_TEXT { $type=TOK_TEXT; }  {
>>    };
>>
>> fragment
>> TOK_TEXT :
>>    ( ~(LG|RG) )+ {
>>     };
>>
>> fragment
>> TOK_CODE :
>>    LG  ( options {k=2;greedy=false;} : . )*  RG {
>>     };
>>
>> fragment
>> TOK_COMMENT :
>>    LG TOK_LCOMMENT ( options {k=3;greedy=false;} : . )* TOK_RCOMMENT RG {
>>       $channel=HIDDEN;
>>     };
>>
>