[antlr-interest] Antlr grammar for xml like grammar
Matt Palmer
mattpalms at gmail.com
Tue Aug 26 15:03:07 PDT 2008
Hi Ymo,
Try this one:
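grammar T;

tokens {
    LG='\u00ab';
    RG='\u00bb';
}

all     : ( pi | code | text | comment )* ;
pi      : TOK_PI;
code    : TOK_CODE;
text    : TOK_TEXT;
comment : TOK_COMMENT;

// LEXER

TOK_TEXT: ( ~(LG|RG) )+;

// Every token that opens with LG is matched by this one rule; each
// alternative then resets the token type with $type. The comment and
// PI alternatives come first so the catch-all code alternative is
// only reached when neither of them applies.
TOK_LG_START
    : ( LG '%--' )=>
      ( LG '%--' ( options {greedy=false;} : . )* '--%' RG
        {$type=TOK_COMMENT;} )
    | ( LG '@' RG )=>
      ( LG '@' RG
        {$type=TOK_PI;} )
    | ( LG ( options {greedy=false;} : . )* RG
        {$type=TOK_CODE;} )
    ;

// The fragment rules below are not called by any other lexer rule;
// they are there so that the token types assigned above are defined.
fragment TOK_CODE
    : LG ( options {greedy=false;} : . )* RG
    ;

fragment TOK_PI
    : LG '@' RG
    ;

fragment TOK_COMMENT
    : ( LG '%--' ( options {greedy=false;} : . )* '--%' RG )
    ;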
Not very elegant, but it seems to do the job.
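
For anyone who wants to check the intended behaviour outside ANTLR, here is a rough hand-written sketch in Python of the same dispatch order: comment first, then PI, then code, and plain text otherwise. It is only an approximation of what the generated lexer does, not ANTLR output; the function name tokenize and the error handling are my own assumptions.

```python
LG, RG = "\u00ab", "\u00bb"  # the two guillemet delimiters from the tokens block


def tokenize(src):
    """Return (type, text) pairs, trying alternatives in the order used by TOK_LG_START."""
    tokens, i = [], 0
    while i < len(src):
        if src.startswith(LG + "%--", i):
            # Comment: runs up to the first '--%' followed by RG.
            kind, closer, body_start = "TOK_COMMENT", "--%" + RG, i + 4
        elif src.startswith(LG + "@" + RG, i):
            # Processing instruction: exactly LG '@' RG.
            kind, closer, body_start = "TOK_PI", RG, i + 2
        elif src[i] == LG:
            # Anything else opened by LG is code, up to the first RG.
            kind, closer, body_start = "TOK_CODE", RG, i + 1
        else:
            # Plain text: one or more characters that are neither LG nor RG.
            j = i
            while j < len(src) and src[j] not in (LG, RG):
                j += 1
            if j == i:  # a stray RG outside any block
                raise ValueError(f"unexpected {src[i]!r} at offset {i}")
            tokens.append(("TOK_TEXT", src[i:j]))
            i = j
            continue
        end = src.find(closer, body_start)  # first closer = non-greedy match
        if end == -1:
            raise ValueError(f"unterminated {kind} starting at offset {i}")
        end += len(closer)
        tokens.append((kind, src[i:end]))
        i = end
    return tokens
```

Calling tokenize on the three-line sample input from the original question yields TOK_PI, TOK_CODE and TOK_COMMENT tokens, with a TOK_TEXT token for each newline in between. Swapping the order of the PI and code branches reproduces the original problem, where the first line comes back as TOK_CODE.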
Matt.
On Tue, Aug 26, 2008 at 7:56 PM, Matt Palmer <mattpalms at gmail.com> wrote:
> Hi Ymo,
>
> Again, I'm not an expert at this, but this grammar parses your input text:
>
> grammar T;
>
> tokens {
> LG='\u00ab';
> RG='\u00bb';
> }
>
> // parser
>
> all : ( pi | code | text | comment )*;
> pi : TOK_PI;
> comment : TOK_COMMENT;
> code : TOK_CODE;
> text : TOK_TEXT;
>
> // LEXER
>
> TOK_PI : LG '@' RG;
>
> TOK_COMMENT
> : TOK_LCOMMENT ( options {greedy=false;} : . )*
> TOK_RCOMMENT;
>
> TOK_TEXT : ( ~(LG|RG) )+;
>
> TOK_CODE : LG ~'@' ( options {greedy=false;} : . )* RG;
>
> fragment TOK_LCOMMENT
> : LG '%--';
>
> fragment TOK_RCOMMENT
> : '--%' RG;
>
> It's not quite right: the ~'@' in TOK_CODE is only the first (hacky) way I
> could make the lexer distinguish between TOK_PI and TOK_CODE. If you take
> it out, the grammar still works, but it will recognise TOK_PI as TOK_CODE.
> Adding a syntactic predicate ( LG '@' RG )=> to TOK_PI did not help. So
> this isn't a solution, but I hope it moves you towards one.
>
> I've added some parser rules so you can see the parse tree in antlr.
>
> Matt
>
> On Tue, Aug 26, 2008 at 5:40 PM, Ymo <ymo.mail at gmail.com> wrote:
>
>> Hi Matt, I appreciate you taking a look at this.
>>
>> I pasted the reduced input & grammar:
>>
>> The first line is never recognized as TOK_PI. It is always seen as
>> TOK_CODE.
>>
>> Input is:
>> «@»
>> «fgdsfgs»
>> «%-- comment --%»
>>
>> Then I reduced the grammar to this:
>>
>> tokens {
>> LG='\u00ab';
>> RG='\u00bb';
>> }
>>
>>
>> //LEXER
>> TOK_PI : LG '@';
>> TOK_LCOMMENT : '%-';
>> TOK_RCOMMENT : '-%';
>>
>> TOK_BLOCK : { tagMode==false }? =>
>> (LG TOK_LCOMMENT) => TOK_COMMENT { $type=TOK_COMMENT; } |
>> (TOK_PI) => TOK_PI { $type=TOK_PI; } |
>> (LG ) => TOK_CODE { $type=TOK_CODE; } |
>> TOK_TEXT { $type=TOK_TEXT; } {
>> };
>>
>> fragment
>> TOK_TEXT :
>> ( ~(LG|RG) )+ {
>> };
>>
>> fragment
>> TOK_CODE :
>> LG ( options {k=2;greedy=false;} : . )* RG {
>> };
>>
>> fragment
>> TOK_COMMENT :
>> LG TOK_LCOMMENT ( options {k=3;greedy=false;} : . )* TOK_RCOMMENT RG {
>> $channel=HIDDEN;
>> };