[antlr-interest] (no subject)

Imre András iar73 at freemail.hu
Sat Nov 1 13:47:56 PDT 2008


Gavin Lambert <antlr at mirality.co.nz> wrote:


> At 03:14 2/11/2008, Imre András wrote:
> >This compiles ok (I'm still wondering about the $type and token 
> >magic), but there is a problem with processing A ::= SEQUENCE { 
> >a, SEQUENCE, c} as input. A MismatchedTokenException occurs 
> >instead of treating SEQUENCE as an ID.
> [...]
> >tokens { SQ; }
> [...]
> >SEQUENCE : 'SEQUENCE';
> [...]
> >ID: ( SEQUENCE )=> SEQUENCE {$type = SQ;} | ('a'..'z'|'A'..'Z')+ 
> >;
> 
> I missed the earlier part of this thread, so I'm not entirely sure 
> what you're trying to accomplish, but the above seems wrong.
> 
> First off, you're defining two top-level tokens that can match the 
> character sequence 'SEQUENCE': the SEQUENCE rule, which will 
> assign it the token type SEQUENCE, and the ID rule, which will 
> assign it the token type SQ (which you are not actually using 
> anywhere in your parser rules, so if it ever gets produced your 
> parser will have mismatched token errors).
> 
> If you want the ID rule to be the only source of matching for the 
> input text "SEQUENCE", then you need to make the SEQUENCE rule a 
> fragment and either change your parser rules to look for SQ 
> instead or to make the ID rule set the token type to SEQUENCE.
> 
> Alternatively (provided there aren't weird issues with lookahead) 
> you ought to be able to remove the SEQUENCE-matching clause from 
> ID entirely.  The input text "SEQUENCE" should always generate a 
> SEQUENCE token without that clause in ID being present, so that 
> clause is just introducing ambiguity (though it does guarantee 
> full lookahead).
> 
> >How can I tell ANTLR that 'SEQUENCE' means a list when followed 
> >by '{', otherwise it is an ID?
> 
> This sounds like an entirely different problem than what the above 
> clause is trying to accomplish :)
> 
> There are two schools of thought on how to treat out-of-place 
> keywords as if they are just normal identifiers, and they're both 
> covered here:
>    http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741
> 
> 
> in,

I'd like to have a simple grammar that recognizes assignments. An assignment has an ID on the left side, and either an ID or a non-empty sequence (list) on the right side. Sequences can be nested. The string 'SEQUENCE' should be allowed inside a sequence, where it is treated as an ID.

I came across two mutually exclusive problems:

First, if I allow 'SEQUENCE' to be an ID, I get ambiguity errors in the grammar. I tried to get rid of them using fragments, but have not succeeded yet. I assume there should be no ambiguity when looking ahead one token: if the token after 'SEQUENCE' is '{', it should denote a sequence definition; otherwise it should be an ID.
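
The lookahead idea in this first case can be sketched at the parser level rather than in the lexer. The following is a minimal, untested ANTLR v3 sketch, not a fix for the actual (larger) grammar: the grammar name Assign, the rule names value, sequence and name, and the WS rule are all made up for illustration. The lexer always emits a SEQUENCE token for the text 'SEQUENCE', and the parser decides from the following token whether it starts a list or stands in for an identifier.

```antlr
grammar Assign;   // hypothetical name, illustration only

assignment : ID '::=' value EOF ;

// SEQUENCE followed by '{' selects the first alternative;
// SEQUENCE followed by ',' or '}' or EOF falls through to 'name'.
value      : sequence
           | name
           ;

sequence   : SEQUENCE '{' value (',' value)* '}' ;

// The keyword is accepted wherever a plain identifier is allowed.
name       : ID
           | SEQUENCE
           ;

SEQUENCE : 'SEQUENCE' ;
ID       : ('a'..'z'|'A'..'Z')+ ;
WS       : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

Under these assumptions, the input A ::= SEQUENCE { a, SEQUENCE, c} should parse with the second SEQUENCE matched by the name rule. The ID rule needs no special clause for 'SEQUENCE' and no $type assignment.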

Second, if I want 'SEQUENCE' to be a reserved keyword, I don't know how to specify that this string should not be recognized as an ID. I assume I have to do something with the ID rule, but currently have no clue what. I examined the v1.5 Java grammar available on the ANTLR site, looking for how keywords and identifiers relate there, but found nothing. I even tried starting with the Identifier rule (for this I had to move it up to be the first rule) to see what happens when I specify keywords, but the grammar had errors ("The following token definitions can never be matched because prior tokens match the same input: ENUM,ASSERT").
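
For this second, reserved-keyword case, the error message quoted above hints at the mechanism: when two lexer rules match the same text, the rule listed first wins. A minimal, untested ANTLR v3 sketch under that assumption follows; the grammar name AssignReserved, the parser rule names and the WS rule are hypothetical. The ID rule stays unchanged; the keyword rule is simply placed before it and is never offered as an alternative to ID in the parser.

```antlr
grammar AssignReserved;   // hypothetical name, illustration only

assignment : ID '::=' value EOF ;

// Only ID is accepted as a plain value, so SEQUENCE is reserved here.
value      : sequence
           | ID
           ;

sequence   : SEQUENCE '{' value (',' value)* '}' ;

// Listed before ID: the exact text 'SEQUENCE' becomes a SEQUENCE token.
SEQUENCE : 'SEQUENCE' ;

// Any other run of letters is expected to remain an ID.
ID       : ('a'..'z'|'A'..'Z')+ ;

WS       : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

With this variant, A ::= SEQUENCE { a, SEQUENCE, c} should be rejected, because the inner SEQUENCE is not followed by '{' and cannot be used as an ID.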

Unfortunately my grammar has gotten a bit large and is still not working :( I think I have to invest a couple more hours to get it right :)

However, if you could help to flatten the learning curve, please do not hesitate. Thanks for the answers so far.


Regards,
  András