[antlr-interest] (no subject)
Imre András
iar73 at freemail.hu
Sat Nov 1 13:47:56 PDT 2008
Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 03:14 2/11/2008, Imre András wrote:
> >This compiles ok (I'm still wondering about the $type and token
> >magic), but there is a problem with processing A ::= SEQUENCE {
> >a, SEQUENCE, c} as input. A MismatchedTokenException occurs
> >instead of treating SEQUENCE as an ID.
> [...]
> >tokens { SQ; }
> [...]
> >SEQUENCE : 'SEQUENCE';
> [...]
> >ID: ( SEQUENCE )=> SEQUENCE {$type = SQ;} | ('a'..'z'|'A'..'Z')+
> >;
>
> I missed the earlier part of this thread, so I'm not entirely sure
> what you're trying to accomplish, but the above seems wrong.
>
> First off, you're defining two top-level tokens that can match the
> character sequence 'SEQUENCE': the SEQUENCE rule, which will
> assign it the token type SEQUENCE, and the ID rule, which will
> assign it the token type SQ (which you are not actually using
> anywhere in your parser rules, so if it ever gets produced your
> parser will have mismatched token errors).
>
> If you want the ID rule to be the only source of matching for the
> input text "SEQUENCE", then you need to make the SEQUENCE rule a
> fragment and either change your parser rules to look for SQ
> instead or to make the ID rule set the token type to SEQUENCE.
>
> Alternatively (provided there aren't weird issues with lookahead)
> you ought to be able to remove the SEQUENCE-matching clause from
> ID entirely. The input text "SEQUENCE" should always generate a
> SEQUENCE token without that clause in ID being present, so that
> clause is just introducing ambiguity (though it does guarantee
> full lookahead).
>
> >How can I tell ANTLR that 'SEQUENCE' means a list when followed
> >by '{', otherwise it is an ID?
>
> This sounds like an entirely different problem than what the above
> clause is trying to accomplish :)
>
> There are two schools of thought on how to treat out-of-place
> keywords as if they are just normal identifiers, and they're both
> covered here:
> <http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741>
>
>
I'd like to have a simple grammar that recognizes assignments. An assignment has an ID on the left side and either an ID or a non-empty sequence (list) on the right side. Sequences can be nested. The string 'SEQUENCE' should be allowed inside a sequence, where it is treated as an ID.
I ran into two problems, one for each of the two approaches I tried:
First, if I allow 'SEQUENCE' to be an ID, I get ambiguity errors in the grammar. I tried to get rid of them using fragments but have not succeeded yet. I assume there should be no ambiguity when looking ahead one token: if the next token is '{', it should denote a sequence definition; otherwise it should be an ID.
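A minimal sketch of how that one-token decision could be expressed, assuming ANTLR 3 with the Java target. The grammar name SeqAsId and the rule names assignment, value, sequence and name are hypothetical and not taken from the grammar discussed in this thread; the sketch has not been run against that grammar. SEQUENCE stays an ordinary lexer keyword, and a parser rule accepts it wherever an identifier is allowed:

```
grammar SeqAsId;   // hypothetical name, for illustration only

// An assignment: an ID on the left, an ID or a list on the right.
assignment : ID '::=' value EOF ;

// SEQUENCE followed by '{' starts a list; a lone SEQUENCE falls through to name.
value      : sequence | name ;

// A non-empty, possibly nested list.
sequence   : SEQUENCE '{' value (',' value)* '}' ;

// The keyword token is accepted here as if it were an identifier.
name       : ID | SEQUENCE ;

// Lexer: the keyword rule is listed before the general identifier rule.
SEQUENCE   : 'SEQUENCE' ;
ID         : ('a'..'z'|'A'..'Z')+ ;
WS         : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

With this arrangement the lexer never has to decide between the two meanings. The parser sees SEQUENCE, looks at the following token, and chooses sequence only when that token is '{'. For the input A ::= SEQUENCE { a, SEQUENCE, c } the outer SEQUENCE should therefore start a list and the inner one should be treated like an identifier.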
Second, if I want 'SEQUENCE' to be a reserved keyword, I don't know how to specify that this string should not be recognized as an ID. I assume I have to do something with the ID rule, but I currently have no clue how to do that. I examined the v1.5 Java grammar available on the ANTLR site to see how keywords and identifiers relate to each other there, but found nothing. I even tried starting with the rule Identifier (for this I had to move it forward to be the first rule) to see what happens when I specify keywords, but the grammar had errors ("The following token definitions can never be matched because prior tokens match the same input: ENUM,ASSERT").
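For the reserved-keyword variant, a small sketch assuming ANTLR 3's behaviour that, when two lexer rules match exactly the same text, the rule listed first wins. The grammar name Keywords is hypothetical. This would also be consistent with the error quoted above: moving the identifier rule to the front puts it ahead of the keyword rules, so keywords such as ENUM and ASSERT can no longer be matched.

```
lexer grammar Keywords;   // hypothetical name, for illustration only

// Keyword rules come first, so the exact text 'SEQUENCE' becomes a SEQUENCE token.
SEQUENCE : 'SEQUENCE' ;

// The general identifier rule comes after all keyword rules.
// Longer words such as 'SEQUENCES' are still expected to match ID.
ID       : ('a'..'z'|'A'..'Z')+ ;

// Whitespace is kept off the parser's default channel.
WS       : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

Parser rules that refer only to ID would then reject 'SEQUENCE' wherever an identifier is expected, which is what a reserved keyword requires.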
Unfortunately my grammar has gotten a bit large, and it is still not working :( I think I have to invest a couple more hours to get it right :)
However, if you could help to flatten the learning curve, please do not hesitate. Thanks for the answers so far.
Regards,
András