[antlr-interest] Resolving ambiguities in Lexer rules

Achint Mehta achintmehta at gmail.com
Sat Aug 15 16:08:33 PDT 2009


Hi Joe,

Thanks for your response.

You have proposed two solutions:
1. Replace ver with SPECIAL_STRING and check in the target code for allowed
values. This means that if I intend to collect a generic unquoted string
anywhere in an ANTLR parser, then I cannot use any tokens in the whole
parser. In a big parser, this seems to be a limitation: the target language
program has to validate every string where a token should have been placed
in the parser.

2. The second option is that all the tokens have to be given as
alternatives to SPECIAL_STRING. Again, in a big/complicated parser, all the
tokens in the whole parser have to be repeated wherever I intend to use
SPECIAL_STRING. This can be simplified if I list the tokens in the
definition of SPECIAL_STRING itself. But still, in a parser which could use
tens or hundreds of tokens, it seems impractical to repeat all the tokens
in the SPECIAL_STRING rule and other similar rules (intended for collecting
the generic string).
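
To make option 2 concrete, here is a rough, untested sketch in ANTLR 3
syntax of the simplification I mean: the keywords become lexer tokens and
are listed once in a helper parser rule. The rule name "value" and the
token names VERSION, V, COUNT and C are only assumptions for illustration:

-----------------------------------------------
grammar sample_parser;

requestline : ver EQUAL value ;

ver   : VERSION | V | COUNT | C ;

/* Hypothetical helper rule: a generic string, or any keyword
   that happens to appear where a plain string is expected */
value : SPECIAL_STRING | VERSION | V | COUNT | C ;

/* Tokens: as I understand it, the keywords are listed before
   SPECIAL_STRING so that they win when both match the same text */
VERSION : 'VERSION' ;
V       : 'V' ;
COUNT   : 'COUNT' ;
C       : 'C' ;

SPECIAL_STRING : (CHAR)+ ;
WHITESPACE     : ' ' ;
NEWLINE        : ('\r')? '\n' ;
EQUAL          : '=' ;

fragment
CHAR : ('a'..'z') | ('A'..'Z') ;
-----------------------------------------------

With this, VERSION=VERSION should parse, but every new keyword also has to
be added to "value", which is exactly the repetition I am worried about.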

The parser that I have put in the e-mail is a simplified version of the
issue I am facing. I am writing a SIP protocol message parser. The very
first line of a SIP message starts as (I am compressing the rules for
clarity):

Method SPACE Request-URI ... (other rules follow)
Method: "INVITE" | "ACK" | "OPTIONS" | "BYE" | "CANCEL" | "REGISTER"
Request-URI boils down to : "sip:" [userinfo "@"] hostport url-parameters
[headers]
and userinfo is an unquoted alpha-numeric string.

If the SIP message starts as REGISTER SIP:REGISTER@...
the parsing would fail if I write the rules as I did in my earlier sample
program, because the second REGISTER is lexed as the Method keyword rather
than as a generic string for userinfo.
The SIP protocol is filled with rules such as userinfo where unquoted
alphanumeric strings have to be collected, and there are tens of tokens in
its grammar. This is a typical scenario for any protocol grammar. I am not
sure that repeating all tokens in rules or treating everything as a generic
string would be a neat solution.
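
Applied to the SIP case, option 2 would look roughly like the fragment
below. This is only an untested sketch in ANTLR 3 syntax: the rule and token
names (requesturi, SIP_SCHEME, ALNUM, AT) are made up for illustration,
hostport is a placeholder, url-parameters and headers are left out, and
case-insensitive matching of "sip:" is ignored:

-----------------------------------------------
requestline : method SPACE requesturi ;

method : INVITE | ACK | OPTIONS | BYE | CANCEL | REGISTER ;

requesturi : SIP_SCHEME (userinfo AT)? hostport ;

/* userinfo has to list the method keywords again; without the
   second alternative the userinfo REGISTER is rejected */
userinfo : ALNUM | method ;

/* Placeholder only, not the real hostport rule */
hostport : ALNUM ;

/* Keyword tokens come before ALNUM so they win on an exact match */
INVITE   : 'INVITE' ;
ACK      : 'ACK' ;
OPTIONS  : 'OPTIONS' ;
BYE      : 'BYE' ;
CANCEL   : 'CANCEL' ;
REGISTER : 'REGISTER' ;

SIP_SCHEME : 'sip:' ;
AT         : '@' ;
SPACE      : ' ' ;
ALNUM      : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;
-----------------------------------------------

Every other rule that collects a generic string would need the same
"| method" style alternative, plus any further keywords the full grammar
defines.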

I admit that I am a noob when it comes to familiarity with other
lexers/parsers, and they might require some other work-around as
well. But the situation seems common enough to have a straightforward
solution (though I might be wrong).

Thanks.

Regards,
Achint


>
> I don't see this as an ambiguity issue but rather a decision of whether
> your grammar uses reserved words or not.
> I'm not an expert by any means but that doesn't mean I don't have an
> opinion just that you should take it with a grain of salt.
>
> You can either handle this with a symbol table later in the process or
> rewrite the requestline to something like
> requestline : ver EQUAL (SPECIAL_STRING | ver);
>
> Joe
>
>
> Achint Mehta wrote:
> > Hi All,
> >
> > The section "Ambiguities and Non determinisms" of the book "The
> > definitive ANTLR guide" talks about the ambiguities in lexer rules,
> > but I am not sure how to resolve them.
> >
> > Consider a following grammar which assigns a value to an ID. The ID
> > can either be VERSION or COUNT while its value can be anything:
> > -----------------------------------------------
> > grammar sample_parser;
> >
> > requestline : ver EQUAL SPECIAL_STRING ;
> >
> > /* Tokens */
> > ver:('VERSION'| 'V') {}
> >       | ('COUNT' | 'C') {} ;
> >
> >
> > SPECIAL_STRING:(CHAR)+ ;
> > WHITESPACE: ' ';
> > NEWLINE: ('\r')? '\n';
> > EQUAL: '=';
> >
> > fragment
> > CHAR: (('a'..'z')|('A'..'Z'));
> > -----------------------------------------------
> >
> > If the input is given as
> > VERSION=FIRST
> > Then it works, but if following input is given
> > VERSION=VERSION
> > Then I get an error (MissingTokenException after the "=").
> >
> > How can this ambiguity be resolved ?
> >
> > Thanks in advance.
> >
> > Regards,
> > Achint
>