[antlr-interest] non-determinism.
mark kant
markkant2001 at yahoo.com
Mon Mar 31 11:15:09 PST 2003
I am using a grammer that has been already defined as
a standard by IETF - SIP standard. I do not have much
choice in that. The fact is it is not easy to remove
the non-determinism. There are several lex rules
that give non-determinism warning. Only the parser
knows what to alternative set of tokens should be
next.
An implementation by NIST using antlr has lots of
lexers for the grammer. I am still trying to
understand it - how they invoke various lexers etc.
Mark
---------------------------------
--- Albert Huh <albert.huh at embarcadero-ca.com> wrote:
> i totally agree. i was just trying answer his
> question about whether it was possible. my gut
> instincts tell me that in most situations, if you
> need to control the lexer from the parser then you
> probably need to re-evaluate your grammar.
>
> albert
>
> -----Original Message-----
> From: mzukowski at yci.com [mailto:mzukowski at yci.com]
> Sent: Monday, March 31, 2003 11:19 AM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
>
>
> That's playing with fire. If you actually do any
> feedback from the parser
> to the lexer you have to manually guarantee that the
> tokens you want to have
> an effect on have not yet been lexed. Your parser
> will be at least k tokens
> behind the lexer. If you use syntactic predicates
> then you will have lexed
> every token needed to evaluate that predicate.
>
> Monty
>
> -----Original Message-----
> From: Albert Huh
> [mailto:albert.huh at embarcadero-ca.com]
> Sent: Wednesday, March 26, 2003 11:41 AM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
>
>
> actually, you can control the lexer from the parser
> to a degree. i'm not
> sure about switching rules, but you can definitely
> make the lexer change the
> token type depending on if you've set a flag or not.
>
> the parser will need a reference to the lexer to do
> this.
>
> within the lexer, you can simply add some actions
> like
>
> {
> if (nameFlag) {
> setType(NAME);
> }
> }
>
> i probably didn't use the proper method name for
> changing the type, but you
> get the idea.
>
> though this probably isn't the greatest idea.
> ideally the lexer and parser
> should be able to run independantly.
> -----Original Message-----
> From: Greg Lindholm [mailto:glindholm at yahoo.com]
> Sent: Wednesday, March 26, 2003 1:12 PM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
>
>
> Sorry, lexers and parsers (certainly Antlr) don't
> work the way you want them
> to. There is no facility for the parser to tell the
> lexer what tokens to
> look for.
> The lexer acts mostly independently from the parser
> and it's job is to
> translate a stream of characters into a stream of
> tokens. (The tokens are
> then consumed by the parser. ) So the lexer has to
> be able to look at a
> sequence of characters and decide what token type to
> give it.
> You probably need to read the Antlr documentation
> again and study some of
> the examples in order to create a workable approach
> to constructing a
> solution.
> If you want to describe what you are trying to
> accomplish someone may be
> able to suggest an approach you can take.
> Greg
> mark kant <markkant2001 at yahoo.com> wrote:
> 1. it is not "required" distinguishing character but
> that it may be present. The point is that there is a
> list of token which are supersets of some other
> tokens.
>
> So in your question - what type of TOKEN is 'a' ?
> The answer is that 'a' can be NAME, ID and also
> TOKEN.
> If a digit follows 'a', then a lexer can either
> return
> NAME or an ID. If I let the lexer consume the digit
> and return an ID, but the parser was expecting only
> a
> NAME, then ID would be incorrect.
> To me a parser tell lexer to find a token from a set
> of X tokens.
> In my problem there are 2 or more sets of tokens.
> The
> parser needs to tell lexer which set to look for.
>
> Thanks again for the discussion. I will really
> appreciate any help in this.
>
> Mark
>
> ----------------------
> --- Greg Lindholm wrote:
> >
> > If each token type has a "required" distinguishing
> > character then there would not be an
> > non-determiniism, but that is not what you have
> > written in the rules below.
> > Did you decide which token type an 'a' is? How
> about
> > a '9'? You're not going to get very far building a
> > lexer until you make these basic decisions.
> > Once you have some example cases, if you then have
> > trouble building the lexer to match your examples,
> > then people on this list will help you.
> > mark kant wrote:There is a
> > slight difference. Each of them also has
> > extra characters to distinguish. Example TOKEN
> also
> > has '~' character in it. If I expected an ID, but
> I
> > return TOKEN_OR_ID, then how do I know it is a
> valid
> > ID (it may have '~' in it, which makes it invalid
> > ID,
> > but valid TOKEN )
> >
> >
> > Mark
> >
> > ------------------------------
> > --- Greg Lindholm wrote:
> > >
> > > To understand the non-determinism it might help
> > you
> > > if you consider some example tokens with this
> > lexer.
> > > If your lexer sees the single character 'a' what
> > > type of token would you like it to return? One
> of
> > > the non-determinism this lexer has is that 'a'
> > > matches the NAME, ID, and TOKEN rules. Which is
> > it?
> > > Note that ANTLR doesn't care what order the
> rules
> > > appear in (unlike lex). Same thing goes with the
> > > single character '9', it matches both TOKEN and
> > > NUMBER.
> > > So I recommend work up some example cases and
> > decide
> > > what you want your lexer to return for each
> case.
> > > In some languages a given sequence of characters
> > can
> > > mean completely different things (different
> token
> > > type) based on the context of those characters.
> > > Antlr is basically a context-free lexer
> > (predicates
> > > can help sometimes). In these cases you might
> need
> > > to delay exact identification of the token type
> > > until you know the context (symantic analysis
> > > phase). For example you might have the lexer
> > return
> > > a token type NAME_OR_ID then later figure out
> > which
> > > it is once you know the context.
> > > Hope this helps,
> > > Greg
> > >
> > > mark kant wrote:How about
> > > the following lexer
> > >
> > >
> > > protected:
> > > ALPHA: ('a'..'z'|'A'..'Z')
> > > ;
> > > protected:
> > > ALPHA_NUM: ('a'..'z'|'A'..'Z'|'0'..'9')
> > > ;
> > > protected:
> > > DIGIT: '0'..'9'
> > > ;
> > >
> > >
> > > NAME: (ALPHA) ((ALPHA) | '_' | '.') )*
>
=== message truncated ===
__________________________________________________
Do you Yahoo!?
Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
http://platinum.yahoo.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list