[antlr-interest] non-determinism.

mark kant markkant2001 at yahoo.com
Mon Mar 31 11:15:09 PST 2003


I am using a grammer that has been already defined as
a standard by IETF - SIP standard.  I do not have much
choice in that. The fact is it is not easy to remove
the non-determinism.  There are several lex rules 
that  give non-determinism warning.  Only the parser
knows what to alternative set of tokens should be
next.

An implementation by NIST using antlr has lots of
lexers for the grammer.  I am still trying to
understand it - how they invoke various lexers etc.

Mark

---------------------------------
--- Albert Huh <albert.huh at embarcadero-ca.com> wrote:
> i totally agree.  i was just trying answer his
> question about whether it was possible.  my gut
> instincts tell me that in most situations, if you
> need to control the lexer from the parser then you
> probably need to re-evaluate your grammar.
> 
> albert
> 
> -----Original Message-----
> From: mzukowski at yci.com [mailto:mzukowski at yci.com]
> Sent: Monday, March 31, 2003 11:19 AM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
> 
> 
> That's playing with fire.  If you actually do any
> feedback from the parser
> to the lexer you have to manually guarantee that the
> tokens you want to have
> an effect on have not yet been lexed.  Your parser
> will be at least k tokens
> behind the lexer.  If you use syntactic predicates
> then you will have lexed
> every token needed to evaluate that predicate.
> 
> Monty
> 
> -----Original Message-----
> From: Albert Huh
> [mailto:albert.huh at embarcadero-ca.com]
> Sent: Wednesday, March 26, 2003 11:41 AM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
> 
> 
> actually, you can control the lexer from the parser
> to a degree.  i'm not
> sure about switching rules, but you can definitely
> make the lexer change the
> token type depending on if you've set a flag or not.
> 
> the parser will need a reference to the lexer to do
> this.
> 
> within the lexer, you can simply add some actions
> like
> 
> {
>   if (nameFlag) {
>     setType(NAME);
>   }
> }
> 
> i probably didn't use the proper method name for
> changing the type, but you
> get the idea.
> 
> though this probably isn't the greatest idea. 
> ideally the lexer and parser
> should be able to run independantly. 
> -----Original Message-----
> From: Greg Lindholm [mailto:glindholm at yahoo.com]
> Sent: Wednesday, March 26, 2003 1:12 PM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
> 
> 
> Sorry, lexers and parsers (certainly Antlr) don't
> work the way you want them
> to.  There is no facility for the parser to tell the
> lexer what tokens to
> look for.  
> The lexer acts mostly independently from the parser
> and it's job is to
> translate a stream of characters into a stream of
> tokens. (The tokens are
> then consumed by the parser. ) So the lexer has to
> be able to look at a
> sequence of characters and decide what token type to
> give it. 
> You probably need to read the Antlr documentation
> again and study some of
> the examples in order to create a workable approach
> to constructing a
> solution. 
> If you want to describe what you are trying to
> accomplish someone may be
> able to suggest an approach you can take. 
> Greg 
>  mark kant <markkant2001 at yahoo.com> wrote: 
> 1. it is not "required" distinguishing character but
> that it may be present. The point is that there is a
> list of token which are supersets of some other
> tokens.
> 
> So in your question - what type of TOKEN is 'a' ?
> The answer is that 'a' can be NAME, ID and also
> TOKEN.
> If a digit follows 'a', then a lexer can either
> return
> NAME or an ID. If I let the lexer consume the digit
> and return an ID, but the parser was expecting only
> a
> NAME, then ID would be incorrect.
> To me a parser tell lexer to find a token from a set
> of X tokens.
> In my problem there are 2 or more sets of tokens.
> The
> parser needs to tell lexer which set to look for.
> 
> Thanks again for the discussion. I will really
> appreciate any help in this.
> 
> Mark
> 
> ----------------------
> --- Greg Lindholm wrote:
> > 
> > If each token type has a "required" distinguishing
> > character then there would not be an
> > non-determiniism, but that is not what you have
> > written in the rules below.
> > Did you decide which token type an 'a' is? How
> about
> > a '9'? You're not going to get very far building a
> > lexer until you make these basic decisions.
> > Once you have some example cases, if you then have
> > trouble building the lexer to match your examples,
> > then people on this list will help you.
> > mark kant wrote:There is a
> > slight difference. Each of them also has
> > extra characters to distinguish. Example TOKEN
> also
> > has '~' character in it. If I expected an ID, but
> I
> > return TOKEN_OR_ID, then how do I know it is a
> valid
> > ID (it may have '~' in it, which makes it invalid
> > ID,
> > but valid TOKEN )
> > 
> > 
> > Mark
> > 
> > ------------------------------
> > --- Greg Lindholm wrote:
> > > 
> > > To understand the non-determinism it might help
> > you
> > > if you consider some example tokens with this
> > lexer.
> > > If your lexer sees the single character 'a' what
> > > type of token would you like it to return? One
> of
> > > the non-determinism this lexer has is that 'a'
> > > matches the NAME, ID, and TOKEN rules. Which is
> > it?
> > > Note that ANTLR doesn't care what order the
> rules
> > > appear in (unlike lex). Same thing goes with the
> > > single character '9', it matches both TOKEN and
> > > NUMBER.
> > > So I recommend work up some example cases and
> > decide
> > > what you want your lexer to return for each
> case. 
> > > In some languages a given sequence of characters
> > can
> > > mean completely different things (different
> token
> > > type) based on the context of those characters. 
> > > Antlr is basically a context-free lexer
> > (predicates
> > > can help sometimes). In these cases you might
> need
> > > to delay exact identification of the token type
> > > until you know the context (symantic analysis
> > > phase). For example you might have the lexer
> > return
> > > a token type NAME_OR_ID then later figure out
> > which
> > > it is once you know the context.
> > > Hope this helps,
> > > Greg
> > > 
> > > mark kant wrote:How about
> > > the following lexer
> > > 
> > > 
> > > protected: 
> > > ALPHA: ('a'..'z'|'A'..'Z')
> > > ;
> > > protected:
> > > ALPHA_NUM: ('a'..'z'|'A'..'Z'|'0'..'9')
> > > ;
> > > protected:
> > > DIGIT: '0'..'9'
> > > ;
> > > 
> > > 
> > > NAME: (ALPHA) ((ALPHA) | '_' | '.') )*
> 
=== message truncated ===


__________________________________________________
Do you Yahoo!?
Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
http://platinum.yahoo.com

 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 



More information about the antlr-interest mailing list