[antlr-interest] non-determinism.

Wed Mar 26 09:02:24 PST 2003

1.  it is not "required" distinguishing character but
that it may be present.  The point is that there is a
list of token which are supersets of some other
tokens.

So in your question - what type of TOKEN is 'a' ?
The answer is that 'a' can be NAME, ID and also TOKEN.
If a digit follows 'a', then a lexer can either return
NAME or an ID.  If I let the lexer consume the digit
and return an ID, but the parser was expecting only a
NAME, then ID would be incorrect.
To me a parser tell lexer to find a token from a set
of X tokens.
In my problem there are 2 or more sets of tokens. The
parser needs to tell lexer which set to look for.

Thanks again for the discussion.  I will really
appreciate any help in this.

Mark

----------------------
--- Greg Lindholm <glindholm at yahoo.com> wrote:
> 
> If each token type has a "required" distinguishing
> character then there would not be an
> non-determiniism, but that is not what you have
> written in the rules below.
> Did you decide which token type an 'a' is? How about
> a '9'? You're not going to get very far building a
> lexer until you make these basic decisions.
> Once you have some example cases, if you then have
> trouble building the lexer to match your examples,
> then people on this list will help you.
>  mark kant <markkant2001 at yahoo.com> wrote:There is a
> slight difference. Each of them also has
> extra characters to distinguish. Example TOKEN also
> has '~' character in it. If I expected an ID, but I
> return TOKEN_OR_ID, then how do I know it is a valid
> ID (it may have '~' in it, which makes it invalid
> ID,
> but valid TOKEN )
> 
> 
> Mark
> 
> ------------------------------
> --- Greg Lindholm wrote:
> > 
> > To understand the non-determinism it might help
> you
> > if you consider some example tokens with this
> lexer.
> > If your lexer sees the single character 'a' what
> > type of token would you like it to return? One of
> > the non-determinism this lexer has is that 'a'
> > matches the NAME, ID, and TOKEN rules. Which is
> it?
> > Note that ANTLR doesn't care what order the rules
> > appear in (unlike lex). Same thing goes with the
> > single character '9', it matches both TOKEN and
> > NUMBER.
> > So I recommend work up some example cases and
> decide
> > what you want your lexer to return for each case. 
> > In some languages a given sequence of characters
> can
> > mean completely different things (different token
> > type) based on the context of those characters. 
> > Antlr is basically a context-free lexer
> (predicates
> > can help sometimes). In these cases you might need
> > to delay exact identification of the token type
> > until you know the context (symantic analysis
> > phase). For example you might have the lexer
> return
> > a token type NAME_OR_ID then later figure out
> which
> > it is once you know the context.
> > Hope this helps,
> > Greg
> > 
> > mark kant wrote:How about
> > the following lexer
> > 
> > 
> > protected: 
> > ALPHA: ('a'..'z'|'A'..'Z')
> > ;
> > protected:
> > ALPHA_NUM: ('a'..'z'|'A'..'Z'|'0'..'9')
> > ;
> > protected:
> > DIGIT: '0'..'9'
> > ;
> > 
> > 
> > NAME: (ALPHA) ((ALPHA) | '_' | '.') )*
> > ;
> > 
> > ID: (ALPHA) ( (ALPHA_NUM) |'_'|'.'|'@')*
> > ;
> > 
> > TOKEN: (ALPHANUM|'_'|'.'|'@'|'%'|';'|'~')+
> > ;
> > 
> > NUMBER: ( DIGITS )+
> > ;
> > 
> > 
> > Thanks
> > 
> > Mark
> > --- mzukowski at yci.com wrote:
> > > remove your AT rule and then add a literal
> keyword
> > > AT='@' to the keywords
> > > section and test for it in TOKEN by turning on
> the
> > > option testLiterals=true.
> > > See the docs on literals.
> > > 
> > > Monty
> > > 
> > > -----Original Message-----
> > > From: mark kant [mailto:markkant2001 at yahoo.com]
> > > Sent: Tuesday, March 25, 2003 9:42 AM
> > > To: antlr-interest at yahoogroups.com
> > > Subject: [antlr-interest] non-determinism.
> > > 
> > > 
> > > Hi,
> > > 
> > > I get non-determinism in the following lexer
> > > (relevant
> > > portion of parser and lexer)
> > > 
> > > hosport: host COLON password
> > > 
> > > password: TOKEN
> > > 
> > > host: NAME AT TOKEN
> > > 
> > > 
> > > lexer ...............
> > > 
> > > COLON: ':'
> > > 
> > > SEMI: ';'
> > > 
> > > AT: '@'
> > > 
> > > TOKEN: ('a'..'z' | 'A'..'Z'
> > > |'0'..'9'|'.'|':'|';'|'@')+
> > > 
> > > 
> > > What is the best way to resolve it:
> > > 1. multiple lexers
> > > 2. syntactic predicates - not appropriate as I
> > have
> > > other similar rules for special characters
> > > 3. some kind of flag set in parser and lexer
> > checks
> > > it
> > > before matching a rule in lexer (how do I
> > > communicate
> > > the flag state from parser to lexer). I have
> done
> > > this
> > > in Lex and YAcc.
> > > 
> > > Thanks
> > > 
> > > Mark
> > > 
> > >
> __________________________________________________
> > > Do you Yahoo!?
> > > Yahoo! Platinum - Watch CBS' NCAA March Madness,
> > > live on your desktop!
> > > http://platinum.yahoo.com
> > > 
> > > 
> > > 
> > > Your use of Yahoo! Groups is subject to
> > > http://docs.yahoo.com/info/terms/ 
> > > 
> > > 
> > > 
> > > 
> > > Your use of Yahoo! Groups is subject to
> > > http://docs.yahoo.com/info/terms/ 
> > > 
> > > 
> > 
> > 
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! Platinum - Watch CBS' NCAA March Madness,
> > live on your desktop!
> > http://platinum.yahoo.com
> > 
> > 
> > 
> > Your use of Yahoo! Groups is subject to
> > http://docs.yahoo.com/info/terms/ 
> > 
> > 
> > 
> > 
> > ---------------------------------
> > Do you Yahoo!?
> > Yahoo! Platinum - Watch CBS' NCAA March Madness,
> > live on your desktop!
> 
> 
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Platinum - Watch CBS' NCAA March Madness,
> live on your desktop!
> http://platinum.yahoo.com
> 
> 
> 
> 
=== message truncated ===

__________________________________________________
Do you Yahoo!?
Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
http://platinum.yahoo.com

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/