[antlr-interest] non-determinism.

Greg Lindholm glindholm at yahoo.com
Tue Mar 25 18:50:26 PST 2003


To understand the non-determinism, it may help to consider some example tokens for this lexer.
If your lexer sees the single character 'a', what type of token would you like it to return? One source of non-determinism in this lexer is that 'a' matches the NAME, ID, and TOKEN rules. Which is it? Note that ANTLR doesn't care what order the rules appear in (unlike lex). The same goes for the single character '9': it matches both TOKEN and NUMBER.
So I recommend you work up some example cases and decide what you want your lexer to return for each case.
In some languages a given sequence of characters can mean completely different things (different token types) depending on the context of those characters. ANTLR is basically a context-free lexer (predicates can sometimes help). In these cases you might need to delay exact identification of the token type until you know the context (the semantic analysis phase). For example, you might have the lexer return a token type NAME_OR_ID, then later figure out which it is once you know the context.
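A rough sketch of the NAME_OR_ID idea in ANTLR 2 syntax, assuming the Java target. The lexer merges NAME and ID into one rule so it no longer has to choose between them, and a parser rule checks the text only where the context says a NAME is expected. The grammar names HostParser and HostLexer and the rule userName are hypothetical, and TOKEN and NUMBER are left out, so their overlap with this rule is not addressed here.

```antlr
// Hypothetical sketch, ANTLR 2 syntax, Java target.
class HostParser extends Parser;

// In this position the context says a NAME is wanted, so the text is
// checked against the original NAME rule: a letter, then letters, '_' or '.'.
userName
    : n:NAME_OR_ID
      {
          if (!n.getText().matches("[A-Za-z][A-Za-z_.]*")) {
              // signal a semantic error; the rule's default handler reports it
              throw new antlr.SemanticException("expected a NAME, got " + n.getText());
          }
      }
    ;

class HostLexer extends Lexer;

protected
ALPHA : ('a'..'z'|'A'..'Z')
    ;

protected
ALPHA_NUM : ('a'..'z'|'A'..'Z'|'0'..'9')
    ;

// Matches everything the old NAME and ID rules matched; which of the two
// a given occurrence really is gets decided later, in the parser.
NAME_OR_ID : ALPHA (ALPHA_NUM | '_' | '.' | '@')*
    ;
```

The design choice is to make the lexer deliberately less precise so that it stays deterministic, and to move the distinction to the place where the context is actually known.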
Hope this helps,
Greg
 
mark kant <markkant2001 at yahoo.com> wrote:

How about the following lexer?


protected
ALPHA: ('a'..'z'|'A'..'Z')
;
protected
ALPHA_NUM: ('a'..'z'|'A'..'Z'|'0'..'9')
;
protected
DIGIT: '0'..'9'
;


NAME: ALPHA (ALPHA | '_' | '.')*
;

ID: ALPHA (ALPHA_NUM | '_' | '.' | '@')*
;

TOKEN: (ALPHA_NUM | '_' | '.' | '@' | '%' | ';' | '~')+
;

NUMBER: (DIGIT)+
;


Thanks

Mark
--- mzukowski at yci.com wrote:
> Remove your AT rule, then add a literal keyword AT="@" to the keywords (tokens) section and test for it in TOKEN by turning on the option testLiterals=true.
> See the docs on literals.
> 
> Monty
> 
> -----Original Message-----
> From: mark kant [mailto:markkant2001 at yahoo.com]
> Sent: Tuesday, March 25, 2003 9:42 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] non-determinism.
> 
> 
> Hi,
> 
> I get non-determinism in the following lexer (relevant portion of parser and lexer):
> 
> hosport: host COLON password ;
> 
> password: TOKEN ;
> 
> host: NAME AT TOKEN ;
> 
> 
> lexer ...............
> 
> COLON: ':' ;
> 
> SEMI: ';' ;
> 
> AT: '@' ;
> 
> TOKEN: ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|':'|';'|'@')+ ;
> 
> 
> What is the best way to resolve it?
> 1. multiple lexers
> 2. syntactic predicates (not appropriate, as I have other similar rules for special characters)
> 3. some kind of flag set in the parser that the lexer checks before matching a rule (how do I communicate the flag state from the parser to the lexer?). I have done this in Lex and Yacc.
> 
> Thanks
> 
> Mark
> 


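Monty's suggestion in the quoted thread (drop the AT rule, define AT as a literal, and turn on testLiterals in TOKEN) might look roughly like this in ANTLR 2 syntax. This is a sketch under assumptions: HostLexer is a hypothetical grammar name, the WS rule is added so tokens can be separated, and treating ':' and ';' as literals in the same way as '@' is an extension of his suggestion, not part of it. Because the literals table is consulted only after TOKEN has matched its full text, '@' becomes AT only when it forms a whole token by itself; in input such as user@host the '@' stays inside a single TOKEN.

```antlr
// Hypothetical sketch, ANTLR 2 syntax.
class HostLexer extends Lexer;

options {
    testLiterals = false;   // skip the literals check by default
}

tokens {
    AT    = "@";   // replaces the old AT rule
    COLON = ":";   // extension: same idea applied to ':' and ';'
    SEMI  = ";";
}

// TOKEN is now the only rule that can start with '@', ':' or ';', so the
// lexer no longer has to choose between TOKEN and AT/COLON/SEMI.
// After TOKEN matches, its whole text is looked up in the literals table:
// exactly "@" becomes AT, ":" becomes COLON, ";" becomes SEMI.
TOKEN
options { testLiterals = true; }
    : ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|':'|';'|'@')+
    ;

// Whitespace separates tokens and is discarded.
WS  : ( ' ' | '\t' | '\r' | '\n' { newline(); } )+
      { $setType(Token.SKIP); }
    ;
```

With this lexer, input written as `user @ host : secret` would reach the parser as TOKEN AT TOKEN COLON TOKEN, while `user@host:secret` would arrive as one TOKEN; which behaviour is acceptable depends on the example cases Greg recommends working through.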