[antlr-interest] non-determinism.

Wed Mar 26 11:50:30 PST 2003

Hi,

This is what I was looking for, but how do I declare
the flag, so that I can set it parser and check it in
lexer ?

Thanks

Mark

------------------
--- Albert Huh <albert.huh at embarcadero-ca.com> wrote:
> actually, you can control the lexer from the parser
> to a degree.  i'm not sure about switching rules,
> but you can definitely make the lexer change the
> token type depending on if you've set a flag or not.
>  
> the parser will need a reference to the lexer to do
> this.
>  
> within the lexer, you can simply add some actions
> like
>  
> {
>   if (nameFlag) {
>     setType(NAME);
>   }
> }
>  
> i probably didn't use the proper method name for
> changing the type, but you get the idea.
>  
> though this probably isn't the greatest idea. 
> ideally the lexer and parser should be able to run
> independantly. 
> 
> -----Original Message-----
> From: Greg Lindholm [mailto:glindholm at yahoo.com]
> Sent: Wednesday, March 26, 2003 1:12 PM
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] non-determinism.
> 
> 
> 
> Sorry, lexers and parsers (certainly Antlr) don't
> work the way you want them to.  There is no facility
> for the parser to tell the lexer what tokens to look
> for.  
> 
> 
> The lexer acts mostly independently from the parser
> and it's job is to translate a stream of characters
> into a stream of tokens. (The tokens are then
> consumed by the parser. ) So the lexer has to be
> able to look at a sequence of characters and decide
> what token type to give it. 
> 
> 
> You probably need to read the Antlr documentation
> again and study some of the examples in order to
> create a workable approach to constructing a
> solution. 
> 
> 
> If you want to describe what you are trying to
> accomplish someone may be able to suggest an
> approach you can take. 
> 
> 
> Greg 
> 
> 
>  mark kant <markkant2001 at yahoo.com> wrote: 
> 
> 
> 1. it is not "required" distinguishing character but
> that it may be present. The point is that there is a
> list of token which are supersets of some other
> tokens.
> 
> So in your question - what type of TOKEN is 'a' ?
> The answer is that 'a' can be NAME, ID and also
> TOKEN.
> If a digit follows 'a', then a lexer can either
> return
> NAME or an ID. If I let the lexer consume the digit
> and return an ID, but the parser was expecting only
> a
> NAME, then ID would be incorrect.
> To me a parser tell lexer to find a token from a set
> of X tokens.
> In my problem there are 2 or more sets of tokens.
> The
> parser needs to tell lexer which set to look for.
> 
> Thanks again for the discussion. I will really
> appreciate any help in this.
> 
> Mark
> 
> ----------------------
> --- Greg Lindholm wrote:
> > 
> > If each token type has a "required" distinguishing
> > character then there would not be an
> > non-determiniism, but that is not what you have
> > written in the rules below.
> > Did you decide which token type an 'a' is? How
> about
> > a '9'? You're not going to get very far building a
> > lexer until you make these basic decisions.
> > Once you have some example cases, if you then have
> > trouble building the lexer to match your examples,
> > then people on this list will help you.
> > mark kant wrote:There is a
> > slight difference. Each of them also has
> > extra characters to distinguish. Example TOKEN
> also
> > has '~' character in it. If I expected an ID, but
> I
> > return TOKEN_OR_ID, then how do I know it is a
> valid
> > ID (it may have '~' in it, which makes it invalid
> > ID,
> > but valid TOKEN )
> > 
> > 
> > Mark
> > 
> > ------------------------------
> > --- Greg Lindholm wrote:
> > > 
> > > To understand the non-determinism it might help
> > you
> > > if you consider some example tokens with this
> > lexer.
> > > If your lexer sees the single character 'a' what
> > > type of token would you like it to return? One
> of
> > > the non-determinism this lexer has is that 'a'
> > > matches the NAME, ID, and TOKEN rules. Which is
> > it?
> > > Note that ANTLR doesn't care what order the
> rules
> > > appear in (unlike lex). Same thing goes with the
> > > single character '9', it matches both TOKEN and
> > > NUMBER.
> > > So I recommend work up some example cases and
> > decide
> > > what you want your lexer to return for each
> case. 
> > > In some languages a given sequence of characters
> > can
> > > mean completely different things (different
> token
> > > type) based on the context of those characters. 
> > > Antlr is basically a context-free lexer
> > (predicates
> > > can help sometimes). In these cases you might
> need
> > > to delay exact identification of the token type
> > > until you know the context (symantic analysis
> > > phase). For example you might have the lexer
> > return
> > > a token type NAME_OR_ID then later figure out
> > which
> > > it is once you know the context.
> > > Hope this helps,
> > > Greg
> > > 
> > > mark kant wrote:How about
> > > the following lexer
> > > 
> > > 
> > > protected: 
> > > ALPHA: ('a'..'z'|'A'..'Z')
> > > ;
> > > protected:
> > > ALPHA_NUM: ('a'..'z'|'A'..'Z'|'0'..'9')
> > > ;
> > > protected:
> > > DIGIT: '0'..'9'
> > > ;
> > > 
> > > 
> > > NAME: (ALPHA) ((ALPHA) | '_' | '.') )*
> > > ;
> > > 
> > > ID: (ALPHA) ( (ALPHA_NUM) |'_'|'.'|'@')*
> > > ;
> > > 
> > > TOKEN: (ALPHANUM|'_'|'.'|'@'|'%'|';'|'~')+
> > > ;
> > > 
> > > NUMBER: ( DIGITS )+
> > > ;
> > > 
> > > 
> > > Thanks
> > > 
> > > Mark
> > > --- mzukowski at yci.com wrote:
> > > > remove your AT rule and then add a literal
> > keyword
> > > > AT='@' to the keywords
> > > > section and test for it in TOKEN by turning on
> > the
> > > > option testLiterals=true.
> > > > See the docs on literals.
> > > > 
> > > > Monty
> > > > 
> > > > -----Original Message-----
> 
=== message truncated ===

__________________________________________________
Do you Yahoo!?
Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
http://platinum.yahoo.com

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/