[antlr-interest] Re: strings and vocab?

Tue Apr 13 11:11:40 PDT 2004

--- In antlr-interest at yahoogroups.com, ronald.petty at m... wrote:
> So in your example, if I have input "fake input ; some more fakeinput"
> 
> //parser
> semi :  ";" ;
> 
> //lexer
> SEMI    :       ';' ;
> 
> 
> When it reads the ";" from the input stream, SEMI is matched by the Lexer 
> and not a LITERAL_;.  This is because the Lexer runs first and sets the 
> Token type, right?
> 
> So in the end does this mean that STRING literals are just tokens also? 

Not quite, but close.  There has to be a lexer rule that matches the
characters and builds the token, usually called something like TEXT or ID,
and the literal is just one of those tokens with its type changed.
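
To make the retyping concrete, here is a minimal Java sketch of that lookup.
It is not ANTLR's generated code: the class name, the method typeForIdent,
and the token type numbers are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the literal check; not ANTLR's real API.
class LiteralRetypeSketch {
    // Illustrative token type numbers (assumed values).
    static final int IDENT = 4;
    static final int LITERAL_void = 5;
    static final int LITERAL_boolean = 6;

    // The hash table of literals: token text -> token type.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("void", LITERAL_void);
        LITERALS.put("boolean", LITERAL_boolean);
    }

    // Called after an IDENT-style rule (testLiterals=true) has matched `text`.
    // If the text is a known literal the token is retyped; otherwise it keeps
    // the rule's default type.
    static int typeForIdent(String text) {
        return LITERALS.getOrDefault(text, IDENT);
    }

    public static void main(String[] args) {
        System.out.println(typeForIdent("void"));   // prints 5 (LITERAL_void)
        System.out.println(typeForIdent("voided")); // prints 4 (IDENT)
    }
}
```

The point is that "void" is first matched as an ordinary identifier and only
afterwards looked up by its text.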

> The main difference is that there is no Lexer matching rule generated? And 

Yes for 2.x.x.  3.x will probably generate matching rules.
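
Because no matching rule is generated for the literal, it can only show up
through a rule that does the table lookup.  A rough Java sketch of your SEMI
case follows.  The names finishToken and LITERAL_SEMI (standing in for
LITERAL_;, which is not a legal Java identifier) and the type numbers are
assumptions for illustration, not ANTLR's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a lexer rule finishing a token; not ANTLR's real API.
class SemiLiteralSketch {
    // Illustrative token type numbers (assumed values).
    static final int SEMI = 4;          // type of the lexer rule SEMI
    static final int LITERAL_SEMI = 5;  // type of the parser's ";" literal

    // Literal table containing the parser's ";" literal.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put(";", LITERAL_SEMI);
    }

    // The table is consulted only when the matching rule has testLiterals set.
    static int finishToken(String text, int ruleType, boolean testLiterals) {
        if (testLiterals) {
            return LITERALS.getOrDefault(text, ruleType);
        }
        return ruleType;
    }

    public static void main(String[] args) {
        // testLiterals=false: the token keeps the SEMI type, so a parser
        // waiting for the ";" literal never sees its type.
        System.out.println(finishToken(";", SEMI, false)); // prints 4
        // testLiterals=true: the token is retyped to the literal's type.
        System.out.println(finishToken(";", SEMI, true));  // prints 5
    }
}
```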

> if there is no rule you need to test for the literal if you want to get a 
> precreated Token type, correct?  So where does the tokens { } section in the 
> grammar come into play?  Is this just setting up more string literals?  If 
> so, do the tokens sections in the Lexer and the Parser mean the same thing?  
> They both set up Tokens?

That is pretty much true.
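
One way to picture it, as a sketch only: every token name or string literal,
wherever it is declared, ends up as an entry in a single table of token
types.  The class TokenVocabSketch, its define method, and the starting
type number below are hypothetical; how ANTLR actually numbers and shares
its vocabulary between grammars is not modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical token vocabulary; not ANTLR's real implementation.
class TokenVocabSketch {
    private final Map<String, Integer> types = new LinkedHashMap<>();
    private int nextType = 4; // assumed first free token type

    // Register a token name or a quoted literal.  Defining the same entry
    // again returns the type it already has.
    int define(String nameOrLiteral) {
        Integer existing = types.get(nameOrLiteral);
        if (existing != null) {
            return existing;
        }
        types.put(nameOrLiteral, nextType);
        return nextType++;
    }

    public static void main(String[] args) {
        TokenVocabSketch vocab = new TokenVocabSketch();
        int fromTokensSection = vocab.define("\"void\""); // declared in tokens { }
        int fromParserRule = vocab.define("\"void\"");    // literal used in a rule
        System.out.println(fromTokensSection == fromParserRule); // prints true
    }
}
```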

--Loring

> Ugh.  I need time to read the source code of Antlr.
> 
> Thanks for helping.
> Ron
> 
> "lgcraymer" <lgc at m...>
> 04/12/2004 06:01 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Re: strings and vocab?
> 
> This one has to be thought of in implementation terms.  For any lexer rule 
> in which testLiterals is true:  tokens are constructed and 
> then checked against a hash table of literals.  If the table contains a 
> corresponding literal definition, then the token type is changed to 
> match the literal; if not, it is given the default token type for that 
> rule.  Note that this is independent of the parser.  I believe that the 
> current implementation requires that all literals be defined in the same 
> file as the lexer grammar.
> 
> Rules for which testLiterals=false are not checked against the hash table. 
> So if you have a rule
> SEMI : ';' ;
> and the literal ";" in the parser grammar, you will get strange 
> results--the literal ";" has a different token type than the SEMI rule; 
> since table lookup does not occur, you will never see the LITERAL_; value 
> in the parser.
> 
> --Loring
> 
> 
> --- In antlr-interest at yahoogroups.com, ronald.petty at m... wrote:
> > Alright, I give up :(.  What is the secret to Antlr, jk.  I am still 
> > having some trouble getting started with Antlr, and I believe most of my 
> > confusion comes from how strings/tokens/vocab is done.
> > 
> > I was reading the java.g grammar and was wondering, in the parser there is 
> > the rule
> > 
> > builtInType
> >         :       "void"
> >         |       "boolean"
> >         |       "byte"
> >         ..
> >         ;
> > 
> > Then in the Lexer there is
> > 
> > IDENT
> > options { testLiterals=true; }
> >         :       ('a'..'z'|'A'..'Z'|'_'|'$')('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
> >         ;
> > 
> > NUM_INT
> > {boolean isDecimal=false; Token t=null;}
> >         :       '.' {_ttype=DOT;}
> >                 (       ('0'..'9')+ (EXPONENT)? (f1:FLOAT_SUFFIX {t=f1;})?
> >                         {
> >                                 ......
> > 
> > protected 
> > FLOAT_SUFFIX
> >         :       'f'|'F'|'d'|'D'
> >         ;
> > 
> > 
> > When the parser says, give me next token (nextToken), the Lexer will eat 
> > the next token based on the Lexer rules.  Now if the string "void" comes 
> > in, the Lexer says, let me check if there is a literal yet for this token. 
> > However I do not see what is going on here.  The word "void" in the 
> > parser may not have been seen yet (calling builtInType).  I have read the 
> > vocab document, but still don't think I understand.  I have tried using 
> > tokens {} and don't understand why that works.  Could someone explain 
> > these simple concepts?  I know I am missing something very simple here. 
> > I can follow along the grammars just fine, but I don't understand the real 
> > workings on these issues, especially how or where you check Identifiers vs. 
> > Keywords (I have read a dozen things, and none of them seem to explain it 
> > in a way I can follow).
> > 
> > Also does protected mean that the Lexer will never call FLOAT_SUFFIX 
> > directly?  If it is trying to get the nextToken, it will only try to get it 
> > from the FLOAT_SUFFIX call in NUM_INT.  Correct?  Is this to keep similar 
> > issues like (IDENT vs Keywords) from happening?
> > 
> > Thanks Ron
> > 
> > ps.  When I get this all figured out, I will write another tutorial 
> > hopefully documenting the same issues I have, maybe help someone one day 
> > :)
> > 
> > 
> **************************************************************************************
> > This communication is intended solely for the addressee and is
> > confidential. If you are not the intended recipient, any disclosure, 
> > copying, distribution or any action taken or omitted to be taken in
> > reliance on it, is prohibited and may be unlawful. Unless indicated
> > to the contrary: it does not constitute professional advice or 
> > opinions upon which reliance may be made by the addressee or any
> > other party, and it should be considered to be a work in progress.
> > 
> **************************************************************************************
> 
> 
> 
>  

Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/