[antlr-interest] Re: strings and vocab?

lgcraymer lgc at mail1.jpl.nasa.gov
Mon Apr 12 16:01:49 PDT 2004


This one has to be thought of in implementation terms.  For any lexer rule in which testLiterals is true, the token is constructed and 
its text is then checked against a hash table of literals.  If the table contains a matching literal definition, the token type is changed to 
that literal's type; if not, the token keeps the default token type for that rule.  Note that this happens entirely in the lexer, 
independently of the parser.  I believe that the current implementation requires that all literals be defined in the same file as the 
lexer grammar.
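
As a rough illustration of that lookup, here is a minimal Java sketch.  It is not the code ANTLR generates: the class name, the 
testLiterals helper, and the token type numbers are all made up for the example.  The point it tries to show is that the table is 
filled when the lexer is created, so "void" is already in it no matter which parser rule is currently running.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a generated lexer; all names and numbers are assumptions.
class LiteralsTableSketch {
    // Made-up token type numbers; ANTLR assigns the real values.
    static final int IDENT = 4;
    static final int LITERAL_void = 5;
    static final int LITERAL_boolean = 6;

    // Populated once, up front, before any input is read.
    static final Map<String, Integer> literals = new HashMap<>();
    static {
        literals.put("void", LITERAL_void);
        literals.put("boolean", LITERAL_boolean);
    }

    // What a rule with testLiterals=true conceptually does after matching its text:
    // return the literal's type if the text is in the table, else the rule's default type.
    static int testLiterals(String text, int defaultType) {
        Integer literalType = literals.get(text);
        return literalType != null ? literalType : defaultType;
    }

    public static void main(String[] args) {
        System.out.println(testLiterals("void", IDENT)); // prints 5 (LITERAL_void)
        System.out.println(testLiterals("foo", IDENT));  // prints 4 (IDENT)
    }
}
```

So an identifier-shaped token becomes a keyword token only because its text happens to be a key in the table.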

Rules for which testLiterals=false are not checked against the hash table.  So if you have a rule
SEMI : ';' ;
and the literal ";" in the parser grammar, you will get strange results: the literal ";" has a different token type than the SEMI rule, 
and since the table lookup does not occur, you will never see the LITERAL_; value in the parser.
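
The mismatch can be sketched in Java as well.  Again, this only models the behavior described above: lexSemi, parserAccepts, 
and the two token type numbers are hypothetical, and LITERAL_SEMI merely stands in for whatever type the parser's ";" literal 
receives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a SEMI rule and a parser that refers to the literal ";".
class SemiMismatchSketch {
    static final int SEMI = 4;          // made-up type of the lexer rule SEMI
    static final int LITERAL_SEMI = 5;  // made-up type of the parser's ";" literal

    static final Map<String, Integer> literals = new HashMap<>();
    static {
        literals.put(";", LITERAL_SEMI);
    }

    // Models the SEMI rule matching ";" with or without the table lookup.
    static int lexSemi(boolean testLiterals) {
        int type = SEMI;
        if (testLiterals) {
            Integer literalType = literals.get(";");
            if (literalType != null) {
                type = literalType;
            }
        }
        return type;
    }

    // Models a parser rule that was written with the literal ";".
    static boolean parserAccepts(int tokenType) {
        return tokenType == LITERAL_SEMI;
    }

    public static void main(String[] args) {
        System.out.println(parserAccepts(lexSemi(false))); // prints false: parser gets SEMI
        System.out.println(parserAccepts(lexSemi(true)));  // prints true: lookup retypes the token
    }
}
```

In this model the input is lexed correctly either way; what goes wrong without the lookup is only that the token type never 
matches the one the parser is waiting for.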

--Loring


--- In antlr-interest at yahoogroups.com, ronald.petty at m... wrote:
> Alright, I give up :(.  What is the secret to Antlr, jk.  I am still 
> having some trouble getting started with Antlr, and I believe most of my 
> confusion comes from how strings/tokens/vocab is done.
> 
> I was reading the java.g grammar and was wondering: in the parser there is 
> the rule
> 
> builtInType
>         :       "void"
>         |       "boolean"
>         |       "byte"
>         ..
>         ;
> 
> Then in the Lexer there is
> 
> IDENT
> options { testLiterals=true; }
>         : ('a'..'z'|'A'..'Z'|'_'|'$')('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
>         ;
> 
> NUM_INT
> {boolean isDecimal=false; Token t=null;}
>         :       '.' {_ttype=DOT;}
>                 (       ('0'..'9')+ (EXPONENT)? (f1:FLOAT_SUFFIX {t=f1;})?
>                         {
>                                 ......
> 
> protected 
> FLOAT_SUFFIX
>         :       'f'|'F'|'d'|'D'
>         ;
> 
> 
> When the parser says "give me the next token" (nextToken), the Lexer will eat 
> the next token based on the Lexer rules.  Now if the string "void" comes 
> in, the Lexer says, let me check if there is a literal yet for this token. 
> However, I do not see what is going on here.  The word "void" in the 
> parser may not have been seen yet (calling builtInType).  I have read the 
> vocab document, but still don't think I understand.  I have tried using 
> tokens {} and don't understand why that works.  Could someone explain 
> these simple concepts?  I know I am missing something very simple here.  I 
> can follow along the grammars just fine, but I don't understand the real 
> workings on these issues, especially how or where you check Identifiers vs. 
> Keywords (I have read a dozen things, and none of them seem to explain it 
> in a way I can follow).
> 
> Also, does protected mean that the Lexer will never call FLOAT_SUFFIX 
> directly?  That is, if it is trying to get the nextToken, it will only 
> reach FLOAT_SUFFIX through the call in NUM_INT.  Correct?  Is this to keep 
> similar issues (like IDENT vs. Keywords) from happening?
> 
> Thanks Ron
> 
> ps.  When I get this all figured out, I will write another tutorial 
> hopefully documenting the same issues I have, maybe help someone one day 
> :)
> 



 