[antlr-interest] Re: strings and vocab?

ronald.petty at milliman.com ronald.petty at milliman.com
Tue Apr 13 08:05:57 PDT 2004


So in your example, if I have input "fake input ; some more fakeinput"

//parser
semi :  ";" ;

//lexer
SEMI    :       ';' ;


When the lexer reads the ";" from the input stream, it matches the SEMI
rule and returns a SEMI token, not a LITERAL_; token.  That is because the
lexer runs first and sets the token type, right?

So in the end, does this mean that string literals are just tokens too?
Is the main difference that no lexer matching rule is generated for them?
And if there is no rule, you need to test for the literal to get the
pre-created token type, correct?  So where does the tokens { } section in
the grammar come into play?  Does it just set up more string literals?  If
so, do the tokens sections in the lexer and the parser mean the same thing?
Do they both set up tokens?

Ugh.  I need time to read the ANTLR source code.

Thanks for helping.
Ron





"lgcraymer" <lgc at mail1.jpl.nasa.gov> wrote on 04/12/2004 06:01 PM
(Subject: [antlr-interest] Re: strings and vocab?):

This one has to be thought of in implementation terms.  For any lexer rule
in which testLiterals is true, each token is constructed and then checked
against a hash table of literals.  If the table contains a corresponding
literal definition, the token type is changed to match the literal; if
not, the token keeps the default token type for that rule.  Note that this
is independent of the parser.  I believe that the current implementation
requires that all literals be defined in the same file as the lexer
grammar.

Rules for which testLiterals=false are not checked against the hash table.
So if you have a rule
SEMI : ';' ;
and the literal ";" in the parser grammar, you will get strange
results--the literal ";" has a different token type than the SEMI rule;
since table lookup does not occur, you will never see the LITERAL_; value
in the parser.
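
The lookup described above can be sketched in plain Java.  This is only an
illustration of the idea, not ANTLR's generated code: the class name, the
finishToken helper, and the token type numbers are all made up for the
example, and LITERAL_SEMI stands in for the token type that the parser
literal ";" would get.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lexer with a literals table; not ANTLR's API.
class LiteralsSketch {
    // Made-up token type numbers; ANTLR assigns its own.
    static final int IDENT = 4;
    static final int SEMI = 5;
    static final int LITERAL_void = 6;
    static final int LITERAL_SEMI = 7; // type of the parser literal ";"

    // Hash table of literals: token text -> literal token type.
    static final Map<String, Integer> literals = new HashMap<>();
    static {
        literals.put("void", LITERAL_void);
        literals.put(";", LITERAL_SEMI);
    }

    // Called after a lexer rule has matched some text.
    // ruleType is the default token type of the rule that matched.
    static int finishToken(String text, int ruleType, boolean testLiterals) {
        if (!testLiterals) {
            return ruleType;            // no table lookup at all
        }
        Integer literalType = literals.get(text);
        if (literalType != null) {
            return literalType;         // retag the token as the literal
        }
        return ruleType;                // not a literal: keep the rule's type
    }

    public static void main(String[] args) {
        // IDENT has testLiterals=true, so "void" becomes LITERAL_void.
        System.out.println(finishToken("void", IDENT, true));
        // "foo" is not in the table, so it stays an IDENT.
        System.out.println(finishToken("foo", IDENT, true));
        // SEMI has no literal test, so ";" stays SEMI and the parser
        // never sees LITERAL_SEMI.
        System.out.println(finishToken(";", SEMI, false));
    }
}
```

The last call is the "strange results" case: the parser is waiting for the
literal's token type, but the lexer only ever hands it SEMI.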

--Loring


--- In antlr-interest at yahoogroups.com, ronald.petty at m... wrote:
> Alright, I give up :(.  What is the secret to Antlr, jk.  I am still
> having some trouble getting started with Antlr, and I believe most of my
> confusion comes from how strings/tokens/vocab are handled.
> 
> I was reading the java.g grammar and was wondering: in the parser there
> is the rule
> 
> builtInType
>         :       "void"
>         |       "boolean"
>         |       "byte"
>         ..
>         ;
> 
> Then in the Lexer there is
> 
> IDENT
> options { testLiterals=true; }
>         :       ('a'..'z'|'A'..'Z'|'_'|'$')('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
>         ;
> 
> NUM_INT
> {boolean isDecimal=false; Token t=null;}
>         :       '.' {_ttype=DOT;}
>                 (       ('0'..'9')+ (EXPONENT)? (f1:FLOAT_SUFFIX {t=f1;})?
>                         {
>                                 ......
> 
> protected 
> FLOAT_SUFFIX
>         :       'f'|'F'|'d'|'D'
>         ;
> 
> 
> When the parser says, give me the next token (nextToken), the Lexer will
> eat the next token based on the Lexer rules.  Now if the string "void"
> comes in, the Lexer says, let me check if there is a literal yet for this
> token.  However I do not see what is going on here.  The word "void" in
> the parser may not have been seen yet (calling builtInType).  I have read
> the vocab document, but still don't think I understand.  I have tried
> using tokens {} and don't understand why that works.  Could someone
> explain these simple concepts?  I know I am missing something very simple
> here.  I can follow along the grammars just fine, but I don't understand
> the real workings on these issues, especially how or where you check
> Identifiers vs. Keywords (I have read a dozen things, and none of them
> seem to explain it in a way I can follow).
> 
> Also, does protected mean that the Lexer will never call FLOAT_SUFFIX
> directly?  If it is trying to get the nextToken, it will only reach it
> through the FLOAT_SUFFIX call in NUM_INT.  Correct?  Is this to keep
> similar issues (like IDENT vs Keywords) from happening?
> 
> Thanks Ron
> 
> ps.  When I get this all figured out, I will write another tutorial,
> hopefully documenting the same issues I have; maybe it will help someone
> one day :)
> 
> 
> **************************************************************************************
> This communication is intended solely for the addressee and is
> confidential. If you are not the intended recipient, any disclosure, 
> copying, distribution or any action taken or omitted to be taken in
> reliance on it, is prohibited and may be unlawful. Unless indicated
> to the contrary: it does not constitute professional advice or 
> opinions upon which reliance may be made by the addressee or any
> other party, and it should be considered to be a work in progress.
> **************************************************************************************