[antlr-interest] lexer troubles in grammar

ronald.petty at milliman.com ronald.petty at milliman.com
Wed Apr 7 12:31:49 PDT 2004


When I run this grammar I get the following:

$ javac *.java
Note: * uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

ron at asdf~
$ java t
sub function
line 1:5: expecting LETTER, found 'function'

I was wondering how this is happening.  It appears to me that Antlr's 
parser (well mine since I specified it :( ) should do the following.  My 
driver does parser.start(); 

1) The parser chooses the first alternative of the rule start
2) The parser sees that this alternative refers to another parser rule 
(sub), so it goes there (I assume it has to pick one and try it to failure 
before it tries function)
3) Once there, the parser tells the lexer it needs a token that matches 
the lexer rule SUB
4) The lexer reads the next token and checks whether the text "sub" 
matches the lexer rule SUB
5) It does, and the lexer returns OK to the parser
6) The parser moves on to the next element, WS (whitespace), and repeats 
3-5
7) Since we are still in the parser rule sub, it goes on to the parser 
rule id
8) The parser rule id should just return, assuming the word function is the 
ID found by the lexer, but for some reason it is going to function.
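
One way to check the steps above is against a lexer that never asks the
parser what it wants. The sketch below is a minimal hand-written Java
stand-in for the lexer rules in this grammar, not ANTLR's generated code;
the class name LexerSketch, the method tokenize, and the order in which the
alternatives are tried are assumptions made for illustration. It picks each
token from the characters alone, trying the keyword literals before the
single-character LETTER rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the TLexer rules; NOT code generated by ANTLR.
class LexerSketch {

    // Whitespace characters accepted by the WS rule in the grammar.
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Returns the token type names for the input. Each token is chosen by
    // looking only at the characters, with no knowledge of what the parser
    // is expecting at that point.
    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        String s = input.toLowerCase(); // mirrors caseSensitive=false
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (s.startsWith("function", i)) {        // FUNCTION literal
                types.add("FUNCTION");
                i += "function".length();
            } else if (s.startsWith("sub", i)) {      // SUB literal
                types.add("SUB");
                i += "sub".length();
            } else if (c >= 'a' && c <= 'z') {        // LETTER: one character
                types.add("LETTER");
                i++;
            } else if (c >= '0' && c <= '9') {        // NUMBER: one character
                types.add("NUMBER");
                i++;
            } else if (c == '_') {                    // UNDERSCORE
                types.add("UNDERSCORE");
                i++;
            } else if (isWs(c)) {                     // WS: one or more
                while (i < s.length() && isWs(s.charAt(i))) {
                    i++;
                }
                types.add("WS");
            } else {
                throw new IllegalArgumentException(
                        "unexpected char '" + c + "' at index " + i);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("sub function")); // [SUB, WS, FUNCTION]
        System.out.println(tokenize("sub asdf"));
    }
}
```

Under these assumptions the word function reaches the parser as a single
FUNCTION token rather than as a run of LETTER tokens, which is consistent
with the "expecting LETTER, found 'function'" message above, while asdf
comes through as four LETTER tokens that the rule id can match.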

I must be missing some fundamental point here.  Don't ANTLR parsers just 
go down the rules?  I probably have the rules wrong.  I assume that, since 
the start rule doesn't have (start)+ around it, once it matches either a 
sub or a function it will end the program (well, assuming you don't 
make an infinitely long ID).  How come I can do

sub asdf
sub asdf

and then it exits?

Does my question make sense?  I think this is my fault, but I'm not sure!

Thanks for the help (drowning in compilers :) )

Ron


class TParser extends Parser;

options
{
        exportVocab=TVocab;
        k = 1;
}

start   :
        (sub) | (function)
        ;

sub     :
        SUB WS id
        ;

function        :
                FUNCTION WS id
                ;

id      :
        LETTER ( NUMBER | LETTER | UNDERSCORE )*
        ;

class TLexer extends Lexer;

options
{
        k = 2;
        exportVocab=TVocab;
        caseSensitive=false;
        charVocabulary = '\3'..'\377';
}

LETTER  :
        'a'..'z'
        ;

SUB     :
        "sub"
        ;

FUNCTION        :
                "function"
                ;

NUMBER  :
        '0'..'9'
        ;

UNDERSCORE      :
                '_'
                ;

WS      :
        (
        options { generateAmbigWarnings=false; }
        :       ' '
        |       '\t'
        |       '\n'
        |       "\r\n"
        |       '\r'
        )+
        ;
