[antlr-interest] Troubles lexing a decimal, (from an antlr beginner)

Igor Murashkin downtown1 at gmail.com
Tue Jul 24 16:42:02 PDT 2007


Hello,

Thanks for all the help. I used a syntactic predicate like Jim suggested and
it seems to lex everything properly now. I still wish I understood, in more
academic terms, why my original lexer rules didn't work. Does ANTLR not put
the characters back and backtrack when it fails to match a rule?
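
A rough way to picture the difference is a hand-written Python scanner. This
is only a sketch, not ANTLR's generated code: the function name `scan`, the
flag `check_digit_after_dot` and the (type, lexeme) tuples are made up for
illustration. With the flag off it commits to DECIMAL as soon as a '.' follows
the digits, which is how the original rules appeared to behave; with the flag
on it first checks that a digit follows the '.', which is the role the
('.' INT)=> predicate plays.

```python
import string

DIGITS = string.digits
IDENT_START = string.ascii_letters + "_"
IDENT_PART = IDENT_START + DIGITS


def scan(text, check_digit_after_dot):
    """Split text into (type, lexeme) pairs using the thread's token names."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t\f\r\n":
            i += 1  # WS and NL are skipped here (hidden channel in the grammar)
        elif c in DIGITS:
            start = i
            while i < n and text[i] in DIGITS:
                i += 1
            if i < n and text[i] == ".":
                digit_follows = i + 1 < n and text[i + 1] in DIGITS
                if check_digit_after_dot and not digit_follows:
                    # Two characters of lookahead say this is not a decimal:
                    # emit INTEGER and leave the '.' for the DOT branch.
                    tokens.append(("INTEGER", text[start:i]))
                    continue
                if not digit_follows:
                    # Already committed to DECIMAL, no way back.
                    raise ValueError(f"expected a digit at offset {i + 1}")
                i += 1  # consume '.'
                while i < n and text[i] in DIGITS:
                    i += 1
                tokens.append(("DECIMAL", text[start:i]))
            else:
                tokens.append(("INTEGER", text[start:i]))
        elif c == ".":
            tokens.append(("DOT", c))
            i += 1
        elif c in IDENT_START:
            start = i
            while i < n and text[i] in IDENT_PART:
                i += 1
            tokens.append(("IDENT", text[start:i]))
        else:
            raise ValueError(f"unexpected character {c!r} at offset {i}")
    return tokens
```

Calling scan("object 1.doSomeAction withThis", True) yields IDENT INTEGER DOT
IDENT IDENT, while passing False raises an error at the "d", much like the
EarlyExitException described in the original message below.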

As for not finding the earlier answer: I couldn't find a search option on the
mailing list, and with Google I kept coming across ANTLRv2 threads, which I
was afraid to rely on since so much of the ANTLRv2 information on the
internet no longer works for me (like using ! to exclude things from the
token text).

Thanks,
Igor Murashkin

On 7/24/07, Jim Idle <jimi at temporal-wave.com> wrote:
>
> Igor,
>
> This question was asked and answered just a few days ago:
>
> I think that this question points out that many of us expect ANTLR to
> "just work it out" for us. All these problems are best solved with a
> thought experiment first: "How would you scan it with the eye?" Then break
> the rule at the different alternatives yourself and stick in the lookahead
> you perform with your mind. It will result in better generated code anyway:
>
> grammar fred;
>
> stat
>     : test+
>     ;
>
> test
>     : (INT DOT ID)
>     | FLOAT
>     ;
>
> fragment
> DIGIT
>     : '0'..'9'
>     ;
>
> FLOAT
>     : INT
>       (
>           ('.' INT)=> '.' INT
>         | { $type = INT; }
>       )
>     ;
>
> DOT : '.' ;
>
> fragment        // Also ensures a token type INT is present
> INT : DIGIT+ ;
>
> ID  : ('A'..'Z' | 'a'..'z')+
>     ;
>
> Jim
>
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Igor Murashkin
> Sent: Tuesday, July 24, 2007 9:45 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Troubles lexing a decimal, (from an antlr
> beginner)
>
> Hello,
>
> Well, let me just say, it's my first time using ANTLR. I needed a C# parser
> generator, so using flex/bison as I have done before was simply out of the
> question, and I figured learning an LL(k) parser should be a nice change
> from just using LR(k).
>
> Unfortunately, before I can even get to the parsing, I need to fix my
> lexing. Right now it doesn't match decimals properly. Here are the lexing
> rules in question:
>
> ===============
>
> DOT        : '.'   ;
> INTEGER    :    Digit+;
> DECIMAL    :    Digit+ '.' Digit+;
> fragment Digit
>     :    '0'..'9';
> IDENT    :     ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>
> NL    :    ('\r\n' // DOS/Windows
>     |     '\r'  // Macintosh
>           |     '\n') // Unix
>           { $channel=HIDDEN; };
>
> WS
>       :     (' '
>         |     '\t'
>         |     '\f')
>         { $channel=HIDDEN; };
>
> ===============
>
> Unfortunately, with simple input such as this it crashes with an
> EarlyExitException:
>
> ===============
> console.flushBuffer
> general.holdMsec 1000
> object 1.doSomeAction withThis
> ===============
> The third line should produce "IDENT INTEGER DOT IDENT IDENT" but instead
> it tries to match "1." as a DECIMAL and then once it sees the "d" it fails
> and throws an EarlyExitException.
>
> I am completely unsure what is going on. I tried to set k=2 in options,
> figuring that if it looked at the period AND the next character it would
> get ('.', 'd'), which clearly does not match the DECIMAL rule. But then I
> just got a bunch of warnings in my lexer grammar, so I removed the k=2 line
> altogether. Looking at the generated code, though, it's always calling LA(1),
> and maybe there should be a way to get it to call LA(2)?
>
> Probably I am completely misunderstanding how the whole process of lexing
> is working too. Looking at the generated code it is generating some DFAs,
> which would imply some kind of regular language being at work here? Or does
> it still use LL(k) parsing even for lexing?
>
> I'm going to try to get the book asap too, probably it explains some of
> this...
>