[antlr-interest] Troubles lexing a decimal, (from an antlr beginner)
Igor Murashkin
downtown1 at gmail.com
Tue Jul 24 16:42:02 PDT 2007
Hello,
Thanks for all the help. I used a syntactic predicate like Jim suggested and
it seems to lex everything properly now. I wish I understood, in more
academic terms, why my original lexer rules didn't work. Does ANTLR not put
the characters back and backtrack when it fails to match a rule?
As for seeing the answer to this earlier: I couldn't find a search option
for the mailing list, and with Google I kept coming across ANTLR v2 threads,
which I was afraid to rely on since so much ANTLR v2 information on the
internet would break on me (like using ! to exclude things from the
token text).
Thanks,
Igor Murashkin
On 7/24/07, Jim Idle <jimi at temporal-wave.com> wrote:
>
> Igor,
>
>
>
> This question was asked and answered just a few days ago:
>
>
>
> I think that this question points out that many of us expect ANTLR to
> "just work it out" for us. All these problems are best solved with a
> thought experiment first ("How would you scan it with the eye?"); then
> break the rule at the different alternatives yourself and stick in the
> lookahead you perform in your mind. It will result in better generated
> code anyway:
>
>
>
> grammar fred;
>
> stat
>     : test+
>     ;
>
> test
>     : (INT DOT ID)
>     | FLOAT
>     ;
>
> fragment
> DIGIT : '0'..'9'
>     ;
>
> FLOAT : INT
>     (
>         ('.' INT)=> '.' INT
>       | {$type = INT; }
>     )
>     ;
>
> DOT : '.' ;
>
> fragment // Also ensures a token type INT is present
> INT : DIGIT+;
>
> ID : ('A'..'Z' | 'a'..'z')+
>     ;
>
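The effect of the ('.' INT)=> predicate in the FLOAT rule above can be
sketched with a toy hand-written scanner. This is only an illustration in
Python, not the code ANTLR generates; the names is_digit and
scan_number_with_lookahead are made up for the example, and it assumes the
scan starts on a digit. The point is that the '.' is consumed only after
peeking ahead and seeing a digit behind it.

```python
def is_digit(ch):
    """True for the ASCII digits '0'..'9' (mirrors the DIGIT fragment)."""
    return "0" <= ch <= "9"


def scan_number_with_lookahead(text, pos):
    """Return (token_type, lexeme, next_pos) for a number starting at pos.

    Hypothetical helper: a hand-rolled stand-in for the FLOAT rule,
    assuming text[pos] is a digit.
    """
    start = pos
    # INT : DIGIT+
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    # ('.' INT)=> : peek at two characters without consuming anything.
    if pos + 1 < len(text) and text[pos] == "." and is_digit(text[pos + 1]):
        pos += 1  # now it is safe to consume the '.'
        while pos < len(text) and is_digit(text[pos]):
            pos += 1
        return ("FLOAT", text[start:pos], pos)
    # Otherwise leave the '.' alone; this is the {$type = INT;} alternative.
    return ("INT", text[start:pos], pos)
```

With this sketch, scanning "1.doSomeAction" from offset 0 stops after the
"1" and reports INT, leaving the '.' to be matched as DOT, while "1.5" is
consumed whole as FLOAT.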
>
> Jim
>
>
>
> *From:* antlr-interest-bounces at antlr.org [mailto:
> antlr-interest-bounces at antlr.org] *On Behalf Of *Igor Murashkin
> *Sent:* Tuesday, July 24, 2007 9:45 AM
> *To:* antlr-interest at antlr.org
> *Subject:* [antlr-interest] Troubles lexing a decimal, (from an antlr
> beginner)
>
>
>
> Hello,
>
> Well, let me just say it's my first time using ANTLR. I needed a C# parser
> generator, so using flex/bison as I have done before was simply out of the
> question, and I figured learning an LL(k) parser generator would be a nice
> change from just using LR(k).
>
> Unfortunately, before I can even get to the parsing, I need to fix my
> lexing. Right now it doesn't match decimals properly. Here are the lexing
> rules in question:
>
> ===============
>
> DOT : '.' ;
> INTEGER : Digit+;
> DECIMAL : Digit+ '.' Digit+;
> fragment Digit
> : '0'..'9';
> IDENT : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>
> NL : ('\r\n' // DOS/Windows
> | '\r' // Macintosh
> | '\n') // Unix
> { $channel=HIDDEN; };
>
> WS
> : (' '
> | '\t'
> | '\f')
> { $channel=HIDDEN; };
>
> ===============
>
> Unfortunately, with simple input such as this it crashes with an
> EarlyExitException:
>
> ===============
> console.flushBuffer
> general.holdMsec 1000
> object 1.doSomeAction withThis
> ===============
> The third line should produce "IDENT INTEGER DOT IDENT IDENT" but instead
> it tries to match "1." as a DECIMAL and then once it sees the "d" it fails
> and throws an EarlyExitException.
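The failure described above can be reproduced with a toy scanner. This is a
rough Python sketch, not ANTLR's generated code: scan_number_committed and
EarlyExit are hypothetical names, and the sketch assumes the lexer picks
DECIMAL as soon as it sees digits followed by '.', with no way to back out
once the '.' has been consumed.

```python
class EarlyExit(Exception):
    """Stand-in for ANTLR's EarlyExitException: a (...)+ loop matched nothing."""


def is_digit(ch):
    """True for the ASCII digits '0'..'9' (mirrors the Digit fragment)."""
    return "0" <= ch <= "9"


def scan_number_committed(text, pos):
    """Return (token_type, lexeme, next_pos), deciding on one character of
    lookahead after the leading digits. Assumes text[pos] is a digit."""
    start = pos
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    if pos < len(text) and text[pos] == ".":
        pos += 1  # committed to DECIMAL: the '.' is consumed
        frac_start = pos
        while pos < len(text) and is_digit(text[pos]):
            pos += 1
        if pos == frac_start:
            # Digit+ needs at least one digit, and there is no fallback
            # to INTEGER followed by DOT at this point.
            raise EarlyExit(f"expected a digit at offset {pos}")
        return ("DECIMAL", text[start:pos], pos)
    return ("INTEGER", text[start:pos], pos)
```

In this sketch "1.5" and "1000" scan fine, but "1.doSomeAction" raises
EarlyExit once the 'd' shows up where a digit is required.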
>
> I am completely unsure what is going on. I tried setting k=2 in the
> options, figuring that if it looked at the period AND the next character it
> would get ('.', 'd'), which clearly does not match the DECIMAL rule. But
> then I just got a bunch of warnings in my lexer grammar, so I removed the
> k=2 line altogether. Looking at the generated code, though, it's always
> calling LA(1); maybe there should be a way to get it to call LA(2)?
>
> Probably I am completely misunderstanding how the whole process of lexing
> is working too. Looking at the generated code it is generating some DFAs,
> which would imply some kind of regular language being at work here? Or does
> it still use LL(k) parsing even for lexing?
>
> I'm going to try to get the book asap too, probably it explains some of
> this...
>