[antlr-interest] Troubles lexing a decimal, (from an antlr beginner)

Wed Jul 25 12:11:44 PDT 2007

So if ANTLR generates recursive descent lexers, why is there so much DFA
machinery in the generated code (shouldn't I be seeing PDAs all over the
place instead)? When I see DFAs I automatically think regular language +
NFAs, and that means it should be able to automagically pick the right
accept/reject state for the tokens.

As for looking at the wiki, I did, and there was a Java example with setText
and getText (if you mean the one about the !), which aren't applicable to C#
since we have a Text setter/getter property instead. For parsing decimals I
don't recall anything like this problem, unless it was hidden in one of the
tutorials.

Thanks,
Igor

On 7/25/07, Jim Idle <jimi at temporal-wave.com> wrote:
>
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > bounces at antlr.org] On Behalf Of Johannes Luber
> > Sent: Wednesday, July 25, 2007 2:34 AM
> > Cc: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Troubles lexing a decimal,(from an antlr
> > beginner)
> >
> > Igor Murashkin wrote:
> > > Hello,
> > >
> > > Thanks for all the help. I used a syntactic predicate like Jim
> > > suggested and it seems to lex everything properly now. I wish I
> > > understood more academically why my original lexing syntax didn't
> > > work. Does ANTLR not put the tokens back and backtrack when it
> > > fails to match a rule?
> >
> > Backtracking has to be explicitly activated because this option is
> > more time consuming than a straight pass.
>
> This was a lexing question. Igor is asking why ANTLR does not generate
> code that acts like {f}lex, where you can get through a matching
> sequence and then decide to REJECT it manually, or the algorithm itself
> gives up and tries the next rule, and so on.
>
> ANTLR generates recursive descent recognizers and so there is no [neat]
> way to pop back up the recognition chain and start again. In practice,
> this just means you have to get your head around it until you have
> expunged {f}lex from your brain. It creates some lexing problems which
> are difficult to solve until you have the gestalt.
>
> The easiest way is to look at your tokens, merge common roots and write the
> lexing rule so that it branches where the tokens will differ then uses
> an action to set the type. You don't need to go to this trouble for
> keywords with common roots 'call' 'calling' etc, but when you are
> constructing compounds like INT.INT in the lexer and INT.xxx can mean
> something else, then you need to guide the lexer analysis a bit. It may
> not be exactly intuitive (at least not at first) but if you start
> looking at the generated code, then as a programmer it may help you to
> see what is happening, even if you don't have a firm grasp of the
> theory.
>
> Ter has recently stated that he may look at the algorithm in order to
> make it generate some of the 'intuitive' cases as one might expect. Of
> course, that will screw up those of us that have got used to the way it
> is ;-)
>
> Jim
>
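
For readers following along, the "merge common roots, branch where the
tokens differ, then set the type" advice quoted above can be sketched as a
hand-written recognizer. This is only an illustration in Python, not ANTLR
output and not the grammar from this thread; the function name lex_number
and the token type names INT and DECIMAL are assumptions made up for the
example.

```python
DIGITS = "0123456789"


def lex_number(text, pos=0):
    """Return (token_type, lexeme, next_pos) for a number starting at pos.

    Hypothetical helper: the names and token types are illustrative only.
    """
    start = pos
    # Common root: both INT and DECIMAL begin with one or more digits.
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected a digit at position {start}")

    token_type = "INT"
    # Branch point: peek ahead and only commit to the '.' when a digit
    # follows it. Nothing has been consumed yet, so there is no need to
    # "put characters back" if the check fails.
    if pos + 1 < len(text) and text[pos] == "." and text[pos + 1] in DIGITS:
        pos += 1  # consume '.'
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        token_type = "DECIMAL"  # the "action" that sets the token type

    return token_type, text[start:pos], pos


if __name__ == "__main__":
    for sample in ("12.5", "12.foo", "7"):
        print(sample, lex_number(sample))
```

The lookahead check before consuming the '.' plays roughly the role the
syntactic predicate plays in the grammar: the recognizer decides which
branch to take before committing, instead of matching and then backing out.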