[antlr-interest] greedy/non-greedy problem.

Bharath bharath at starthis.com
Tue May 18 07:35:54 PDT 2004


Hi Antlers,

I have a parser rule of this form:

myParserRule: IDENT DOT IDENT DOT (IDENT DOT)* IDENT;

I get a nondeterminism warning between alt 1 and the exit branch of the block,
as I expected. If I change the rule to this form:

myParserRule: IDENT DOT IDENT DOT (options{greedy=false;}:IDENT DOT)* IDENT;

This behaves as if (IDENT DOT)* were really (IDENT DOT): it cannot accept
multiple (IDENT DOT) entries in the input. E.g. a.b.c.D.e won't work, whereas
a.b.c.E will.

If I change the rule to:

myParserRule: IDENT DOT IDENT DOT (options{greedy=true;}:IDENT DOT)* IDENT;

I get the error "expecting DOT found NEWLINE", because ANTLR greedily
consumes the final IDENT as part of the loop and then fails when no DOT
follows it.
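The failure can be reproduced with a small hand-written recursive-descent sketch in Python (an illustration, not the code ANTLR generates). The crux is the loop decision: with one token of lookahead, IDENT predicts both another (IDENT DOT) iteration and the final IDENT, so a greedy loop always re-enters; looking at two tokens, in the spirit of setting k=2 in the parser options, lets the loop exit when IDENT is not followed by DOT. The names Parser, tokens and parse are hypothetical.

```python
def tokens(dotted):
    """Hypothetical tokenizer: "a.b.c" -> IDENT DOT IDENT DOT IDENT NEWLINE."""
    kinds = ["IDENT"]
    for _ in dotted.split(".")[1:]:
        kinds += ["DOT", "IDENT"]
    return kinds + ["NEWLINE"]


class Parser:
    """Sketch of: myParserRule: IDENT DOT IDENT DOT (IDENT DOT)* IDENT;"""

    def __init__(self, kinds, k):
        self.kinds = list(kinds) + ["EOF"]
        self.pos = 0
        self.k = k  # how many tokens the loop decision may inspect

    def la(self, i):
        # Kind of the i-th upcoming token (1-based); sticks at EOF.
        return self.kinds[min(self.pos + i - 1, len(self.kinds) - 1)]

    def match(self, kind):
        if self.la(1) != kind:
            raise SyntaxError(f"expecting {kind} found {self.la(1)}")
        self.pos += 1

    def enter_loop(self):
        if self.k == 1:
            # IDENT starts both the loop body and the exit path, so a
            # greedy k=1 decision always re-enters the loop on IDENT.
            return self.la(1) == "IDENT"
        # k=2: only iterate when the IDENT is followed by a DOT.
        return self.la(1) == "IDENT" and self.la(2) == "DOT"

    def my_parser_rule(self):
        self.match("IDENT")
        self.match("DOT")
        self.match("IDENT")
        self.match("DOT")
        while self.enter_loop():
            self.match("IDENT")
            self.match("DOT")
        self.match("IDENT")


def parse(dotted, k):
    p = Parser(tokens(dotted), k)
    try:
        p.my_parser_rule()
        p.match("NEWLINE")
        return "ok"
    except SyntaxError as e:
        return str(e)


print(parse("a.b.c.E", 1))    # expecting DOT found NEWLINE
print(parse("a.b.c.D.e", 2))  # ok
```

With k=1 the loop swallows the last IDENT and then demands a DOT, which is exactly the reported error; with two tokens of lookahead the same rule accepts both inputs.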

------
In examples with comments, I can apply

"(*" (options{greedy=false;}:.)* "*)"

and it works because the loop body (the '.' wildcard) and the closing "*)"
do not resemble each other the way IDENT and IDENT DOT do. When the parser
sees "*", it breaks out of the loop and all is well.
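For comparison, here is a minimal Python sketch of what a non-greedy "(*" (.)* "*)" loop does: before each iteration it tests the exit path first and only otherwise consumes one arbitrary character. The function name scan_comment is hypothetical, and this sketch checks both characters of "*)" so that a lone "*" inside the comment does not end it; it is not a description of ANTLR's generated code.

```python
def scan_comment(text, start=0):
    """Sketch of the non-greedy rule "(*" (.)* "*)".

    Returns the index just past the comment that begins at `start`,
    or raises ValueError if the comment is missing or unterminated.
    """
    if not text.startswith("(*", start):
        raise ValueError("expecting (*")
    pos = start + 2
    while True:
        # Non-greedy: try the exit ("*)") before the loop body.
        if text.startswith("*)", pos):
            return pos + 2
        if pos >= len(text):
            raise ValueError("unterminated comment")
        pos += 1  # the '.' wildcard: consume any single character


source = "(* a * b *) x"
end = scan_comment(source)
print(source[:end])  # (* a * b *)
```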

How can I work around this problem? Please feel free to comment.
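One workaround not raised in the thread: the rule matches three or more IDENTs separated by DOTs, so it can be left-factored as

myParserRule: IDENT DOT IDENT (DOT IDENT)+ ;

Now the loop decision is between DOT (continue) and whatever follows the rule (exit), which one token of lookahead can settle, assuming DOT never follows myParserRule in the grammar. A Python sketch of that LL(1) loop, with hypothetical names tokens and parse_dotted:

```python
def tokens(dotted):
    """Hypothetical tokenizer: "a.b.c" -> IDENT DOT IDENT DOT IDENT NEWLINE."""
    kinds = ["IDENT"]
    for _ in dotted.split(".")[1:]:
        kinds += ["DOT", "IDENT"]
    return kinds + ["NEWLINE"]


def parse_dotted(kinds):
    """LL(1) sketch of: myParserRule: IDENT DOT IDENT (DOT IDENT)+ ;

    Expects the rule to be followed by NEWLINE; returns the IDENT count.
    """
    pos = 0

    def match(kind):
        nonlocal pos
        found = kinds[pos] if pos < len(kinds) else "EOF"
        if found != kind:
            raise SyntaxError(f"expecting {kind} found {found}")
        pos += 1

    match("IDENT")
    match("DOT")
    match("IDENT")
    idents = 2
    # (DOT IDENT)+ : one mandatory iteration, then loop while DOT is next.
    while True:
        match("DOT")
        match("IDENT")
        idents += 1
        if pos >= len(kinds) or kinds[pos] != "DOT":
            break
    match("NEWLINE")
    return idents


print(parse_dotted(tokens("a.b.c.D.e")))  # 5
```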

Bharath.

-----Original Message-----
From: Robert Colquhoun [mailto:rjc at trump.net.au] 
Sent: Thursday, May 13, 2004 8:49 PM
To: antlr-interest at yahoogroups.com
Cc: Monty Zukowski
Subject: Re: [antlr-interest] simple parser lookahead problem

At 02:46 AM 14/05/2004, Monty Zukowski wrote:
>On May 13, 2004, at 6:27 AM, Robert Colquhoun wrote:
>
> > The comments almost by definition were not lexable (unless there
> > is a way to do a 'catch all' lexer rule), therefore you couldn't feed
> > this info through to the parser filter to figure out the context to
> > determine whether something was a statement/label/comment.
>
>Did you try using lexer states? What made comments un-lexable?

Sorry, I meant unlexable in the sense that the comments (which may contain any
Unicode character except newline) had to be recognized and trapped as
comments within the lexer itself, not deferred to a later token filter. That
is, the comments could not be lexed as a series of idents, literals,
operators, etc., which a token filter would then discard as a comment.

If a lowest-priority catch-all rule can be created, then the above paragraph
is wrong and a token filter can be used.

I used state flags to switch the lexer between recognizing labels/statements
and recognizing expressions/idents. This was tricky and error-prone; maybe I
should have used a separate lexer for each state instead.

Anyway, I was trying to say that comments in this language are hard to
recognize: to recognize them, the lexer needs to know enough about the input
to distinguish labels from idents and number literals. If the lexer is doing
this work for comments anyway, it might as well recognize the labels
explicitly too.

>You know about the greedy option, right?

You mean stuff like "~(\r|\n)*"?

>There is a way to do the 'catch all' stuff; see the "ANTLR meets SED" article.

I tried a filter right at the start of the project but had poor results;
this might have been due to my inexperience.

>This will make a good example to reason about ANTLR 3 lexers with.

Below is some sample code that builds a list of remainders and multiplies
them together. Hopefully it shows how hard it is to extract the comments in
the lexer while leaving everything else intact.

- Robert

PROGRAM REM
REM A small program to build a list of remainders
A = 11
B = ''
REM = 4
REM: REM = REM(A, REM); REM Get remainder
B = INSERT(B, -1; REM); REM Add to end of list
IF REM > 1 THEN GOTO REM
ELSE GOTO CRT
CRT: CRT CONVERT(@AM, "*", B):
* Now multiply together everything
REMOVE = 1
REMOVE: REM More comments
REMOVE REM FROM B SETTING C
IF C = 0 THEN GOTO 123
REMOVE = REMOVE * REM
GOTO REMOVE:
123 * Yes another comment!!!
CRT " = ":REMOVE
END 



 



 






 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 


