[antlr-interest] simple parser lookahead problem
rjc at trump.net.au
Thu May 13 18:48:45 PDT 2004
At 02:46 AM 14/05/2004, Monty Zukowski wrote:
>On May 13, 2004, at 6:27 AM, Robert Colquhoun wrote:
> > The comments almost by definition were not lexable (unless there
> > is a way to do a 'catch all' lexer rule), therefore you couldn't feed
> > this info through to the parser filter to figure out the context to
> > determine whether something was a statement/label/comment.
>Did you try using lexer states? What made comments un-lexable?
Sorry, I meant unlexable in the sense that the comments (which can contain
any Unicode character except newline) had to be recognized and trapped as
comments within the lexer itself, not deferred to a later token filter.
That is, the comments could not be lexed as a series of idents, literals,
operators, etc. that a token filter would then discard as a comment.
If a lowest-priority catch-all rule can be created, then the above
paragraph is wrong and a token filter can be used.
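A minimal sketch of that catch-all idea, written in Python rather than ANTLR grammar syntax and purely for illustration: token rules are tried in priority order and a lowest-priority `ANY` rule matches any single character, so lexing never fails on arbitrary comment text. A separate token filter then drops everything from a comment marker to the end of the line. The token names, and the assumption that a comment starts only when `*`, `!` or `REM` is the first token on a line, are hypothetical simplifications that ignore labels and `;`.

```python
import re

# Token rules in priority order (hypothetical names, not ANTLR's).
# The final ANY rule is the lowest-priority catch-all: it matches any single
# character except a newline, so tokenizing can never fail on comment text.
RULES = [
    ("NEWLINE", r"\r?\n"),
    ("WS", r"[ \t]+"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r"\"[^\"\r\n]*\"|'[^'\r\n]*'"),
    ("OP", r"[-+*/=<>:;,()@]"),
    ("ANY", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))

# Tokens assumed to start a comment when they are the first token on a line.
COMMENT_MARKERS = {("OP", "*"), ("ANY", "!"), ("IDENT", "REM")}


def tokenize(text):
    """Yield (type, text) pairs, skipping whitespace."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "WS":
            yield m.lastgroup, m.group()


def drop_comments(tokens):
    """Token filter: drop a comment marker that starts a line, plus every
    following token up to (but not including) the newline."""
    at_line_start, in_comment = True, False
    for tok in tokens:
        if tok[0] == "NEWLINE":
            at_line_start, in_comment = True, False
            yield tok
        elif in_comment:
            continue
        elif at_line_start and tok in COMMENT_MARKERS:
            in_comment = True
        else:
            at_line_start = False
            yield tok
```

On a line such as `* it's "odd!!!` the unmatched quotes simply become `ANY` tokens and the whole line is discarded. This naive filter also throws away `REM = 4` from the sample program, though, because the first token alone cannot tell that assignment apart from a comment.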
I used state flags to flip the lexer between recognizing labels/statements
and recognizing expressions/idents. This was tricky and error prone; maybe I
should have tried a separate lexer for each state instead.
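The state-flag approach can be sketched roughly as follows. This is a hypothetical Python illustration, not the lexer described above: a single `mode` variable starts each statement in `STATEMENT` mode, where a leading number or an identifier immediately followed by `:` is reported as a `LABEL`. After the first real token the lexer switches to `EXPRESSION` mode, where `:` is just an operator (as in `CRT " = ":REMOVE`), and a `;` or newline switches back. Comments are deliberately left out, and the label rules are assumptions drawn from the sample program.

```python
import re

# One master regex; the final OP alternative is a catch-all for any other character.
TOKEN = re.compile(r"""
    (?P<NEWLINE>\r?\n)
  | (?P<WS>[ \t]+)
  | (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<STRING>"[^"\r\n]*"|'[^'\r\n]*')
  | (?P<OP>.)
""", re.VERBOSE)


def lex_labels(text):
    """Tokenize `text`, using a mode flag to decide when a word is a label."""
    tokens = []
    mode = "STATEMENT"  # hypothetical states: STATEMENT or EXPRESSION
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)  # never None: OP matches any other character
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue
        if kind == "NEWLINE" or value == ";":
            tokens.append((kind, value))
            mode = "STATEMENT"  # the next token starts a new statement
            continue
        if mode == "STATEMENT":
            if kind == "NUMBER":
                tokens.append(("LABEL", value))  # numeric label such as 123
                continue  # still at the start of the statement
            if kind == "IDENT" and text.startswith(":", pos):
                tokens.append(("LABEL", value))  # named label such as CRT:
                pos += 1  # consume the colon
                continue
            mode = "EXPRESSION"
        tokens.append((kind, value))
    return tokens
```

Because the flag only changes on `;` and newlines, the `123` in `GOTO 123` stays a plain `NUMBER`, while a `123` at the start of a line becomes a `LABEL`.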
Anyway, what I was trying to say is that comments in this language are hard
to recognize: for the lexer to recognize them, it needs to know enough about
the input to differentiate between labels and idents/number literals. If the
lexer is doing this work for comments anyway, it might as well explicitly
recognize the labels too.
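To make that concrete, here is a hypothetical Python helper that decides, at the start of a statement (after any label has been skipped), whether a comment begins there. It is one possible heuristic, not the language's actual rules: `*` and `!` always start a comment at that position, while `REM` does only when it is not part of a longer identifier such as `REMOVE` and is not followed by `=`, `(` or `:` (an assignment, an array element, or a label, as in the sample program). A recognized comment is consumed greedily to the end of the line, much like a `~('\r'|'\n')*` rule. The set of identifier characters is an assumption.

```python
import re

# Hypothetical heuristic: REM starts a comment unless it is part of a longer
# identifier (e.g. REMOVE) or is followed by '=', '(' or ':' (assignment,
# array element, or label). The identifier characters are an assumption.
REM_COMMENT = re.compile(r"REM(?![A-Za-z0-9_.$%])(?![ \t]*[=(:])")
TO_EOL = re.compile(r"[^\r\n]*")  # greedy: everything up to the end of line


def comment_end(line, pos):
    """If a comment starts at `pos`, return the index where it ends, else None.

    `pos` must be the first non-blank character of a statement, i.e. at the
    start of a line, after a label, or after ';'."""
    if line.startswith(("*", "!"), pos) or REM_COMMENT.match(line, pos):
        return TO_EOL.match(line, pos).end()
    return None
```

For `REM: REM = REM(A, REM); REM Get remainder`, this returns `None` for the label and for the assignment, and only the `REM` after the `;` is treated as the start of a comment.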
>know about the greedy option, right?
You mean stuff like "~(\r|\n)*"?
>There is a way to do the 'catch
>all' stuff-see the "ANTLR meets SED" article.
I tried a filter right at the start of the project but had poor results;
this might have been because of my inexperience.
>This will make a good example to reason about ANTLR 3 lexers with.
Attached below is some sample code that builds a list of remainders and
multiplies them together. Hopefully it shows how hard it is to extract the
comments in the lexer while leaving everything else intact.
REM A small program to build a list of remainders
A = 11
B = ''
REM = 4
REM: REM = REM(A, REM); REM Get remainder
B = INSERT(B, -1; REM); REM Add to end of list
IF REM > 1 THEN GOTO REM
ELSE GOTO CRT
CRT: CRT CONVERT(@AM, "*", B):
* Now multiply together everything
REMOVE = 1
REMOVE: REM More comments
REMOVE REM FROM B SETTING C
IF C = 0 THEN GOTO 123
REMOVE = REMOVE * REM
123 * Yes another comment!!!
CRT " = ":REMOVE