[antlr-interest] One weird, one (hopefully) simple problem ... from a newbie
mzukowski at yci.com
Tue Jan 28 08:22:41 PST 2003
I think both problems could be solved with a filter between the lexer
and parser. Consider this approach:
1. Buffer a whole line.
2. Upon recognizing a newline, work backwards to see whether there was a
comment at the end of the line by looking for '*', '!', or "REM". If you
see a ';', stop looking for a comment.
3. Discard the comment.
4. Now do a similar thing for a label following a ';'.
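Here is a rough Python sketch of that filter, just to make the idea concrete. It is not ANTLR code: it assumes the lexer has already produced the tokens of one line as plain strings, and the function name strip_trailing_comment is made up for illustration. It also assumes a label is a single all-digit token. It does not handle a ';' inside parentheses or inside the comment text itself.

```python
# Comment starters named in the thread.
COMMENT_STARTS = {"*", "!", "REM"}

def strip_trailing_comment(line_tokens):
    """Split one buffered line into (tokens_to_keep, comment_tokens).

    line_tokens is a list of token strings for a single line (an assumed
    representation; a real filter would work on the lexer's token objects).
    """
    # Work backwards from the end of the line to the nearest ';'
    # (or to the start of the line if there is none).
    i = len(line_tokens)
    while i > 0 and line_tokens[i - 1] != ";":
        i -= 1
    tail = line_tokens[i:]  # everything after the last ';'

    # Assumption: an optional label is one all-digit token.
    j = 0
    if j < len(tail) and tail[j].isdigit():
        j += 1

    # A comment starter right after the ';' (or after the label) opens a comment.
    if j < len(tail) and tail[j].upper() in COMMENT_STARTS:
        return line_tokens[:i + j], tail[j:]
    return list(line_tokens), []
```

For "A = B * C" nothing is discarded, because the '*' is not at the start of the line or directly after a ';' or a label.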
What do you think?
Monty
-----Original Message-----
From: Robert Colquhoun [mailto:rjc at trump.net.au]
Sent: Tuesday, January 28, 2003 3:06 AM
To: antlr-interest at yahoogroups.com
Subject: RE: [antlr-interest] One weird, one (hopefully) simple problem ... from a newbie
Hello Monty,
At 08:41 AM 27/01/2003 -0800, mzukowski at yci.com wrote:
>There are other ways around these ambiguities. See for instance
>www.codetransform.com/filterexample.html. Also if you are doing an AREV or
>BASIC parser you might be interested in some of the other pages on my site
>that tell how I did an AREV to VB translator.
Yes, I also tried to figure out a way to do this in the parser but did not
get quite as far as you have. Firstly, the technique seemed to produce a lot
of ambiguity warnings, which I could have silenced, but I was worried that I
might miss something important by doing so. That was only cosmetic, though;
the main problem, which I just could not solve using the above method, was
with the way comments work:
1) Comments can begin at the start of a line, after a label, or after a semicolon
2) Semicolons can also be used in argument lists, to shortcut optional
params
3) Comments are started by '*' or '!' or 'REM'
I noticed that your solution removes comments in the lexer. Problems come
from the above rules when trying to parse something like:
A = INSERT(B, C ; REM) ; REM A comment
...the first REM is a variable; the second REM is the start of a
comment. The problem was that the lexer didn't know enough to distinguish
between the two cases. I considered letting comments flow through to the
parser to solve this, but all sorts of junk that won't even lex gets put into
them.
Similar example with '*' comment char:
A = INSERT(B, C; 1 * 2); 1 * 2
The first '1 * 2' is a straight multiply, the second is a label '1'
followed by a comment 2.
In the end I was forced into doing the work in the lexer, maintaining
states, which was ugly but gave me a relatively clean parser grammar.
PS I just checked the above two cases with the current grammar, and they
don't quite work...I need to add something to track bracket depth in the
lexer aaarrghh!
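To illustrate what tracking bracket depth might look like, here is a minimal Python sketch. It is not the actual lexer from this thread: it assumes one line has already been split into token strings, that labels are single all-digit tokens, and the name find_comment_start is hypothetical. A ';' only counts as a statement separator when the parenthesis depth is zero, so a REM or '*' inside an argument list is left alone.

```python
# Comment starters named in the thread.
COMMENT_STARTS = {"*", "!", "REM"}

def find_comment_start(tokens):
    """Return the index of the token that opens a comment, or None.

    tokens is a list of token strings for one line (an assumed
    representation, not ANTLR token objects).
    """
    depth = 0        # current parenthesis nesting
    state = "start"  # "start", "after_label", or "body"
    for idx, tok in enumerate(tokens):
        if state != "body":
            # At the start of a statement, or right after a label,
            # a comment starter opens a comment.
            if tok.upper() in COMMENT_STARTS:
                return idx
            # Assumption: a label is one all-digit token.
            if state == "start" and tok.isdigit():
                state = "after_label"
                continue
            state = "body"
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth = max(depth - 1, 0)
        elif tok == ";" and depth == 0:
            # Only a ';' outside all parentheses separates statements.
            state = "start"
    return None
```

On the two example lines above, this returns the index of the second REM and of the second '*', respectively; everything from that index to the end of the line would be treated as comment text.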
- Robert
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/