[antlr-interest] One weird, one (hopefully) simple problem .. . from a newbie

mzukowski at yci.com mzukowski at yci.com
Tue Jan 28 08:22:41 PST 2003


I think those two problems could be solved with a filter between the lexer
and parser.  Consider this approach:

1) Buffer a whole line.
2) Upon recognizing a newline, work backwards to see whether there was a
   comment at the end of the line by looking for '*', '!', or "REM".  Stop
   looking for a comment once you reach a ';'.
3) Discard the comment, if there is one.

Then do a similar thing for a label following a ';'.
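
A rough sketch of such a filter step, in Python purely for illustration (a
real ANTLR filter would work on Token objects pulled from the lexer).  It
assumes the line has already been split into token strings, that labels are
plain numbers, and that REM is matched case-insensitively.  The names
strip_trailing_comment and COMMENT_STARTERS are made up for this example, and
the paren-depth counting is an addition so that a ';' inside an argument list
is not mistaken for a statement separator:

```python
COMMENT_STARTERS = {"*", "!", "REM"}  # hypothetical name; markers from the thread


def strip_trailing_comment(tokens):
    """Drop a trailing comment from one buffered line of token strings."""
    depth = 0
    # Walk backwards from the newline towards the start of the line.
    for i in range(len(tokens) - 1, -1, -1):
        tok = tokens[i]
        if tok == ")":
            depth += 1          # entering an argument list (seen from the right)
        elif tok == "(":
            depth -= 1
        elif tok == ";" and depth == 0:
            break               # top-level ';': stop looking any further left
    else:
        i = -1                  # no top-level ';': statement starts the line

    rest = tokens[i + 1:]       # everything after the separator
    # Assumption: a label is a single all-digit token.
    skip = 1 if rest and rest[0].isdigit() else 0
    if len(rest) > skip and rest[skip].upper() in COMMENT_STARTERS:
        # Keep the ';' and any label, discard the comment tokens.
        return tokens[:i + 1 + skip]
    return tokens
```

On the two examples quoted below, this keeps the separating ';' (and the
label '1' in the second case) and drops the rest.  A comment whose text itself
contains a ';' or unbalanced parentheses would defeat the backwards scan, so
treat this as a starting point only.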

What do you think?

Monty

-----Original Message-----
From: Robert Colquhoun [mailto:rjc at trump.net.au]
Sent: Tuesday, January 28, 2003 3:06 AM
To: antlr-interest at yahoogroups.com
Subject: RE: [antlr-interest] One weird, one (hopefully) simple problem
.. . from a newbie


Hello Monty,

At 08:41 AM 27/01/2003 -0800, mzukowski at yci.com wrote:
>There are other ways around these ambiguities.  See for instance
>www.codetransform.com/filterexample.html. Also if you are doing an AREV or
>BASIC parser you might be interested in some of the other pages on my site
>that tell how I did an AREV to VB translator.

Yes, I also tried to figure out a way to do this in the parser but did not
get quite as far as you have.  Firstly, the technique seemed to produce a lot
of ambiguity warnings, which I could have silenced, but I was worried that I
might miss something important by doing so.  That was only cosmetic, though;
the main problem, which I just could not solve using the above method, was
with the way comments work:

1) Comments can begin at the start of a line, after a label, or after a
semicolon
2) Semicolons can also be used in argument lists, to shortcut optional
params
3) Comments are started by '*' or '!' or 'REM'

I noticed that your solution removes comments in the lexer.  With the rules
above, problems come from trying to parse something like:

A = INSERT(B, C ; REM) ; REM A comment

...the first REM is a variable; the second REM is the start of a
comment.  The problem was that the lexer didn't know enough to distinguish
between the two cases.  I considered letting comments flow through to the
parser to solve this, but all sorts of junk that won't even lex gets put into
them.

Similar example with '*' comment char:
A = INSERT(B, C; 1 * 2); 1 * 2

The first '1 * 2' is a straight multiply; the second is a label '1'
followed by a comment '2'.

In the end I was forced into doing the work in the lexer, maintaining
states, which was ugly but gave me a relatively clean parser grammar.

PS I just checked the above two cases with the current grammar, and they
don't quite work... need to add something to track bracket depth in the
lexer, aaarrghh!
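
One way the bracket-depth tracking could look, sketched in Python on the raw
text of a line rather than as ANTLR lexer actions.  This is only an
illustration under some assumptions: labels are plain numbers, REM is matched
case-insensitively and only as a whole word, and string literals are ignored
entirely (a ';' or '(' inside a quoted string would confuse it).  The names
split_comment, COMMENT_START and LABEL are invented for the example:

```python
import re

COMMENT_START = re.compile(r"\*|!|REM\b", re.IGNORECASE)  # hypothetical name
LABEL = re.compile(r"\d+")                                # assumption: numeric labels


def split_comment(line):
    """Return (code, comment) for one line; comment is None if there is none."""
    depth = 0                  # current parenthesis nesting
    pos = 0
    at_statement_start = True  # true at line start and after a top-level ';'
    while pos < len(line):
        if at_statement_start:
            # Look past whitespace and an optional label for a comment marker.
            p = pos
            while p < len(line) and line[p].isspace():
                p += 1
            m = LABEL.match(line, p)
            if m:
                p = m.end()
                while p < len(line) and line[p].isspace():
                    p += 1
            if COMMENT_START.match(line, p):
                return line[:p].rstrip(), line[p:]
            at_statement_start = False
            continue           # not a comment: scan from pos as ordinary code
        ch = line[pos]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif ch == ";" and depth == 0:
            at_statement_start = True   # only a top-level ';' starts a statement
        pos += 1
    return line, None
```

With this, the ';' inside INSERT(...) is skipped because depth is 1 there, so
'REM' and '1 * 2' inside the argument list stay as code, while the same text
after the closing ')' and a top-level ';' is split off as a comment.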

  - Robert


 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 


 



