[antlr-interest] Lookahead question
jose.sanleandro at ventura24.es
Wed Dec 8 09:45:28 PST 2004
Hi all,
I've been trying to learn ANTLR well enough to add it to my toolbox. I have
to say it's been a challenging task and, although I've been successful in
some cases, I know I'm still missing some important points.
Recently I needed to write a "parser" able to "understand" the output format
of "rlog" (used by "cvs log"). The format is simple, and most of its parts are
fixed. I thought such an easy grammar would be good practice.
In the end, I had to resort to some "strange" approaches to make it work the
way I want, by understanding and debugging the generated code, and by
explicitly fixing the lookahead to 1.
The reason is that some of the grammar rules allow arbitrary text, which
occasionally triggered conflicts with literals. I thought the lookahead
option was the only tool I had to resolve such conflicts.
Finally, I decided to use a lookahead of one character and explicitly resolve
the conflicts. That ended up in a grammar which doesn't look much like one :(.
Take a look at a fragment:
STARTS_WITH_B:
    'b'
    (({ if (   (LA(1) == 'r')
            && (LA(2) == 'a')
            && (LA(3) == 'n')
            && (LA(4) == 'c')
            && (LA(5) == 'h')
            && (LA(6) == ':'))
        {
            mRESERVED_BRANCH(false);
            $setType(LITERAL_BRANCH);
        }
        else
        {
            mSTRING(false);
            $setType(STRING);
        }
    })
    )
    ;
Basically, there's a non-protected rule for each starting letter of the
reserved words of the grammar, to guide the lexer in ambiguous situations.
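For readers less familiar with ANTLR 2's LA(n), the decision that rule makes
can be sketched in plain Java, outside any generated lexer. This is only an
illustration under my own assumptions: the class LookaheadSketch, its method
names, and the TokenType enum are made up here and are not part of ANTLR or of
my grammar.

```java
// Hypothetical stand-alone sketch; this is NOT ANTLR-generated code.
final class LookaheadSketch {

    // Made-up token types mirroring LITERAL_BRANCH and STRING in the fragment.
    enum TokenType { LITERAL_BRANCH, STRING }

    // Marker returned when looking past the end of the input (an assumption
    // of this sketch).
    static final char END_OF_INPUT = '\uFFFF';

    // Mimics LA(i): the i-th character ahead of pos (1-based), without
    // consuming anything.
    static char la(String input, int pos, int i) {
        int index = pos + i - 1;
        return index < input.length() ? input.charAt(index) : END_OF_INPUT;
    }

    // Called with pos pointing just past an already-matched 'b'.
    // Returns LITERAL_BRANCH only if the next six characters are "ranch:".
    static TokenType classifyAfterB(String input, int pos) {
        String rest = "ranch:";
        for (int i = 1; i <= rest.length(); i++) {
            if (la(input, pos, i) != rest.charAt(i - 1)) {
                return TokenType.STRING; // any mismatch: treat as plain text
            }
        }
        return TokenType.LITERAL_BRANCH;
    }

    public static void main(String[] args) {
        System.out.println(classifyAfterB("branch: 1.2", 1));
        System.out.println(classifyAfterB("branches", 1));
    }
}
```

The sketch shows that the check needs six characters of lookahead for this
one literal, even though the grammar itself is declared with a lookahead of 1.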
I tried to use syntactic predicates, but after spending some time on them I
wasn't able to make ANTLR generate the code I wanted, in the order I wanted.
I've used the lexer just to split words and distinguish them by assigning
different token identifiers. For me, its role is similar to a specialized
SAX parser which creates ANTLR objects (tokens) and optionally runs custom
logic defined in the grammar itself. If it fails, the input is not "valid".
On the other hand, the parser expects the correct tokens in the correct order,
following certain rules. It optionally creates DOM-like structures. If it
fails, the input is not "well-formed".
Finally, the tree parser just processes that object hierarchy (built by the
parser), and provides features similar to what XPath or XSL stylesheets could
perform. Is the analogy valid?
Moreover, what is the main drawback of explicitly resolving ambiguous
situations in the lexer using inline LA(x) checks?
Thank you.
Jose.