[antlr-interest] A very basic grammar--and I'm confused!
Richard Steele
rich at steelezone.net
Mon Aug 18 09:00:43 PDT 2008
Gavin Lambert <antlr at ...> writes:
> This is why it's dangerous to use literals in parser rules :)
I like flirting with danger. :-)
[snip helpful advice about implied lexer rules]
> The final piece of the puzzle is that given a choice between two
> tokens at lexing time, ANTLR will favour the longest match -- and
> once "inside" a token, it will not consider alternative
> interpretations.
Ah! This is the piece I was missing. I was also confused about the roles that
the lexer and the parser each play when disambiguating their respective rules.
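To make the longest-match point concrete, here is a rough Python sketch of a "maximal munch" tokenizer. It is not how ANTLR is implemented; the rule table, the `tokenize` name, and the `+` repetition on INT and VALUE are assumptions made for illustration. At each position every rule is tried, the longest match wins, and ties go to the rule listed first.

```python
import re

# Hypothetical token rules; VALUE deliberately accepts everything INT does.
RULES = [
    ("INT", re.compile(r"[0-9]+")),
    ("VALUE", re.compile(r"[A-Za-z0-9 ]+")),
    ("END_OF_FIELD", re.compile(r";")),
]


def tokenize(text):
    """Split text into (name, lexeme) pairs using longest-match."""
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best,
            # so on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append(best)
        pos += len(best[1])
    return tokens


if __name__ == "__main__":
    print(tokenize("12345ABC;"))
```

With this sketch, "12345ABC;" comes back as a single VALUE followed by END_OF_FIELD: once VALUE can match more characters than INT, the digits are never offered to the parser as a separate INT.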
> Hopefully this all makes sense now :)
Yes, thanks.
> you may need to
> merge the lexer rules and give it some explicit disambiguation
Care to give any hints on how to do that? I got a private e-mail showing an
example that uses semantic predicates, but I haven't played with them much.
> or possibly just add a whitespace rule, if the 'X' is actually
> representing a keyword that must be surrounded by whitespace.
I can't do that: I'm parsing a de facto standard-ish file format. (This is a
data-interchange file format, not a general-purpose computer language. I'm
evaluating whether it's worth introducing ANTLR into our mix for parsing it.)
Generally speaking, this file format has a syntax that looks something like this
(written as pseudo-ANTLR):
r: INT {two digits}? (INT {three digits}? VALUE END_OF_FIELD)+ NEWLINE;
END_OF_FIELD: ';';
INT: '0'..'9';
VALUE: 'A'..'Z' | 'a'..'z' | '0'..'9' | ' ';
How do I express the length requirements to the lexer/parser? As you pointed
out, since the rule for VALUE is a superset of the rule for INT, it's sucking up
the largest text fragment.
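For comparison while evaluating, this is roughly what a hand-written, length-driven reader for that record shape might look like in Python. It is only a sketch under assumptions: the names `parse_record` and `_is_digits` are made up, VALUE is assumed to be one or more characters, and the sample line is invented rather than taken from the real format. The point is that the two-digit and three-digit fields are picked out by position, not by character class, so the INT/VALUE overlap never comes up.

```python
import re

# Assumed VALUE alphabet, taken from the pseudo-grammar above.
VALUE_RE = re.compile(r"[A-Za-z0-9 ]+")


def _is_digits(s):
    """True if s is non-empty and made only of ASCII digits."""
    return s != "" and all(c in "0123456789" for c in s)


def parse_record(line):
    """Parse one record: 2-digit INT, then one or more (3-digit INT, VALUE, ';')."""
    line = line.rstrip("\n")
    head = line[:2]
    if len(head) != 2 or not _is_digits(head):
        raise ValueError("expected a two-digit prefix")
    fields = []
    pos = 2
    while pos < len(line):
        tag = line[pos:pos + 3]
        if len(tag) != 3 or not _is_digits(tag):
            raise ValueError("expected a three-digit tag at offset %d" % pos)
        end = line.find(";", pos + 3)
        if end == -1:
            raise ValueError("missing ';' after offset %d" % pos)
        value = line[pos + 3:end]
        if not VALUE_RE.fullmatch(value):
            raise ValueError("bad VALUE at offset %d" % (pos + 3))
        fields.append((tag, value))
        pos = end + 1  # skip the END_OF_FIELD character
    if not fields:
        raise ValueError("expected at least one field")
    return head, fields


if __name__ == "__main__":
    print(parse_record("01123Hello World;456 42;\n"))
```

A semantic-predicate solution in the grammar would presumably have to encode the same idea: decide how many digits belong to the INT from context, before VALUE gets a chance to consume them.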
Thanks,
Rich
(Posting through gmane and got a captcha of "stupefied". How'd it know?)