[antlr-interest] A very basic grammar--and I'm confused!

Mon Aug 18 09:00:43 PDT 2008

Gavin Lambert <antlr at ...> writes:

> This is why it's dangerous to use literals in parser rules :)

I like flirting with danger.  :-)

[snip helpful advice about implied lexer rules]

> The final piece of the puzzle is that given a choice between two 
> tokens at lexing time, ANTLR will favour the longest match -- and 
> once "inside" a token, it will not consider alternative 
> interpretations.

Ah!  This is the piece I was missing.  I was also confused about the roles the
lexer and the parser each play in disambiguating their respective rules.
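
To check my understanding, here is a toy Python version of the longest-match
rule as I now read it.  It is my own illustration, not how ANTLR's generated
lexers actually work, and it assumes INT and VALUE match runs of characters
rather than the single characters in my pseudo-grammar further down:

```python
import re

# Hypothetical token rules, in priority order (an earlier rule wins a tie).
RULES = [
    ("INT", re.compile(r"[0-9]+")),
    ("VALUE", re.compile(r"[A-Za-z0-9 ]+")),
    ("END_OF_FIELD", re.compile(r";")),
]

def tokenize(text):
    """Longest-match tokenizer: at each position try every rule and keep
    the longest match; ties go to the rule listed first.  Once a token is
    chosen its characters are consumed and never reconsidered."""
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

With these rules, tokenize("12345abc;") yields a single VALUE token for
"12345abc": the leading digits never get a chance to become an INT.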

> Hopefully this all makes sense now :)

Yes, thanks.

> you may need to 
> merge the lexer rules and give it some explicit disambiguation

Care to give any hints on how to do that?  I got a private e-mail showing an
example of using semantic predicates but I haven't played with them much.

> or possibly just add a whitespace rule, if the 'X' is actually 
> representing a keyword that must be surrounded by whitespace.

I can't do that: I'm parsing a de facto, standard-ish file format.  (This is a
data-interchange file format, not a general-purpose computer language.  I'm
evaluating whether it's worth introducing ANTLR into our mix for parsing it.)

Generally speaking, this file format has a syntax that looks something like this
(represented as pseudo-ANTLR):

r: INT {two digits}? (INT {three digits}? VALUE END_OF_FIELD)+ NEWLINE;
END_OF_FIELD: ';';
INT: '0'..'9';
VALUE: 'A'..'Z' | 'a'..'z' | '0'..'9' | ' ';

How do I express the length requirements to the lexer/parser?  As you pointed
out, since the rule for VALUE is a superset of the rule for INT, VALUE ends up
sucking up the longest text fragment.
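
For reference, this is roughly the behaviour I'm after, written as a
hand-rolled Python sketch rather than ANTLR code.  It assumes VALUE is a
non-empty run of letters, digits and spaces terminated by ';', and that a
record ends with a newline; the function name parse_record is made up.  The
widths are enforced by position, so a VALUE that starts with digits can't
swallow the three-digit INT in front of it:

```python
import re

# Assumed record layout, taken from the pseudo-grammar above:
#   two-digit INT, then one or more (three-digit INT, VALUE, ';'), then newline.
HEADER = re.compile(r"[0-9]{2}")
FIELD = re.compile(r"([0-9]{3})([A-Za-z0-9 ]+);")

def parse_record(line):
    """Parse one record, returning (header, [(code, value), ...]).

    Raises ValueError if the fixed widths or the terminators don't match."""
    m = HEADER.match(line)
    if not m:
        raise ValueError("record must start with a two-digit INT")
    header, pos = m.group(), m.end()
    fields = []
    while pos < len(line) and line[pos] != "\n":
        f = FIELD.match(line, pos)
        if not f:
            raise ValueError(f"bad field at position {pos}")
        fields.append((f.group(1), f.group(2)))
        pos = f.end()
    if not fields:
        raise ValueError("record needs at least one field")
    if line[pos:] != "\n":
        raise ValueError("record must end with a single newline")
    return header, fields
```

For example, parse_record("12345999;\n") returns ("12", [("345", "999")]):
the all-digit VALUE is fine because the scanner already knows it is past the
three-digit INT.  What I can't see is how to say the same thing in ANTLR.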

Thanks,
Rich

(Posting through gmane and got a captcha of "stupefied".  How'd it know?)