[antlr-interest] Matching tokens only at certain places
Emond Papegaaij
e.papegaaij at student.utwente.nl
Mon Jun 19 06:32:27 PDT 2006
On Monday 19 June 2006 15:03, you wrote:
> On 6/19/06, Emond Papegaaij <e.papegaaij at student.utwente.nl> wrote:
<CUT howto parse 'iface' (~';') ';'>
> > The problem is that I don't know anything about the contents of
> > METHOD_SIG_ACTION, except that it will not contain a semicolon. Creating
> > a token that matches everything except a semicolon does not work, as
> > ANTLR will always create that token for all input. I need a way to
> > specify that the METHOD_SIG_ACTION token can only follow the 'iface'
> > token. As 'iface' is always followed by METHOD_SIG_ACTION, it is possible
> > to specify it in the lexer (i.e. set some boolean to true after emitting
> > an 'iface' token).
> How did you test the statement "Creating a token that matches everything
> except a semicolon does not work"?
> Because that's what I would do, but perhaps I'm stupid.
> Have you looked at syntactic predicates?
> http://antlr.org/doc/metalang.html#SyntacticPredicates
The example grammar (in the previous mail) matches everything as a
METHOD_SIG_ACTION. I've studied the DFA created by ANTLR, and it is clear
that the only way to reach the IDENTIFIER token is by ending with <EOT>.
METHOD_SIG_ACTION matches everything except ';', including anything that
would be an IDENTIFIER. Therefore, when the lexer starts matching an
IDENTIFIER, it switches to METHOD_SIG_ACTION as soon as it sees a character
that is neither a letter nor ';'. With the following input:
Printable {
	iface public String getString();
}
the tokens will be:
METHOD_SIG_ACTION: "Printable {\n\tiface public String getString()"
';'
METHOD_SIG_ACTION: "\n}\n"
and not:
IDENTIFIER: "Printable"
'{'
'iface'
METHOD_SIG_ACTION: "public String getString()"
';'
'}'
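
The difference between the two token streams above can be reproduced with a small Python sketch. It is not ANTLR's generated lexer and not the grammar from the previous mail: the rule table, the function names longest_match_tokens and context_tokens, and the PUNCT token name are all made up for illustration. The first function is a plain longest-match tokenizer with a catch-all "anything but ';'" rule. The second keeps a boolean that is set after 'iface' is emitted, assuming whitespace is skipped and the signature text is trimmed.

```python
import re

SAMPLE = "Printable {\n\tiface public String getString();\n}\n"

# Hypothetical rule table; METHOD_SIG_ACTION is the catch-all "not ';'" rule.
RULES = [
    ("IFACE", re.compile(r"iface\b")),
    ("IDENTIFIER", re.compile(r"[A-Za-z]+")),
    ("PUNCT", re.compile(r"[{};]")),
    ("WS", re.compile(r"\s+")),
    ("METHOD_SIG_ACTION", re.compile(r"[^;]+")),
]


def longest_match_tokens(text):
    """Context-free lexing: at each position the longest match wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:  # ties go to the earlier rule
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# The same rules without the catch-all, combined into one pattern.
NORMAL = re.compile(
    r"(?P<IFACE>iface\b)|(?P<IDENTIFIER>[A-Za-z]+)|(?P<PUNCT>[{};])|(?P<WS>\s+)"
)


def context_tokens(text):
    """Lexing with a flag: METHOD_SIG_ACTION is only tried right after 'iface'."""
    tokens, pos = [], 0
    expect_sig = False  # the boolean set after emitting 'iface'
    while pos < len(text):
        if expect_sig:
            end = text.find(";", pos)
            if end == -1:
                end = len(text)
            tokens.append(("METHOD_SIG_ACTION", text[pos:end].strip()))
            pos, expect_sig = end, False
            continue
        m = NORMAL.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        kind = m.lastgroup
        if kind != "WS":  # whitespace is skipped, not emitted
            tokens.append((kind, m.group()))
        expect_sig = kind == "IFACE"
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for tok in longest_match_tokens(SAMPLE):
        print("greedy :", tok)
    for tok in context_tokens(SAMPLE):
        print("flagged:", tok)
```

In the first function the catch-all rule always produces the longest match, so IDENTIFIER, '{' and 'iface' are never emitted on their own. In the second, the catch-all is simply not a candidate unless the flag is set, which is the behaviour asked for in this thread.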
Best regards,
Emond Papegaaij