[antlr-interest] Matching tokens only at certain places

Emond Papegaaij e.papegaaij at student.utwente.nl
Mon Jun 19 06:32:27 PDT 2006


On Monday 19 June 2006 15:03, you wrote:
> On 6/19/06, Emond Papegaaij <e.papegaaij at student.utwente.nl> wrote:
<CUT: how to parse 'iface' (~';') ';'>

> > The problem is that I don't know anything about the contents of
> > METHOD_SIG_ACTION, except that it will not contain a semicolon. Creating
> > a token that matches everything except a semicolon does not work, as
> > ANTLR will always create that token for all input. I need a way to
> > specify that the METHOD_SIG_ACTION token can only follow the 'iface'
> > token. As 'iface' is always followed by METHOD_SIG_ACTION, it is possible
> > to specify it in the lexer (i.e. set some boolean to true after emitting
> > an 'iface' token).
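
A minimal sketch of the flag idea quoted above, written as a small hand-written Java lexer rather than as an ANTLR grammar, so the mechanism is visible in isolation. The class name FlagLexer, its Token type and the assumption that IDENTIFIER consists of letters only are all hypothetical; in a generated lexer the same boolean would have to be set from an action on the 'iface' rule and tested before the METHOD_SIG_ACTION rule is allowed to match.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical hand-written lexer (not ANTLR-generated) showing the
 * "set a boolean after emitting 'iface'" approach.
 */
public class FlagLexer {
    /** A token is just a type name plus the matched text. */
    public static final class Token {
        public final String type;
        public final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ": \"" + text + "\"";
        }
    }

    private final String input;
    private int pos = 0;
    // Set right after an 'iface' keyword is emitted; cleared again once the
    // METHOD_SIG_ACTION token has been produced.
    private boolean expectSignature = false;

    public FlagLexer(String input) {
        this.input = input;
    }

    public List<Token> tokenize() {
        List<Token> tokens = new ArrayList<>();
        while (true) {
            skipWhitespace();
            if (pos >= input.length()) {
                break;
            }
            if (expectSignature) {
                // Only in this state: take everything up to (not including) ';'.
                int start = pos;
                while (pos < input.length() && input.charAt(pos) != ';') {
                    pos++;
                }
                tokens.add(new Token("METHOD_SIG_ACTION", input.substring(start, pos)));
                expectSignature = false;
                continue;
            }
            char c = input.charAt(pos);
            if (Character.isLetter(c)) {
                int start = pos;
                while (pos < input.length() && Character.isLetter(input.charAt(pos))) {
                    pos++;
                }
                String word = input.substring(start, pos);
                if (word.equals("iface")) {
                    tokens.add(new Token("'iface'", word));
                    expectSignature = true; // the next token must be the signature
                } else {
                    tokens.add(new Token("IDENTIFIER", word));
                }
            } else if (c == '{' || c == '}' || c == ';') {
                tokens.add(new Token("'" + c + "'", String.valueOf(c)));
                pos++;
            } else {
                throw new IllegalStateException("unexpected character '" + c + "' at " + pos);
            }
        }
        return tokens;
    }

    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        String src = "Printable {\n  iface public String getString();\n}\n";
        for (Token t : new FlagLexer(src).tokenize()) {
            System.out.println(t);
        }
    }
}
```

On the example input used later in this thread, this sketch yields IDENTIFIER, '{', 'iface', METHOD_SIG_ACTION "public String getString()", ';' and '}'. The cost of the approach is that the lexer now carries state, so the "anything but ';'" rule is only reachable in one context instead of competing with every other rule.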

> How did you test the statement "Creating a token that matches everything
> except a semicolon does not work"?
> Because that's what I would do, but perhaps I'm stupid.
> Have you looked at syntactic predicates?
> http://antlr.org/doc/metalang.html#SyntacticPredicates

The example grammar (in the previous mail) matches everything as a 
METHOD_SIG_ACTION. I've studied the DFA created by ANTLR, and it is clear 
that the only way to reach the IDENTIFIER token is by ending with <EOT>. 
METHOD_SIG_ACTION matches everything, including IDENTIFIERs. Therefore, when 
the lexer starts to match an IDENTIFIER, it switches to METHOD_SIG_ACTION as 
soon as it encounters a character that is neither a letter nor ';'. With the 
following input:
Printable {
  iface public String getString();
}
the tokens will be:
METHOD_SIG_ACTION: "Printable {\n\tiface public String getString()"
';'
METHOD_SIG_ACTION: "\n}\n"

and not:
IDENTIFIER: "Printable"
'{'
'iface'
METHOD_SIG_ACTION: "public String getString()"
';'
'}'
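
The failure can be reproduced with a simplified model of a context-free lexer: at each position, try both IDENTIFIER (assumed here to be letters only) and METHOD_SIG_ACTION (one or more characters other than ';'), keep the longer match, and let IDENTIFIER win ties. This longest-match model is an approximation of what the generated DFA does, not ANTLR code, and the class name GreedyLexer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical longest-match model of a lexer where METHOD_SIG_ACTION is
 * "one or more characters other than ';'" and is always enabled.
 */
public class GreedyLexer {
    public static final class Token {
        public final String type;
        public final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Length of the IDENTIFIER match (letters only) starting at pos. */
    static int matchIdentifier(String s, int pos) {
        int end = pos;
        while (end < s.length() && Character.isLetter(s.charAt(end))) {
            end++;
        }
        return end - pos;
    }

    /** Length of the METHOD_SIG_ACTION match (anything but ';') starting at pos. */
    static int matchAction(String s, int pos) {
        int end = pos;
        while (end < s.length() && s.charAt(end) != ';') {
            end++;
        }
        return end - pos;
    }

    public static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int ident = matchIdentifier(s, pos);
            int action = matchAction(s, pos);
            if (ident > 0 && ident >= action) {
                // IDENTIFIER only wins when the letters run up to ';' or end of input.
                tokens.add(new Token("IDENTIFIER", s.substring(pos, pos + ident)));
                pos += ident;
            } else if (action > 0) {
                // Any non-letter (space, '{', newline, ...) makes this rule longer.
                tokens.add(new Token("METHOD_SIG_ACTION", s.substring(pos, pos + action)));
                pos += action;
            } else {
                // matchAction returns 0 only when the current character is ';'.
                tokens.add(new Token("';'", ";"));
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "Printable {\n  iface public String getString();\n}\n";
        for (Token t : tokenize(src)) {
            System.out.println(t.type + ": \"" + t.text.replace("\n", "\\n") + "\"");
        }
    }
}
```

In this model the example input collapses into the same three tokens listed first above: a METHOD_SIG_ACTION running up to the ';', the ';' itself, and a METHOD_SIG_ACTION for "\n}\n". An input such as "abc;" still produces IDENTIFIER "abc" followed by ';', because there the two rules tie and the earlier one wins.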

Best regards,
Emond Papegaaij

