[antlr-interest] Global scopes for lexers

Benji Smith benji at benjismith.net
Wed Aug 1 12:15:03 PDT 2007


If I understand the problem correctly, then I think I have another
interesting example: parsing regular expressions.

Normally, curly braces are used for quantifiers, like this:

  a{2,3}  # Means "the char 'a' at least twice, but no more than three times"

And to specify a literal curly brace, it usually has to be backslash
escaped, like this:

  \{{3}  # Means "the char '{' exactly three times"

But within square brackets (a character class), the lexing rules change,
and a backslash is no longer required to specify a literal curly brace:

  [{}]{3} # Means "any of the characters '{' or '}', exactly three times"

When I implemented a regex parser in JavaCC, it was easy to create
a stack of lexical scopes, where curly brace characters are lexed as
different types of tokens depending on the current lexical scope. Within
a character class, a curly brace is just a regular character literal,
but in any other lexical scope, it's a START_QUANTIFIER token.
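A minimal sketch of that stack-of-scopes idea, written as a hand-rolled Java lexer rather than JavaCC or ANTLR grammar code. The class, enum and token names (ScopedRegexLexer, Scope, Type, START_QUANTIFIER and so on) are hypothetical, and it deliberately skips details a real regex lexer would need, such as a ']' appearing first inside a class or lexing quantifier digits as numbers. A '[' pushes a CHAR_CLASS scope, a ']' pops it, and whatever scope sits on top of the stack decides whether '{' is a literal or a START_QUANTIFIER.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: a regex lexer that keeps a stack of lexical scopes,
 * so '{' and '}' delimit quantifiers outside a character class but are
 * plain literals inside one.
 */
public class ScopedRegexLexer {

    enum Scope { DEFAULT, CHAR_CLASS }

    enum Type { LITERAL, START_QUANTIFIER, END_QUANTIFIER, START_CLASS, END_CLASS }

    record Token(Type type, String text) {
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    static List<Token> tokenize(String input) {
        Deque<Scope> scopes = new ArrayDeque<>();
        scopes.push(Scope.DEFAULT);
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // A backslash escape yields the next char as a literal in any scope.
                tokens.add(new Token(Type.LITERAL, String.valueOf(input.charAt(i + 1))));
                i += 2;
                continue;
            }
            if (scopes.peek() == Scope.CHAR_CLASS) {
                if (c == ']') {
                    scopes.pop(); // leave the character class scope
                    tokens.add(new Token(Type.END_CLASS, "]"));
                } else {
                    // Inside [...], braces are ordinary characters.
                    tokens.add(new Token(Type.LITERAL, String.valueOf(c)));
                }
            } else {
                switch (c) {
                    case '[' -> {
                        scopes.push(Scope.CHAR_CLASS); // enter the character class scope
                        tokens.add(new Token(Type.START_CLASS, "["));
                    }
                    case '{' -> tokens.add(new Token(Type.START_QUANTIFIER, "{"));
                    case '}' -> tokens.add(new Token(Type.END_QUANTIFIER, "}"));
                    default -> tokens.add(new Token(Type.LITERAL, String.valueOf(c)));
                }
            }
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [LITERAL(a), START_QUANTIFIER({), LITERAL(2), LITERAL(,), LITERAL(3), END_QUANTIFIER(})]
        System.out.println(tokenize("a{2,3}"));
        // [LITERAL({), START_QUANTIFIER({), LITERAL(3), END_QUANTIFIER(})]
        System.out.println(tokenize("\\{{3}"));
        // [START_CLASS([), LITERAL({), LITERAL(}), END_CLASS(]), START_QUANTIFIER({), LITERAL(3), END_QUANTIFIER(})]
        System.out.println(tokenize("[{}]{3}"));
    }
}
```

The same '{' character comes out as LITERAL inside [{}] but as START_QUANTIFIER right after it, purely because of which scope is on top of the stack.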

I'm not sure of the best mechanism for handling those kinds of cases
in ANTLR, but it seems like the same kind of problem as Alex's example,
with semantically significant whitespace in some lexical scopes but
not in others.

Hope that info helps :)

--benji

