[antlr-interest] Lexer Predicates?
Gavin Lambert
antlr at mirality.co.nz
Mon Aug 4 02:21:31 PDT 2008
At 10:28 4/08/2008, Foust wrote:
>I thought the whole point of a Domain Specific
>Language was to make the task easy on the user
>not on the parser-generator. It seems that the
>issue is that what is intuitive to a human may
>in fact be some chimera of two or more formal
>syntaxes. Antlr does not handle this very well,
>forcing tokens to be interpreted the same in
>every context. But since it allows interaction
>with the target language, there are likely
>several ways to solve the problem.
The lexer has to generate a single set of tokens,
yes, but as long as you don't assign too much
semantic meaning at the lexer level then it's usually ok :)
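
To illustrate what I mean, here is a rough hand-written Python sketch. It is not ANTLR output, and the "set <name> = <value>;" statement form and the function names are invented for the example. The lexer only ever emits a generic WORD token, and the parser decides from position whether a WORD is acting as a keyword or as a name:

```python
import re

# One context-free token set: identifier-like text is always WORD.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<WORD>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<PUNCT>[={};]))"
)

def lex(text):
    """Return (kind, value) pairs; the lexer never decides what a WORD means."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_statement(tokens):
    """Parse the hypothetical form: set <name> = <value> ;

    Meaning is assigned here, by position, not in the lexer.
    """
    (k0, v0), (k1, v1), (_, eq), (_, v3), (_, semi) = tokens
    if (k0, v0) != ("WORD", "set") or k1 != "WORD" or eq != "=" or semi != ";":
        raise SyntaxError("expected: set <name> = <value> ;")
    return {"name": v1, "value": v3}
```

With this split, an input like "set set = 3;" is fine: both occurrences of "set" lex as WORD, and only the parser treats the first as the keyword and the second as a name.
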
>I thought that the cleanest way to read in a
>free-form config { } block (not requiring
>quotes, or other syntax that might, in fact be
>intended to be part of the config setting) is to
>treat it as a separate language. I want to keep
>the syntax as simple as possible and have no
>possibility of conflicting with any other part
>of the language. So I solved this particular problem by:
>- using parser states
>- a predicate on the config rule to only
>recognize it if in the correct state
>- Implement a simple parser for just the
>block in question using regex in the target language:
So this is for a block in some larger file that's
effectively an entirely different language? In
that case, have a look at the "island grammar" examples.
The one included in the example pack shows the
simplest/best case (where you can detect the
start and end of the grammar island in the
lexer); there's also an example in the Wiki
showing how you can handle it when things are so
ambiguous you can't work it out until the parser stage.
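
For the simple case, the idea can be sketched roughly as below. This is plain Python rather than an ANTLR grammar, and it is not taken from the example pack. The "config" opener, the function name, and the assumption that braces inside the block balance are all mine. The scanner hands everything outside the block to the host language and captures the block body as raw text, so a separate parser can deal with it:

```python
import re

def lex_with_island(text, opener="config"):
    """Split text into ('HOST', chunk) and ('ISLAND', raw_body) pieces.

    Assumption: an island starts at '<opener> {' and ends at the matching
    '}', so any braces inside the body must be balanced.
    """
    start_re = re.compile(r"\b" + re.escape(opener) + r"\s*\{")
    pieces, pos = [], 0
    while True:
        m = start_re.search(text, pos)
        if not m:
            # No more islands: the rest belongs to the host language.
            if pos < len(text):
                pieces.append(("HOST", text[pos:]))
            return pieces
        if m.start() > pos:
            pieces.append(("HOST", text[pos:m.start()]))
        # Scan forward to the brace that closes the island.
        depth, i = 1, m.end()
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth:
            raise SyntaxError("unterminated island block")
        pieces.append(("ISLAND", text[m.end():i - 1]))
        pos = i
```

The ISLAND body comes back untouched (no quoting or escaping needed), and the HOST chunks can go through the normal lexer. If the block can legitimately contain unbalanced braces, the end can't be found this way and you're into the harder parser-stage case.
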