[antlr-interest] Lexer Predicates?

Gavin Lambert antlr at mirality.co.nz
Mon Aug 4 02:21:31 PDT 2008


At 10:28 4/08/2008, Foust wrote:
>I thought the whole point of a Domain Specific 
>Language was to make the task easy on the user – 
>not on the parser-generator. It seems that the 
>issue is that what is intuitive to a human may 
>in fact be some chimera of two or more formal 
>syntaxes. Antlr does not handle this very well, 
>forcing tokens to be interpreted the same in 
>every context. But since it allows interaction 
>with the target language, there are likely 
>several ways to solve the problem.

The lexer has to generate a single set of tokens, 
yes, but as long as you don't assign too much 
semantic meaning at the lexer level then it's usually ok :)

>I thought that the cleanest way to read in a 
>free-form config { } block (not requiring 
>quotes, or other syntax that might, in fact be 
>intended to be part of the config setting) is to 
>treat it as a separate language. I want to keep 
>the syntax as simple as possible and have no 
>possibility of conflicting with any other part 
>of the language. So I solved this particular problem by:
>-    using parser states
>-    a predicate on the ‘config’ rule to only 
>recognize it if in the correct state
>-    Implement a simple parser for just the 
>block in question using regex in the target language:

So this is for a block in some larger file that's 
effectively an entirely different language?  In 
that case, have a look at the "island grammar" examples.

The one included in the example pack shows the 
simplest/best case (where you can detect the 
start and end of the grammar island in the 
lexer); there's also an example in the Wiki 
showing how you can handle it when things are so 
ambiguous you can't work it out until the parser stage.
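
To give a rough idea of that simplest case, here is a minimal sketch in plain Java. It is not the code from the example pack and uses no ANTLR API. It assumes the island starts at the literal text "config {" and ends at the next "}", with no nested braces; the class name IslandSplitter and the HOST:/ISLAND: tags are made up for illustration. The point is only that once the start and end can be found by scanning, the body can be handed to a separate parser untouched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: separates host-language text from "config { ... }" islands. */
public class IslandSplitter {
    // Assumed island delimiters; the real markers depend on your language.
    private static final String START = "config {";
    private static final char END = '}';

    /** Returns the input as segments tagged "HOST:" or "ISLAND:", in source order. */
    public static List<String> split(String input) {
        List<String> segments = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = input.indexOf(START, pos);
            if (start < 0) {
                // No further island: everything left belongs to the host language.
                segments.add("HOST:" + input.substring(pos));
                break;
            }
            if (start > pos) {
                segments.add("HOST:" + input.substring(pos, start));
            }
            int bodyStart = start + START.length();
            // Simplification: the first '}' closes the island (no nesting, no escapes).
            int end = input.indexOf(END, bodyStart);
            if (end < 0) {
                throw new IllegalArgumentException(
                        "unterminated config block at offset " + start);
            }
            // The raw body would go to the island's own lexer/parser.
            segments.add("ISLAND:" + input.substring(bodyStart, end));
            pos = end + 1;
        }
        return segments;
    }

    public static void main(String[] args) {
        for (String segment : split("a = 1; config { path = C:\\tmp } b = 2;")) {
            System.out.println(segment);
        }
    }
}
```

If the island body can itself contain "}" or if "config {" can legitimately appear elsewhere in the host language, this kind of scan is no longer enough, and that's where the parser-stage approach comes in.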


