[antlr-interest] Context-sensitive lexing
Gavin Lambert
antlr at mirality.co.nz
Mon Nov 19 00:41:32 PST 2007
At 20:52 19/11/2007, Steve Bennett wrote:
>4) tokenize magicwords but feed them back into the general letters
>pool whenever they're not needed: letters: ('a'..'z' | MAGIC)+;
That's usually the approach I use, although not quite like that :)
>I've tried 1, 3 and 4 and they all work. However, 3 and 4 have
>major impacts on how the rest of the grammar will be shaped, I
>think. Also 4 has the odd behaviour of generating nodes with
>clumps of tokens: "magicword" will get lexed as "magic" and
>"word" then parsed as MAGIC+'w'+'o'+'r'+'d'.
If you do it like you've got above, yeah. But you can still
combine sequences of characters that don't happen to be magic:
MAGIC: 'magic';
TEXT: ('a'..'z')+;
text: (TEXT | MAGIC)+;
For the input "magicword" you'll get MAGIC('magic'), TEXT('word').
I usually prefer to put literal tokens (like MAGIC) in a tokens
block, though. Makes 'em easier to find. And I think you get an
ambiguity warning if you don't (though it'll still do the right
thing).
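A rough sketch of the tokens-block variant, assuming an ANTLR 3 combined grammar (the grammar name MagicDemo is hypothetical and only there to make the fragment complete):

```antlr
// Sketch only: assumes ANTLR 3 syntax; MagicDemo is a hypothetical name.
grammar MagicDemo;

tokens {
    MAGIC = 'magic';   // literal token declared up front instead of as a lexer rule
}

// Parser rule: a run of ordinary text and magic words, in any order.
text : (TEXT | MAGIC)+ ;

// Lexer rule: any other run of lowercase letters.
TEXT : ('a'..'z')+ ;
```

The parser rule stays the same; only the place where MAGIC is declared moves.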
>How does one decide what method is the best?
Personal taste, mostly :)