[antlr-interest] Non-disjoint tokens

Steve Bennett stevagewp at gmail.com
Sun Nov 25 04:47:01 PST 2007


On 11/25/07, Harald Mueller <harald_m_mueller at gmx.de> wrote:
> What does help are semantic predicates (essentially, "arbitrary conditions"):
>
> HTML: {input.LA(1)=='<' &&
>        input.LA(2)=='H' &&
>        input.LA(3)=='T' &&
>        input.LA(4)=='M' &&
>        input.LA(5)=='L' &&
>        input.LA(6)=='>'
>       }? => '<HTML>';
> LT: '<';
>
> If there is any other way to do this, I'd also like to know it!!

Oh, thanks, I'll try that. In the meantime I discovered that this semi-works:
LT: '<';
GT: '>';
HTML: LT 'HTML' GT;
NONTAG: LT LETTERS;

Lexes as follows:
<HTML> matches HTML
<HTML FOO matches NONTAG, SPACE, LETTERS
<FOO matches NONTAG
< matches LT

It's not great, because I'm stuck with one token (<FOO) rather than
two (< followed by FOO), but it's better than nothing. Maybe with an
'emit()' I can get an extra token? Though I suspect not...
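
To make the goal concrete, here is a rough sketch in plain Python (not
ANTLR, and not what ANTLR generates) of the tokenisation I'm after. It
does the same fixed lookahead for '<HTML>' as the predicate above, and
otherwise falls back to a bare LT so that <FOO comes out as two tokens.
The tokenize() function is hypothetical, and the LETTERS and SPACE
definitions are my assumptions (runs of alphabetic characters and runs
of spaces), since those rules aren't shown here.

```python
def tokenize(text):
    """Return a list of (type, text) pairs for a tiny tag-like input."""
    tokens = []
    i = 0
    while i < len(text):
        if text.startswith("<HTML>", i):
            # Equivalent of the six LA(k) comparisons: look ahead, then commit.
            tokens.append(("HTML", "<HTML>"))
            i += 6
        elif text[i] == "<":
            # No full '<HTML>' ahead, so emit only the bracket.
            tokens.append(("LT", "<"))
            i += 1
        elif text[i] == ">":
            tokens.append(("GT", ">"))
            i += 1
        elif text[i].isalpha():
            # Assumed LETTERS rule: one or more alphabetic characters.
            j = i
            while j < len(text) and text[j].isalpha():
                j += 1
            tokens.append(("LETTERS", text[i:j]))
            i = j
        elif text[i] == " ":
            # Assumed SPACE rule: one or more spaces.
            j = i
            while j < len(text) and text[j] == " ":
                j += 1
            tokens.append(("SPACE", text[i:j]))
            i = j
        else:
            raise ValueError(f"unexpected character {text[i]!r} at offset {i}")
    return tokens
```

With this, tokenize("<FOO") yields LT then LETTERS instead of a single
NONTAG, and tokenize("<HTML>") still yields a single HTML token.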

Steve

