[antlr-interest] Non-disjoint tokens
Steve Bennett
stevagewp at gmail.com
Sun Nov 25 04:47:01 PST 2007
On 11/25/07, Harald Mueller <harald_m_mueller at gmx.de> wrote:
> What does help are semantic predicates (essentially, "arbitrary conditions"):
>
> HTML: {input.LA(1)=='<' &&
> input.LA(2)=='H' &&
> input.LA(3)=='T' &&
> input.LA(4)=='M' &&
> input.LA(5)=='L' &&
> input.LA(6)=='>'
> }? => '<HTML>';
> LT: '<';
>
> If there is any other way to do this, I'd also like to know it!!
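The quoted gated predicate can be illustrated outside ANTLR with a minimal sketch. This is a plain Python simulation, not ANTLR-generated code. `CharStream.LA(i)` is a hypothetical stand-in for the 1-based lookahead used in the predicate. The `OTHER` catch-all rule is also a hypothetical addition, there only so the sketch can consume arbitrary input.

```python
class CharStream:
    """Hypothetical stand-in for a lexer's char stream: LA(i) peeks i chars ahead (1-based)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def LA(self, i):
        idx = self.pos + i - 1
        # Past the end of input, return None (an ANTLR stream would report EOF instead).
        return self.text[idx] if idx < len(self.text) else None

    def consume(self, n):
        self.pos += n


def next_token(stream):
    # Gated predicate: only try HTML if the next six chars spell '<HTML>'.
    if all(stream.LA(i + 1) == ch for i, ch in enumerate("<HTML>")):
        stream.consume(6)
        return ("HTML", "<HTML>")
    # Otherwise a lone '<' is just LT.
    if stream.LA(1) == "<":
        stream.consume(1)
        return ("LT", "<")
    # Hypothetical catch-all: any other single character.
    ch = stream.LA(1)
    stream.consume(1)
    return ("OTHER", ch)


def tokenize(text):
    stream = CharStream(text)
    tokens = []
    while stream.LA(1) is not None:
        tokens.append(next_token(stream))
    return tokens
```

The predicate examines all six characters before committing. So an input such as `<HTM` falls back to `LT` followed by ordinary characters, rather than being half-matched as `HTML`.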
Oh, thanks, I'll try that. In the meantime, I discovered that this semi-works:
LT: '<';
GT: '>';
HTML: LT 'HTML' GT;
NONTAG: LT LETTERS;
Lexes as follows:
<HTML> matches HTML
<HTML FOO matches NONTAG, SPACE, LETTERS
<FOO matches NONTAG
< matches LT
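A small Python sketch can reproduce the tokenization listed above. It assumes definitions the post does not give: LETTERS as one or more ASCII letters and SPACE as one or more spaces. It also picks a rule by longest match, with the earlier rule winning ties. That is a simplification standing in for the lexer's real decision procedure, not ANTLR's prediction algorithm.

```python
import re

# Rules in priority order; LETTERS and SPACE are assumed definitions.
RULES = [
    ("HTML", re.compile(r"<HTML>")),
    ("NONTAG", re.compile(r"<[A-Za-z]+")),
    ("LT", re.compile(r"<")),
    ("GT", re.compile(r">")),
    ("LETTERS", re.compile(r"[A-Za-z]+")),
    ("SPACE", re.compile(r" +")),
]


def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier rule wins.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

For `<HTML FOO`, the HTML rule fails at the space, so NONTAG takes `<HTML`, then SPACE and LETTERS follow. This matches the second line of the list.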
It's not great, because I'm stuck with one token (<FOO) rather than
two (< and FOO), but it's better than nothing. Maybe with an 'emit()' I
can get an extra token? Though I suspect not...
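On the emit() question: the ANTLR 3 Java target's Lexer exposes emit(Token) and nextToken(), and by default each nextToken() call hands back a single token. A commonly described workaround overrides both methods. emit() appends to a queue, and nextToken() drains that queue before lexing further. The Python sketch below simulates only that queueing idea and does not use the ANTLR runtime. The rule set, the `SplittingLexer` class, and the split of NONTAG into LT plus LETTERS are hypothetical. Rule choice is again approximated by longest match.

```python
import re
from collections import deque

# Assumed rules (LETTERS and SPACE are not defined in the post).
RULES = [
    ("HTML", re.compile(r"<HTML>")),
    ("NONTAG", re.compile(r"<[A-Za-z]+")),
    ("LT", re.compile(r"<")),
    ("GT", re.compile(r">")),
    ("LETTERS", re.compile(r"[A-Za-z]+")),
    ("SPACE", re.compile(r" +")),
]


class SplittingLexer:
    """Simulated lexer whose emit() queues tokens, so one rule can yield several."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pending = deque()  # emitted tokens not yet handed out

    def emit(self, name, value):
        self.pending.append((name, value))

    def _longest_match(self):
        best = None
        for name, pattern in RULES:
            m = pattern.match(self.text, self.pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {self.pos}")
        return best

    def next_token(self):
        # Hand out buffered tokens before lexing any more input.
        while not self.pending:
            if self.pos >= len(self.text):
                return ("EOF", "")
            name, value = self._longest_match()
            self.pos += len(value)
            if name == "NONTAG":
                # Split '<FOO' into LT '<' followed by LETTERS 'FOO'.
                self.emit("LT", value[0])
                self.emit("LETTERS", value[1:])
            else:
                self.emit(name, value)
        return self.pending.popleft()


def tokenize(text):
    lexer = SplittingLexer(text)
    tokens = []
    while True:
        tok = lexer.next_token()
        if tok[0] == "EOF":
            return tokens
        tokens.append(tok)
```

With the queue in place, `<FOO` comes out as two tokens, `LT` and `LETTERS`. `<HTML>` is still a single `HTML` token.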
Steve