[antlr-interest] String lexing and partial tokens
Jim Idle
jimi at intersystems.com
Tue Nov 28 08:02:21 PST 2006
Gavin,
Fair comment I think, though I personally prefer to see the whole parameter set. I can create another macro that makes something like that a bit easier, and I will add it to my list of stuff to do. The reintroduction of ! is another matter, which we have discussed quite a bit and which Ter is loath to reintroduce at this point in the game ;-)
Jim
-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gavin Lambert
Sent: Monday, November 27, 2006 11:12 PM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] String lexing and partial tokens
At 16:14 27/11/2006, Jim Idle wrote:
>The lexer emits a token automatically if you have not emitted one,
>but if you use (C output) emitNew() in an action then it will use
>this as the token. So, to exclude the start and end character:
>
>STRING: '"' (~'"')* '"'
> {
>   emitNew(type,line,charPosition,channel,start,getCharIndex()-1);
> }
The thing is that this is a lot more parameters than I really want
to deal with in a grammar. It violates my "this should be simple"
rule :)
Though I agree that having it not go allocating strings is a good
thing, so avoiding $setText seems like a good idea.
How about something more like what I ended up hacking out, with a
bit of extra support code to make it more palatable? Like so:
STRING: '"' content=UnquotedText '"' { emitPartial($content); };
fragment UnquotedText: (~'"')*;
Where 'emitPartial(x);' is the equivalent of 'emit(x);
ltoken()->setType(ltoken(), the_token_type_being_generated);'
That should be fairly simple to implement.
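To make the idea concrete, here is a minimal sketch in plain C of what emitPartial amounts to. It is a stand-alone model, not the ANTLR C runtime: the Token struct, the TOK_* constants, and the match_* helpers are all hypothetical names made up for illustration. The point it shows is that the emitted token only records offsets into the input, so excluding the quotes costs no string allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer token: it records offsets into the
   input buffer instead of owning a copy of the text. */
typedef struct {
    int    type;   /* token type, e.g. TOK_STRING */
    size_t start;  /* index of the first character in the input */
    size_t stop;   /* index one past the last character */
} Token;

/* Hypothetical token types for this sketch. */
enum { TOK_UNQUOTED_TEXT = 1, TOK_STRING = 2 };

/* Match (~'"')* starting at pos; plays the role of the fragment rule. */
static Token match_unquoted_text(const char *input, size_t pos)
{
    Token t = { TOK_UNQUOTED_TEXT, pos, pos };
    while (input[t.stop] != '\0' && input[t.stop] != '"')
        t.stop++;
    return t;
}

/* Model of emitPartial: keep the span of the sub-token, but stamp it
   with the type of the rule that is being generated. */
static Token emit_partial(Token content, int rule_type)
{
    assert(content.start <= content.stop);
    content.type = rule_type;
    return content;
}

/* Match '"' UnquotedText '"' at pos.
   Returns 1 and fills *out on success, 0 otherwise. */
static int match_string(const char *input, size_t pos, Token *out)
{
    Token content;

    if (input[pos] != '"')
        return 0;                      /* no opening quote */
    content = match_unquoted_text(input, pos + 1);
    if (input[content.stop] != '"')
        return 0;                      /* unterminated string */
    *out = emit_partial(content, TOK_STRING);
    return 1;
}
```

In this model the resulting token has the STRING type but spans only the characters between the quotes, which is the behaviour the grammar action above is asking for.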
It'd be better still if the fragment weren't required, and you
could write something like this (this generates an AST parse error
from ANTLR at the moment):
STRING: '"' content=(~'"')* '"' { emitPartial($content); };
(maybe you'd have to have an extra set of parentheses around
there; not sure.)
And the ultimate extension would then be to reintroduce the !
operator, which automatically did the above stuff if all the non-!
components of the rule formed a contiguous block. If they're
non-contiguous, then it'd still be an error since you can't
generate a single substring from the incoming char stream that way.
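The contiguity rule could be checked along these lines. Again this is only a sketch in plain C under my own assumptions: the Element struct and kept_span() are hypothetical, and it assumes the rule's elements are listed in match order with each one starting where the previous one stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one matched element of a lexer rule. */
typedef struct {
    size_t start;     /* index of the first character matched */
    size_t stop;      /* index one past the last character matched */
    int    excluded;  /* nonzero if the element was marked with ! */
} Element;

/* Returns 1 and sets *start / *stop to the span covered by the kept
   (non-!) elements when they form a single contiguous run.
   Returns 0 when an excluded element sits between two kept ones; the
   outputs are not meaningful in that case. */
static int kept_span(const Element *elems, size_t n,
                     size_t *start, size_t *stop)
{
    int state = 0;  /* 0 = before the kept run, 1 = inside it, 2 = after it */
    size_t i;

    assert(elems != NULL || n == 0);
    *start = *stop = (n > 0) ? elems[0].start : 0;

    for (i = 0; i < n; i++) {
        if (!elems[i].excluded) {
            if (state == 2)
                return 0;              /* second kept run: non-contiguous */
            if (state == 0) {
                *start = elems[i].start;
                state = 1;
            }
            *stop = elems[i].stop;
        } else if (state == 1) {
            state = 2;                 /* the kept run has ended */
        }
    }
    return 1;
}
```

For a rule like '"'! content '"'! the kept span is just the content, whereas a rule that keeps the two quotes and drops the middle would be rejected.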