[antlr-interest] [v3] not including text in token. Still possible?

Mon Feb 6 15:11:35 PST 2006

On 6 Feb 2006, at 19:39, Terence Parr wrote:

> All of the grammars should be all ready to go. I think the code
> generation templates are the only problem. You need to modify the
> code generator grammar so that it pays attention to the ! for
> lexers. I have already anticipated this problem and added a text
> pointer in the common token. The getText method returns the text
> pointer if not null; else it looks for the indices into the text
> buffer.
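
To make the getText fallback concrete, here is a minimal sketch of how I read it. SketchToken and its fields are hypothetical stand-ins, not the actual common token class; the only thing taken from the description above is the rule "use the text pointer if set, otherwise slice the buffer by the indices".

```java
// Hypothetical stand-in for a token; NOT the real ANTLR common token class.
class SketchToken {
    private final CharSequence input; // shared character buffer (assumed)
    private final int start;          // index of the first char, inclusive
    private final int stop;           // index of the last char, inclusive
    private String text;              // the "text pointer"; null unless overridden

    SketchToken(CharSequence input, int start, int stop) {
        this.input = input;
        this.start = start;
        this.stop = stop;
    }

    // A lexer rule that dropped chars via '!' would set the text explicitly.
    void setText(String text) {
        this.text = text;
    }

    // Return the override if present, else the slice of the buffer.
    String getText() {
        if (text != null) {
            return text;
        }
        return input.subSequence(start, stop + 1).toString();
    }
}
```

With this shape, tokens from rules without a bang never allocate a string until getText is called, and only the rules that use ! pay for a separate copy.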

Yeah, that's cool. The ast_suffix ruleref seems to be present in all  
relevant places.

> Any rule that has a bang modifier must create a local string to
> fill. Code must be inserted after the match routines to add the
> matched char to the local char buffer. I suppose that the emit
> method must be altered to accept a string argument representing the
> text for the token.
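
As a rough picture of what the generated code for such a rule might do, here is a hand-written sketch. Everything in it is an assumption for illustration: BangLexerSketch, match, matchBang, emit(String) and the rule QUOTED are made-up names, not the real runtime or template output. It only shows the three pieces described above: a local buffer set up by the rule, an append after each non-bang match, and an emit that takes the collected text.

```java
// Hypothetical hand-written lexer fragment; not generated by ANTLR.
class BangLexerSketch {
    private final String input;
    private int p = 0;          // current position in the input
    private StringBuilder buf;  // local text buffer of the rule being matched
    String lastEmitted;         // text of the most recently emitted token

    BangLexerSketch(String input) {
        this.input = input;
    }

    private void expect(char c) {
        if (p >= input.length() || input.charAt(p) != c) {
            throw new IllegalStateException("expected '" + c + "' at index " + p);
        }
    }

    // Ordinary match: consume the char and add it to the local buffer.
    void match(char c) {
        expect(c);
        buf.append(c);
        p++;
    }

    // Match with '!': consume the char but keep it out of the token text.
    void matchBang(char c) {
        expect(c);
        p++;
    }

    // emit altered to accept the token text as an argument.
    void emit(String text) {
        lastEmitted = text;
    }

    // Hypothetical rule:  QUOTED : '"'! ('a'..'z'|'A'..'Z')* '"'! ;
    void mQUOTED() {
        buf = new StringBuilder();   // setup code the rule itself must own
        matchBang('"');
        while (p < input.length() && Character.isLetter(input.charAt(p))) {
            match(input.charAt(p));
        }
        matchBang('"');
        emit(buf.toString());
    }
}
```

Running mQUOTED over the input "abc" (with the surrounding double quotes) leaves lastEmitted holding just abc.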

The change to emit should be easy, but I have one conceptual problem  
with the bang modifier and the placement of the code after the match  
code:

Suppose I have the following lexer rule:

FOO	:	'"'! ID '"'!	;

with ID being some canonical ID-rule.

I'm seeing the ast_suffix.type==ANTLRParser.BANG in the rule 'atom'
of codegen.g. But I also need to add code to the FOO lexer rule to
set up the local string buffer to collect the chars from the ID
subrule. Simply adding code to the charRef ST (or rather using the
new charRefBang) after the match won't do it.

Now if I'd written "FOO! : ... " I'd be all set, but if the bang
occurs in one of the atoms inside the rule I have to look inside
first and instantiate the appropriate ST for this: the one with the
string buffer setup code.
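
To spell out what "look inside first" would mean, here is a small sketch of a pre-pass over a rule's elements. It does not use the real tree classes from codegen.g; SketchElement, TemplateChooserSketch and the two template names are invented for illustration. The idea is only that the rule-level template is picked after checking whether any nested element carries a bang.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical element of a rule body; not the real ANTLR grammar AST.
class SketchElement {
    final String label;
    final boolean bang;                   // true if this element has a '!' suffix
    final List<SketchElement> children;   // nested elements (blocks, alternatives)

    private SketchElement(String label, boolean bang, List<SketchElement> children) {
        this.label = label;
        this.bang = bang;
        this.children = children;
    }

    static SketchElement atom(String label, boolean bang) {
        return new SketchElement(label, bang, Collections.<SketchElement>emptyList());
    }

    static SketchElement block(String label, SketchElement... children) {
        return new SketchElement(label, false, Arrays.asList(children));
    }

    // Depth-first search for any bang at or below this element.
    boolean containsBang() {
        if (bang) {
            return true;
        }
        for (SketchElement child : children) {
            if (child.containsBang()) {
                return true;
            }
        }
        return false;
    }
}

class TemplateChooserSketch {
    // Template names are made up; they stand for "rule with buffer setup"
    // versus "plain rule".
    static String templateNameFor(SketchElement rule) {
        return rule.containsBang() ? "lexerRuleWithTextBuffer" : "lexerRule";
    }
}
```

For FOO above, the elements would be '"' with bang, ID without, and '"' with bang, so the chooser would pick the template that includes the buffer setup. The cost is an extra walk over each lexer rule before its template is instantiated.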

Any ideas?

- k