[antlr-interest] ANTLR Lexer Contexts

Keith Whittingham kwhittingham at gmail.com
Sun Nov 25 23:19:10 PST 2007


1/ Use of fragment
Yes, the use of fragment hides NORMAL_TOKEN_SET so that the lexer never  
'tries for' any of its productions directly.

2/ Use of return values
I thought about that but in the language I'm trying to build there are  
many lexer contexts and each context has a significant number of tokens.  
Much better to be able to see the complete set cleanly.

There's one small bug in the snippet I posted, by the way:

BRACKETS_TOKEN_SET
	:	'0' | ('1'..'9')('0'..'9')*  { tokenType = POSINT; }
	...

should read

BRACKETS_TOKEN_SET
	:	('0' | ('1'..'9')('0'..'9')*)  { tokenType = POSINT; }
	...

I guess I'll need to add a push() and pop() lexer context method to set  
and recall more than just the context type. If I do I'll post how I do  
those too.
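
In the meantime, here is a rough sketch of what such a push()/pop() pair  
might look like as a plain Java helper that the lexer actions could call. I  
haven't settled on a design, so treat everything here as an assumption: the  
class name, the Frame type and the nestingDepth field are made up for  
illustration and only stand in for "more than just the context type".

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: a stack of lexer contexts, each carrying extra state. */
final class LexerContextStack {
    static final int NORMAL = 0;
    static final int BRACKETS = 1;

    /** One context: its type plus some illustrative extra state. */
    static final class Frame {
        final int contextType;
        final int nestingDepth;   // assumed example of extra per-context state

        Frame(int contextType, int nestingDepth) {
            this.contextType = contextType;
            this.nestingDepth = nestingDepth;
        }
    }

    private final Deque<Frame> saved = new ArrayDeque<>();
    private Frame current = new Frame(NORMAL, 0);

    /** Save the current context and enter a new one. */
    void push(int contextType) {
        saved.push(current);
        current = new Frame(contextType, current.nestingDepth + 1);
    }

    /** Restore the most recently saved context, including its extra state. */
    void pop() {
        if (saved.isEmpty()) {
            throw new IllegalStateException("pop() without a matching push()");
        }
        current = saved.pop();
    }

    int contextType() { return current.contextType; }

    int nestingDepth() { return current.nestingDepth; }

    public static void main(String[] args) {
        LexerContextStack ctx = new LexerContextStack();
        ctx.push(BRACKETS);   // e.g. what an action on '[' might call
        System.out.println("inside:  type=" + ctx.contextType()
                + " depth=" + ctx.nestingDepth());
        ctx.pop();            // e.g. what an action on ']' might call
        System.out.println("outside: type=" + ctx.contextType()
                + " depth=" + ctx.nestingDepth());
    }
}
```

An action such as "context = BRACKETS;" would then become a call to  
push(BRACKETS), with the matching pop() on the closing token.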

--------------------
NOTE to maintainers of ANTLR

1/ I think it would be worth considering adding a feature like this into  
ANTLR. It seems like the focus of efforts is on the parser (reflected in  
the book too!). IMHO if you can generate a clean token stream easily then  
parsing becomes significantly simpler.

2/ The generator also creates warnings about tokens that are generated in  
the tokens {...} action which is annoying. I had to hide the warnings by  
defining something like "TOKEN_WITH_WARNING: '§HideMe7§' ;", i.e. defining  
it as something that will never be encountered in the input.

Keith

On Mon, 26 Nov 2007 00:40:41 +0100, Steve Bennett <stevagewp at gmail.com>  
wrote:

> On 11/26/07, Keith Whittingham <kwhittingham at gmail.com> wrote:
>> fragment
>> NORMAL_TOKEN_SET
>>         :       ('a'..'z'|'A'..'Z'|'.'|'_')  
>> ('0'..'9'|'A'..'Z'|'a'..'z'|'.'|'_')*
>> { tokenType = NAME; }
>>         |       '[' { tokenType = OSB; context = BRACKETS; }
>
> Thanks for posting. Is it the "fragment" here that prevents this token
> always matching ahead of the other one?
>
> Also, did you consider using return values? Might be slightly more
> elegant than the quasi-global tokenType member?
>
> Steve



