[antlr-interest] ANTLR 3 Lexical States

Fri Jan 25 08:07:47 PST 2008

Yes, that's a good idea, but it doesn't solve the problem that the 
state change must be done in the parser. So in the switch(state) 
statement the value of state is always NORMAL, because all the lexing 
is done before the parser runs.
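
To illustrate the ordering problem, here is a toy Java sketch. It is not 
ANTLR code: the class, the method names and the SPECIAL_KEYWORD token are 
all made up for illustration, and the "parser" is just a loop. It only 
assumes that every token is typed before any parser action runs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are ANTLR APIs.
class EagerLexingDemo {
    enum State { NORMAL, SPECIAL }

    // Shared lexical state, as a parser action like setState(...) would set it.
    static State state = State.NORMAL;

    // "Lexer": picks the token type for one word from the current state.
    static String nextToken(String word) {
        if (word.equals("\\special")) return "SPECIAL_KEYWORD";
        return state == State.SPECIAL ? "SPECIAL_STRING" : "STRING";
    }

    // Eager pipeline: all tokens are typed before the "parser" sees any.
    static List<String> lexThenParse(String[] words) {
        state = State.NORMAL;
        List<String> buffered = new ArrayList<>();
        for (String w : words) {
            buffered.add(nextToken(w));          // lexing finishes in this loop
        }
        for (String t : buffered) {              // "parsing" only starts here
            if (t.equals("SPECIAL_KEYWORD")) {
                state = State.SPECIAL;           // too late: types are already fixed
            }
        }
        return buffered;
    }

    public static void main(String[] args) {
        // prints [SPECIAL_KEYWORD, STRING]
        System.out.println(lexThenParse(new String[] {"\\special", "abc"}));
    }
}
```

The word after \special comes out as STRING, not SPECIAL_STRING, because 
the state was still NORMAL when it was lexed.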
Now I'm thinking of the following possibilities:
- Harald Müller's lexing parser - as far as I can see, it currently 
doesn't work with overlapping lexer rules, for example if, in the example 
below, STRING is 'a'..'z' and SPECIAL_STRING is '<'|'a'
- David Holroyd's lazy token stream - with which I see the problem that 
it lazily loads the tokens from the source, but not from the source, so 
I may not be able to change the token type according to lexical state
- handling all lexer-state-pushing situations as recursively embedded 
island-grammars - the problem is that these islands actually can be 
infinitely embedded in each other.
- going back to Antlr 2
- writing the lexer with JFlex
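
For comparison, the idea behind a lazy token stream can be sketched with a 
toy Java model. This is a hand-written stand-in, not David Holroyd's code 
or any ANTLR API. All names are hypothetical, and it assumes the parser 
asks for exactly one token at a time with no lookahead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: not the real lazy token stream, just the pull-on-demand idea.
class LazyLexingDemo {
    enum State { NORMAL, SPECIAL }

    // Shared lexical state, changed by the "parser" while tokens are pulled.
    static State state = State.NORMAL;

    // "Lexer" as an iterator: each token is typed only when it is requested.
    static Iterator<String> lazyTokens(String[] words) {
        return new Iterator<String>() {
            int i = 0;
            public boolean hasNext() { return i < words.length; }
            public String next() {
                String w = words[i++];
                if (w.equals("\\special")) return "SPECIAL_KEYWORD";
                return state == State.SPECIAL ? "SPECIAL_STRING" : "STRING";
            }
        };
    }

    // "Parser": pulls one token, runs its action, then pulls the next one.
    static List<String> parse(String[] words) {
        state = State.NORMAL;
        List<String> seen = new ArrayList<>();
        Iterator<String> tokens = lazyTokens(words);
        while (tokens.hasNext()) {
            String t = tokens.next();
            seen.add(t);
            if (t.equals("SPECIAL_KEYWORD")) {
                state = State.SPECIAL;   // takes effect before the next token is lexed
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [SPECIAL_KEYWORD, SPECIAL_STRING]
        System.out.println(parse(new String[] {"\\special", "abc"}));
    }
}
```

This only works as long as the state change happens before the next token 
is requested. Any lookahead would pull tokens early, and those would be 
typed under the old state.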

I really don't want the last two possibilities, so I'm very curious 
whether there is a good approach for my grammar.

Bert

Jim Idle wrote:
> It will be a lot more readable, and generate better code if you do this 
> instead:
>
> 1) Create fragment rules for your tokens that have the same pattern. In 
> fact you can just use the tokens {} section to create the token types, 
> but then ANTLR will give you warnings that there is no token called XYZ 
> when you try to use this type in the lexer. As I hate warnings, I use 
> fragment tokens. You won't use them for matching (usually) so they don't 
> actually have to match the pattern they represent, but it is probably 
> good to document this if they don't!!
>
> 2) Use one pattern match for all the tokens that clash, then change the 
> type according to the context:
>
>
> fragment STRING 		: LETTERS 	;
> fragment SPECIAL_STRING : LETTERS	;
>
> STRINGS:
> 		LETTERS
>
> 			{
> 				switch (state)
> 				{
> 					case States.NORMAL:
>
> 							$type = STRING;
> 							break;
> 				
> 					case States.SPECIAL:
> 							$type = SPECIAL_STRING;
> ...
> 				}	
> 			}
> ;
>
>
> And so on.
>
> Jim
>
>   
>> -----Original Message-----
>> From: Bertalan Fodor (LilyPondTool) [mailto:lilypondtool at organum.hu]
>> Sent: Friday, January 25, 2008 2:21 AM
>> To: Antlr Interest
>> Subject: [antlr-interest] ANTLR 3 Lexical States
>>
>> My Antlr grammar I'm migrating to Antlr 3 heavily uses lexical states,
>> that is, the Lexer has lots of semantic predicates to distinguish
>> between alternatives like this
>> STRING: {inState(States.NORMAL)}? LETTER+
>> SPECIAL_STRING: {inState(States.SPECIAL)}? LETTER+
>>
>> The states are set during the parse process, like this
>> special_handling: "\special" { setState(States.SPECIAL); }
>> SPECIAL_STRING;
>>
>> It worked perfectly well in Antlr 2. However, now I'm a bit afraid that
>> the Antlr 3 style lexing will make this not work.
>>
>> What do you think?
>>
>> Thank you,
>>
>> Bertalan Fodor
>>
>> --
>> LilyPondTool is the editor for LilyPond files.
>> See http://lilypondtool.organum.hu
>>

-- 
LilyPondTool is the editor for LilyPond files.
See http://lilypondtool.organum.hu
