[antlr-interest] * (zero or more) not matching greedily

Fri Apr 10 11:28:18 PDT 2009

Jim Idle wrote:
> Petteri Räty wrote:
>> The relevant stuff from a grammar:
>>
>> category:	(alphanum|'+'|'_'|DOT) ( alphanum|'+'|'_'|DOT|'-')* {
>> System.out.println($category.text); };
>>
>> alphanum:	LOWER|UPPER|DIGIT;
>>
>> DOT 	: '.';
>>
>> DIGIT	:	'0'..'9';
>> LOWER	:	'a'..'z';
>> UPPER	:	'A'..'Z';
>>
>> Why does it only take the first character for category? Isn't * supposed
>> to be greedy? I also tried adding options {greedy=true;} to the subrule
>> but it doesn't make a difference.
>>
>> betelgeuse at pena ~/python/depend $ ATOM="app-foo" make
>> java -cp <snip long cp> Main app-foo
>> a
>>
>> Regards,
> Read the 5 minute getting started stuff. Your lexer rules for alphanum
> are in total conflict with the others, which should be fragments if you
> really want this. However, you are also confusing lexer rules with parser
> rules, I think. Your lexer rule should do the composite matching unless
> there is some really good reason not to.
> 
> Jim
> 

The reason it's done this way is not apparent from these fragments, but
elsewhere I need to distinguish between lowercase and uppercase characters,
so this way seemed easiest. There are places where I need to match only
lowercase characters, so that rule must come first, but then it also matches
in places where I would want alphanums. That's why alphanum is a parser rule.
How do the lexer rules conflict? I have read the tutorial but can't
understand what you mean.
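
My best guess at what you are suggesting is something like the sketch
below, where the character classes become fragments and a single lexer rule
does the composite matching. The names Atom, CATEGORY and ALPHANUM are made
up for this example, and the grammar is untested:

```
grammar Atom;   // hypothetical grammar name

// Parser rule: the whole category arrives as a single token.
category : CATEGORY { System.out.println($CATEGORY.text); } ;

// Lexer rule doing the composite matching (hypothetical name).
CATEGORY : (ALPHANUM|'+'|'_'|DOT) (ALPHANUM|'+'|'_'|DOT|'-')* ;

// Fragments only help other lexer rules; they never become tokens.
fragment ALPHANUM : LOWER|UPPER|DIGIT ;
fragment DOT      : '.' ;
fragment DIGIT    : '0'..'9' ;
fragment LOWER    : 'a'..'z' ;
fragment UPPER    : 'A'..'Z' ;
```

With this, app-foo should reach the parser as one CATEGORY token. However,
fragment rules never produce tokens of their own, so LOWER and UPPER could
no longer be referenced from parser rules, which is what I need elsewhere.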

Regards,
Petteri
