[antlr-interest] Syntactic predicates question

Artem Dmytrenko admytren at engin.umich.edu
Thu Feb 2 09:54:00 PST 2006


Thank you all for the very valuable explanations of lexer behavior. My
confusion came from not properly understanding that behavior. It
looks like the art of ANTLR is to keep the complexity of the parser and
lexer balanced. I allowed my lexer to become too complicated and do a lot
of work that really belongs in the parser.

Bryan, the tip in your email is very useful. I'm also trying to split my
identifiers (~90) and value types (~30) into two different lexer states to 
minimize the use of syntactic predicates. I think those two approaches 
should resolve my non-determinism problem.
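
As a minimal sketch of the lookup-table idea (not Bryan's actual code), the
helper below plays the role of the grind function from the quoted tip: it
returns a keyword's token type when the matched text is in the table, and
the supplied default type otherwise. The class name KeywordTable and the
numeric token types are assumptions made up for illustration; in a real
ANTLR project the token types would come from the generated token-types
interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the grind() helper mentioned in the quoted tip.
final class KeywordTable {
    // Assumed token type numbers, chosen only for this example.
    static final int ID = 4;
    static final int ACTION_TOKEN = 5;

    // Maps keyword spellings to their token types.
    private static final Map<String, Integer> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("Action", ACTION_TOKEN);
        KEYWORDS.put("A", ACTION_TOKEN);
    }

    // Returns the keyword's token type, or defaultType if text is not a keyword.
    static int grind(String text, int defaultType) {
        Integer type = KEYWORDS.get(text);
        return type != null ? type : defaultType;
    }

    public static void main(String[] args) {
        System.out.println(grind("Action", ID)); // keyword: ACTION_TOKEN
        System.out.println(grind("A12345", ID)); // not a keyword: falls back to ID
    }
}
```

With this shape the lexer first matches the whole identifier and only then
decides its type, so "A12345" is never split after the leading "A". One
caveat: the identifier rule as quoted needs at least two characters, so a
one-letter keyword such as "A" would reach the table only if the rule used
`*` instead of `+`.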

Thank you again.

Sincerely,
Artem Dmytrenko

On Wed, 1 Feb 2006, Bryan Ewbank wrote:

> Hi Artem,
>
> As others have said, the core problem is keywords and identifiers.
> Look for references to keywords and lookup tables in the ANTLR manual.
> Essentially, you first match IDENTIFIER, but then adjust the token
> type using a look-up table or other algorithm...
>
> IDENTIFIER : ALPHA ( ALPHA | DIGIT )+
>   { $setType( grind(<string>, ID) ); }  // <string>: the matched text
>   ;
>
> Here, the grind function returns the second argument if the first
> argument does not match something of interest. It will often be a simple
> lookup table; however, it can be as complex as you desire/need.
>
> On 1/30/06, Artem Dmytrenko <admytren at engin.umich.edu> wrote:
>> Another newbie question here :)
>>
>> I'm running into some problems while using syntactic predicates to
>> resolve between ambiguous grammar rules. Here's a snippet from my lexer:
>>
>> protected ActionToken: ("Action" | 'A');
>> protected ID: ALPHA (ALPHA | DIGIT)+;
>>
>> SyntacticPredicate:
>>    (ActionToken) => (ActionToken { $setType (ActionToken); } ) |
>>    (ID) => (ID { $setType (ID); } );
>>
>> The expectation is that this rule will match either "Action" or "A" and
>> tag it as ActionToken, or it will match an alphanumeric string that
>> starts with a letter and mark it as ID. However, when parsing a string
>> like "A12345" the rule returns neither to the parser. Here's an example
>> error message that my parser emits:
>>
>> line 1:94: expecting ID, found 'A'
>>
>> It appears that the match is stuck in the middle, i.e. the ActionToken
>> rule rejected the string but ID did not match it. Is that the expected
>> behavior for syntactic predicates? Are there any workarounds for this
>> problem?
>>
>> Thank you in advance for any help and/or pointers.
>>
>> Sincerely,
>> Artem Dmytrenko
>>
>
>

