[antlr-interest] Syntactic predicates question

Fri Feb 3 03:00:09 PST 2006

If you use lexer states (evil! augh! :-), please consider using two
lexers instead.  There is an example in the reference book that talks
about parsing Java and Javadoc; rather than one stateful lexer, it
uses two lexers.

I don't remember the details, but that's where I'd start.
- Bryan

On 2/2/06, Artem Dmytrenko <admytren at engin.umich.edu> wrote:
> Thank you all for the very valuable explanations of lexer behavior. My
> confusion came from not properly understanding this very behavior. It
> looks like the art of ANTLR is to keep the complexity of the parser and
> lexer balanced. I allowed my lexer to become too complicated and do a lot
> of work that really belongs in the parser.
>
> Bryan, the tip in your email is very useful. I'm also trying to split my
> identifiers (~90) and value types (~30) into two different lexer states to
> minimize the use of syntactic predicates. I think those two approaches
> should resolve my non-determinism problem.
>
> Thank you again.
>
> Sincerely,
> Artem Dmytrenko
>
> On Wed, 1 Feb 2006, Bryan Ewbank wrote:
>
> > Hi Artem,
> >
> > As others have said, the core problem is keywords and identifiers.
> > Look for references to keywords and lookup tables in the ANTLR manual.
> > Essentially, you first match IDENTIFIER, but then adjust the token
> > type using a look-up table or other algorithm...
> >
> > IDENTIFIER : ALPHA ( ALPHA | DIGIT )+
> >   { $setType( grind($getText, ID) ); }
> >   ;
> >
> > Here, the grind function returns its second argument if the first
> > argument does not match anything of interest. It will often be a simple
> > lookup table; however, it can be as complex as you desire/need.
> >
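A minimal sketch of what such a grind helper might look like in Java, assuming a plain hash-map lookup. The class name KeywordTable and the token type numbers are hypothetical; in a real ANTLR-generated lexer the token types would come from the generated token-types interface rather than being hard-coded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the grind() helper mentioned above.
// The token type numbers are made up for illustration only.
final class KeywordTable {
    static final int ID = 4;            // assumed token type for plain identifiers
    static final int ACTION_TOKEN = 5;  // assumed token type for "Action" / "A"

    private static final Map<String, Integer> KEYWORDS = new HashMap<>();
    static {
        // The two spellings from the ActionToken rule in the original question.
        KEYWORDS.put("Action", ACTION_TOKEN);
        KEYWORDS.put("A", ACTION_TOKEN);
    }

    // Returns the keyword's token type, or defaultType if text is not a keyword.
    static int grind(String text, int defaultType) {
        Integer type = KEYWORDS.get(text);
        return type != null ? type : defaultType;
    }

    public static void main(String[] args) {
        System.out.println(grind("Action", ID)); // keyword: remapped
        System.out.println(grind("A12345", ID)); // not a keyword: stays ID
    }
}
```

Because the lookup runs only after the whole identifier has been matched, "A12345" is looked up as one string and falls through to the default type.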
> > On 1/30/06, Artem Dmytrenko <admytren at engin.umich.edu> wrote:
> >> Another newbie question here :)
> >>
> >> I'm running into some problems while using syntactic predicates to
> >> resolve between ambiguous grammar rules. Here's a snippet from my lexer:
> >>
> >> protected ActionToken: ("Action" | 'A');
> >> protected ID: ALPHA (ALPHA | DIGIT)+;
> >>
> >> SyntacticPredicate:
> >>    (ActionToken) => (ActionToken { $setType (ActionToken); } ) |
> >>    (ID) => (ID { $setType (ID); } );
> >>
> >> The expectation is that this rule will either match "Action" or "A" and
> >> tag it as ActionToken, or match an alphanumeric string that starts
> >> with a letter and mark it as ID. However, when parsing a string like
> >> "A12345", the rule returns neither to the parser. Here's an example
> >> error message that my parser emits:
> >>
> >> line 1:94: expecting ID, found 'A'
> >>
> >> It appears that the match is stuck in the middle, i.e. the ActionToken
> >> rule rejected the string but ID did not match it. Is that the expected
> >> behavior for syntactic predicates? Are there any workarounds for this
> >> problem?
> >>
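One plausible reading of the error, sketched in plain Java rather than ANTLR: a syntactic predicate only checks whether its fragment can match at the current input position, not whether it covers the whole word. Since 'A' is a prefix of "A12345", the first alternative would be chosen and only 'A' consumed. The class and method names below are hypothetical and merely imitate that behavior; they are not ANTLR APIs.

```java
// Hypothetical illustration of prefix-commit matching versus longest match.
final class PredicateDemo {
    // Imitates the predicated rule: if "Action" or "A" matches the start of
    // the input, commit to it and return only that much text.
    static String prefixCommit(String input) {
        if (input.startsWith("Action")) return "Action";
        if (input.startsWith("A")) return "A";
        return longestIdentifier(input);
    }

    // Consumes a letter followed by letters or digits, as far as possible.
    static String longestIdentifier(String input) {
        int i = 0;
        if (i < input.length() && Character.isLetter(input.charAt(i))) {
            i++;
            while (i < input.length()
                    && Character.isLetterOrDigit(input.charAt(i))) {
                i++;
            }
        }
        return input.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(prefixCommit("A12345"));      // stops after the prefix
        System.out.println(longestIdentifier("A12345")); // takes the whole word
    }
}
```

Under this reading, the token handed to the parser is 'A' tagged as ActionToken, with "12345" left behind, which would be consistent with the "expecting ID, found 'A'" message.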
> >> Thank you in advance for any help and/or pointers.
> >>
> >> Sincerely,
> >> Artem Dmytrenko
> >>
> >
> >
>