[antlr-interest] Q re use of Semantic Predicates

Thu Jun 9 10:26:35 PDT 2005

>>>>> "Gerald" == Gerald B Rosenberg <gbr at newtechlaw.com> writes:
[...]

>> Is there some reason why you can't just have two different rules?  I.e.,
>> WORD and TAGWORD, or somesuch?

> The problem is that the lexer gets confused as to whether a string of
> characters is a WORD or a TAGWORD; there are character streams that
> validly fit both definitions.  WORD is not, however, a true superset of
> TAGWORD.  The result is that the parser gets both WORD and TAGWORD
> tokens.  Accepting both in the parser as alternatives is not correct.

Ah, I missed the fact that it's not a super-/sub-set relationship.

Okay, so, it looks like you need a bit more complexity than I'd originally
thought.

In the lexer, use the approach exemplified by the handling of "." in, e.g.,
the Java lexer.  I.e., figure out the three different forms that a "word"
can take: only word, only xmlword, or could be either.  Set the token type
appropriately to one of those three (or more if there are more forms).
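
As a rough sketch of that lexer-side decision, here it is in Python rather
than grammar syntax.  The two character classes are invented purely for
illustration (the real WORD and TAGWORD definitions aren't in this thread),
and the token type names WORD_ONLY, XMLWORD_ONLY and WORD_OR_XMLWORD are
hypothetical.  The point is only that the text is matched once and then
given exactly one of three types.

```python
import re
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical token type names; pick whatever suits your grammar.
    WORD_ONLY = auto()
    XMLWORD_ONLY = auto()
    WORD_OR_XMLWORD = auto()


# ASSUMED definitions, for illustration only:
#   a "word" may contain apostrophes, an "xmlword" may contain ':' and '-'.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9']*")
XMLWORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9:\-]*")


def classify(text):
    """Return the single token type that the lexer should assign to text."""
    is_word = WORD_RE.fullmatch(text) is not None
    is_xmlword = XMLWORD_RE.fullmatch(text) is not None
    if is_word and is_xmlword:
        return TokenType.WORD_OR_XMLWORD  # ambiguous: let the parser decide
    if is_word:
        return TokenType.WORD_ONLY
    if is_xmlword:
        return TokenType.XMLWORD_ONLY
    raise ValueError("not a word of any kind: %r" % text)
```

Because the ambiguous case gets its own type, the lexer never has to guess
between the two definitions.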

Then, in the parser, create two rules: one that is the alternation
'only word' | 'could be either', and the other that is 'only xmlword'
| 'could be either'.
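
A minimal sketch of what those two parser rules accept, again in Python
rather than grammar syntax.  The token type names are hypothetical, and
each "rule" is reduced to a set-membership test; in a real grammar these
would simply be two rules with two alternatives each.

```python
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical names for the three types assigned by the lexer.
    WORD_ONLY = auto()
    XMLWORD_ONLY = auto()
    WORD_OR_XMLWORD = auto()


# Grammar equivalent:  word : WORD_ONLY | WORD_OR_XMLWORD ;
WORD_RULE = frozenset({TokenType.WORD_ONLY, TokenType.WORD_OR_XMLWORD})

# Grammar equivalent:  xmlword : XMLWORD_ONLY | WORD_OR_XMLWORD ;
XMLWORD_RULE = frozenset({TokenType.XMLWORD_ONLY, TokenType.WORD_OR_XMLWORD})


def matches_word(token_type):
    """True if the 'word' rule would accept a token of this type."""
    return token_type in WORD_RULE


def matches_xmlword(token_type):
    """True if the 'xmlword' rule would accept a token of this type."""
    return token_type in XMLWORD_RULE
```

The ambiguous type appears in both rules, so which meaning it takes is
decided by where the parser is in the grammar rather than by the lexer.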

Hope this helps,
		John