[antlr-interest] Non-disjoint tokens

Steve Bennett stevagewp at gmail.com
Sun Dec 2 20:24:28 PST 2007


On 11/26/07, Gavin Lambert <antlr at mirality.co.nz> wrote:
> The usual trick with common-prefix literals (or perhaps the
> "other" usual trick, since Austin already posted the semantic
> predicate version) is to compose them into a single rule.  The key
> point is to explicitly give ANTLR the alternatives so that it
> doesn't try to plunge ahead without looking first.

Hey, I just gave this a go on a similar problem and it works really well!

In this case, I want to recognise ISBNs in normal text and treat them
specially. However, if the ISBN is malformed (even slightly), I want
to treat it like any other sequence of random letters and numbers.
This solution is elegant enough for me:

----
ISBN_LINK:
  ((ISBN_LINK_ACTUAL (LETTERS | PUNCTUATION | N)) => ISBN_LINK_ACTUAL
  | LETTERS { $type=LETTERS; }
  );

fragment
ISBN_LINK_ACTUAL:
    'ISBN'
    ' '+
    ('97' ('8' | '9'))?
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? '0'..'9')
    ((' ' | '-')? ('0'..'9' | 'X' | 'x'));


LETTERS: ('a'..'z' | 'A'..'Z')+;
DIGITS: ('0'..'9')+;
PUNCTUATION: '-' | ' ' | '.' | ',';
N: '\r'? '\n';

----

And as an added bonus, it becomes possible to add trailing
requirements (i.e., the ISBN must be followed by a non-digit) because
you already have the syntactic predicate.
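
For example, here is a rough sketch of that trailing check written
directly in the predicate. It is not verified against an ANTLR build; it
reuses the rules above and spells the excluded characters out as a
literal set, on the assumption that the lexer's ~ operator is applied to
characters rather than to a rule reference:

----
ISBN_LINK:
  // Predicate: a well-formed ISBN followed by one character that is not a digit.
  ((ISBN_LINK_ACTUAL ~('0'..'9')) => ISBN_LINK_ACTUAL
  // Otherwise fall back to ordinary letters, so 'ISBN' is lexed as plain text.
  | LETTERS { $type=LETTERS; }
  );
----

As with the version above, the predicate needs at least one character
after the ISBN, so an ISBN at the very end of the input would presumably
fall through to the LETTERS alternative.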

This has made my day :)

QUESTION: Why doesn't putting ~DIGITS in the syntactic predicate work?

Steve

