[antlr-interest] catching the rest

Tue Apr 20 13:05:51 PDT 2004

While trying out filters I faced a "must-be-simple" problem:

How would you write the grammar for the following input?

aabaacaaaaca
    --   --

It should find the 'ac's and collect everything in between into REST tokens.

The example should result in the token sequence

REST AC REST AC REST

where AC consumes the string "ac"
and REST consumes
1. "aaba" (b can be any character except c)
2. "aaa"
3. "a"
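
To make the expected result concrete, here is a small hand-written Java sketch of the scan I have in mind. It is not ANTLR output and not a fix for the grammar below; the class name RestAcSketch and the method tokenize are made up purely for illustration, and tokens are shown as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only meant to illustrate the desired token sequence.
class RestAcSketch {

    // Splits the input into AC tokens (the literal "ac") and REST tokens
    // (maximal runs of everything between the "ac" occurrences).
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder rest = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("ac", i)) {
                // Two characters of lookahead: close any pending REST, emit AC.
                flushRest(rest, tokens);
                tokens.add("AC(ac)");
                i += 2;
            } else {
                // Any other character, including an 'a' not followed by 'c',
                // belongs to the current REST token.
                rest.append(input.charAt(i));
                i++;
            }
        }
        flushRest(rest, tokens);
        return tokens;
    }

    // Emits the buffered REST text as one token, if there is any.
    private static void flushRest(StringBuilder rest, List<String> tokens) {
        if (rest.length() > 0) {
            tokens.add("REST(" + rest + ")");
            rest.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Prints [REST(aaba), AC(ac), REST(aaa), AC(ac), REST(a)]
        System.out.println(tokenize("aabaacaaaaca"));
    }
}
```

The only decision point is the two-character check for "ac", which is the same lookahead the lexer would need with k = 2.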

My solution leads to nondeterminism:
-------
class T5aParser extends Parser;
all: ( AC | REST )*;

class T5aLexer extends Lexer;
options {
    k = 2;
    charVocabulary = '\u0009' | '\u000a' | '\u0020'..'\u007e' | '\u00a0'..'\u00ff';
}
AC: ( 'a' 'c' );
REST: ( ~'a' | {LA(2)!='c'}? 'a' )+;
-------

I'm sure you'll see at a glance how to get this grammar back on the right
track, i.e. deterministic.
Please let me know.

Rolf