[antlr-interest] MissingTokenException and skip tokens

Tobias Wunner tobias.wunner at gmail.com
Wed Apr 29 03:47:31 PDT 2009


Hello,

I am trying to write rules that match several numbers in a text
(i.e. several numbers in a specific format embedded in arbitrary token
sequences). My number rules work when I assume one number per line and
match them with:

    file:  ('\n' number)*

When I change the newline to ".*", the numbers are no longer matched
correctly. I tracked the problem down to a very simple rule set
which can match things like

"one"
"two"
"oneandone"
"oneandthree"
"oneandoneplusoneandthree"
"oneandoneplustwo"

with "and" and "plus" acting as number connectors. The simple rule set
is:

grammar simpleNumbers;

in	:	(.* numB)*;

numB	:	numA 'plus' numA | numA 'plus' | 'plus' numA | numA;

numA	:	num 'and' num | num;

num	:	'one' | 'two' | 'three';

I assumed that, given something like:

       numA someTokens numA

this would match the last alternative of rule numB twice. But in some
cases it matches the first alternative of numB and reports a
MissingTokenException, as in the following examples.

(1)     twoandone xx one

matches

[Attachment scrubbed from the archive: parse_1.jpg, image/jpeg,
http://www.antlr.org/pipermail/antlr-interest/attachments/20090429/f57ed694/attachment.jpg]

         numB( numA(num("two"),"and",num("one")),
               MissingTokenException,
               numA(num("one")) )

where I would have expected two matches of the last alternative of
numB:

         numB(numA(num("two"),"and",num("one")))   and
         numB(numA(num("one"))).

(2)  plus xx one

matches

[Attachment scrubbed from the archive: parse_2.jpg, image/jpeg,
http://www.antlr.org/pipermail/antlr-interest/attachments/20090429/f57ed694/attachment-0001.jpg]

where I would have expected only

        numA(num("one"))

with "plus" skipped.
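
To make the expected results for (1) and (2) concrete, here is a
minimal Python sketch of the behaviour I am after. It is not ANTLR and
says nothing about how ANTLR resolves ".*"; it is only a hand-written
scanner under my own assumptions. The names tokenize, match_num_a,
match_num_b and find_numbers are made up for illustration, and treating
every other non-space character as a single junk token is an
assumption, not something the grammar above specifies.

```python
import re

NUMS = {"one", "two", "three"}
# Assumption: keywords are recognised anywhere; any other
# non-whitespace character becomes a one-character junk token.
TOKEN_RE = re.compile(r"one|two|three|and|plus|\S")


def tokenize(text):
    return TOKEN_RE.findall(text)


def peek(tokens, i):
    """Return tokens[i], or None when i is past the end."""
    return tokens[i] if i < len(tokens) else None


def match_num_a(tokens, i):
    """numA : num 'and' num | num. Return the end index or None."""
    if peek(tokens, i) not in NUMS:
        return None
    if peek(tokens, i + 1) == "and" and peek(tokens, i + 2) in NUMS:
        return i + 3
    return i + 1


def match_num_b(tokens, i):
    """numB : numA 'plus' numA | numA 'plus' | 'plus' numA | numA."""
    if peek(tokens, i) == "plus":
        # 'plus' numA: only a match if a numA really follows
        return match_num_a(tokens, i + 1)
    end = match_num_a(tokens, i)
    if end is None:
        return None
    if peek(tokens, end) == "plus":
        right = match_num_a(tokens, end + 1)
        # numA 'plus' numA if possible, otherwise numA 'plus'
        return right if right is not None else end + 1
    return end


def find_numbers(text):
    """Collect every numB match, skipping tokens that start none."""
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        end = match_num_b(tokens, i)
        if end is None:
            i += 1  # junk token: skip it, no error
        else:
            found.append(tokens[i:end])
            i = end
    return found
```

With this sketch, find_numbers("twoandone xx one") gives two separate
matches and find_numbers("plus xx one") gives only the final "one",
because a "plus" without a neighbouring numA is skipped like any other
junk token.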

I would be grateful for any ideas on a better way to skip tokens that
are not part of a valid number.

Regards,
Toby