[antlr-interest] Antlr 3 and the newline token problem

Prashant Deva prashant.deva at gmail.com
Fri Nov 25 12:14:51 PST 2005


Me and Terence were recently having a discussion about this.
Its about how to handle newlines in antlr 3.

Now as you probably know that currently ANTLR 2 cant handle all 3 types of
newlines.
ie, if we have a rule like this-

WS : '\r' '\n' {newline();}
       | '\r'    {newline();}
       | '\n'   {newline();}
     ;

we would get a non determinism warning.

The reason this problem arises is solely because currently we have chosen to
store 'lines & columns' in tokens instead of offsets.

I mean, think about it this way, if we didnt have to put that newline()
call, we could easily write this rule as-

WS : '\r' | '\n' ;

This would handle all 3 types of newlines.

So i propose that in antlr 3 you identify the position of the tokens by
offset instead of 'line/columns'

This has the following advantages -


   1. Some people may like line nos to start from 1 while some may want
   them to start from 0, but 'offsets' are universally considered to start from
   0.
   2. We need to keep track of just 1 value ( the offset) instead of 2
   values (row/column), so lesser complexity.
   3. And of course we can handle all 3 types of 'newlines' easily :-)


On the other hand Terence suggests that call to newline() can be put inside
the CharBuffer class where it is handled automatically so people who need to
track line nos can do so easily.
This would be nice but then again it increases the complexity if we decide
to keep both offsets and row/cols.

Which approach do you think would be best?

If you guys would like we can put up a poll for this on the ANTLR Studio
forum.



--
Prashant Deva
Creator, ANTLR Studio,www.antlrstudio.com
Founder, Placid Systems, www.placidsystems.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20051126/fbe5513c/attachment.html


More information about the antlr-interest mailing list