[antlr-interest] new Java grammar AND 2.7.2a4 10-27-2002 version available

Terence Parr parrt at jguru.com
Sun Oct 27 11:22:44 PST 2002


Folks,

Anybody interested in the upgraded Java grammar (I groomed it a lot for 
jguru and folded it back into antlr stuff), can download

http://www.antlr.org/download/antlr-2.7.2a4_10-27-2002.jar

It has all my bug fixes done minus AST factory stuff.  Just waiting on 
final stuff from Brian Smith on NetBeans integration support.  Ric 
probably has some stuff too.

Will look into C# integration.

Oh, another bug: "EOF" literal is reserved...shouldn't be.

This Java grammar seems much more robust, strict, and accurate/useful 
in its tree construction.  I have run the parser on the following:

  *      Checked parser/tree parser on source for
  *          Resin-2.0.5, jive-2.1.1, jdk 1.3.1, Lucene, antlr 2.7.2a4,
  *          and the 110k jGuru server source.

That's a lot of code (it actually caught a \0 in a string in my jGuru 
code) ;)

Woohoo!

Ter
--------------[RAW changes from 2.7.1 notes 
file]---------------------------
10/27/2002

Integrated jguru fixes for java.g and java.tree.g

Fixed k=0 value causing exception.

10/25/2002

Brian Smith's fix to MismatchedCharException handles EOF properly.
I augmented to handle other special char like \n shows up as '\n' now 
not ' followed by ' on a newline.  Examples:

~/antlr/depot/code/org.antlr/main/main/examples/java/tinyc $ java Main 
< /tmp/f
exception: line 1:7: expecting '
', found <EOF>
~/antlr/depot/code/org.antlr/main/main/examples/java/tinyc $ java Main 
< /tmp/f
exception: line 1:7: expecting '\n', found <EOF>


10/18/2002

Ambig refs to ast variables caused a NullPointerException. Now it says:

class ErrorMaker extends TreeParser;

root
   : #( WHATEVER SEMI {echo(#SEMI); } SEMI {echo(#SEMI); } )
   ;

error: Ambiguous reference to AST element SEMI in rule root

Thanks to "Oleg Pavliv" <opavliv at lysis.com>

10/11/2002

Added AST.getNumberOfChildren() and to BaseAST.  !!!!Incompatible as 
you might have implemented AST!!!!!!

10/06/2002

Trying to add grammar to lout syntax diagram stuff

10/05/2002

Added antlr.util.* and .lout.* packages to do lout generation of trees. 
[not in standard distribution yet]

9/14/2002 a3

Made $FIRST/$FOLLOW work for action or exception handler.  Arg can be 
rule name

a : A {$FIRST(a); $FIRST} B
   exception
     catch [MyExc e] {
         foo = $FOLLOW(a);
         foo = $FIRST(c);
     }
;
can do $FIRST(a).member(LBRACK) etc...

Added classHeaderPrefix option for grammars.
         // get prefix (replaces "public" and lets user specify)

added setter/getter for filename in Token

rjc at trump.net.au report issues: this is an update to 2.7.2 issue.

1) This is a totally trivial thing, but when generating bitsets in the 
lexer:

    private static final long[] mk_tokenSet_1() {
          long[] data = new long[8];
          data[0]=-576460752303432712L;
          for (int i = 1; i<=3; i++) { data[i]=-1L; }
          for (int i = 4; i<=7; i++) { data[i]=0L; } // NOT DONE NOW
          return data;
      }

From: steve hurt <steven.f.hurt at lmco.com>
The second bug occurs when a user wants to
organize a suite of grammar files into seperate
directories. Due to a bug in the
tool it incorrectly forms the location of the
import/export vocabulary files. added a trim() to remove extra space.



Earlier, I added *semantic predicate hoisting (limited)* for lexer 
rules. 8/31/2002

2.7.2a3: ???

  "Silvain Piree" <s.piree at enneya.com> gave me versions of Grammar*.java 
in preproc that used stringbuffers...much faster for inherited grammars.

"Lloyd Dupont" <lloyd at galador.net>
java grammar: Was 0..9 not 0..7 in ESC when starting with 4..7
      assumed float not double; 3.0 was seen as float

  *      John Pybus                      john at pybus.org
sent in a major fix to handle f.g.super(); required rewrite of 
primary/postfix expression rules.

------
Ric's CPP since a1
* Mirrored tabsize handling from java.
* README updates
* Configure/Makefile changes from David Scott Page
   distclean targets + general cleanups etc... (see his mail)
* Removed sstream dependencies from ASTFactory
* Ported change 625 to C++ mode. (currentAST bug)
* Fixed a Makefile for sather removal.
* Fixed:In the command-line options, the docs say to use 
"-traceTreeWalker".
Alas, the code insists that you use "-traceTreeParser".  That was
annoying to figure out.  :)

2.7.2a2: 01-12-2002
-------------------------------------------------------
01-12-2002:

*** put an "if GENAST" gate around import statements for AST types
     in normal non-tree parsers.

*** optimized large bitset initialization code.  Dropped JavaLexer.class
from 87k to 25k.  Runs of integers are written via loops.

*** Thanks to Marco van Meegen <Marco.van.Meegen at t-online.de>
for his suggestion/help getting ANTLR
into shape for inclusion into eclipse IDE.  I took his suggestion and
make the antlr.Tool object a simple variable reference so that multiple
kinds of Tool objects (such as one hooked into Eclipse) could be used
with ANTLR.  This required simple changes but over *many* files!

-------------------------------------------------------
01-05-2002:

*** removed Sather support at the request of the supporter.

*** add warning/error.  Bad code gen with ^ or ! on tree root
when building trees in a tree walker grammar such as:

expr:   #(PLUS^ expr expr)
     |   i:INT
     ;

Fortunately, ^ is simply redundant; removing it makes code ok.
Added a warning.  Added an error message for ! saying that it
is not implemented.

*** bug fix: incorrect code generation for #(. BLORT) in
tree walker grammar.  Didn't properly handle the wildcard
root (missing _t==null check).

2.7.2a1: 12-28-2001
-------------------------------------------------------
12-24-01:

*** bug fix.  The lexer generator puts this assignment _after_ inserting
everything into the literals table: caseSensitiveLiterals = false;

Of course it needs to be before since ANTLRHashString depends on
it to calculate the hashCode.  Not sure when this got fixed actually.

*** Code gen bug fix: "if true {" could be generated sometimes in
the Lexer.  I put (...) around an isolated true if it's generated
from JavaCodeGenerator.getLookaheadTestExpression.

*** ANTLR used to generate a crappy error message:

warning: found optional path in nextToken()

when a public lexer rule could be optional such as

B : ('b')? ;

I now say:

warning: public lexical rule B is optional (can match "nothing")


*** For unexpected and no viable alt exceptions, no file/line/column 
info was generated.
The exception was wrapped in a TokenStreamRecognitionException that did 
not delegate error message
handling (toString()) to the wrapped exception.
I added a TokenStreamRecognitionException.toString() method so that you 
now see things like

<stdin>:1:1: unexpected char: '_'

instead of

antlr.TokenStreamRecognitionException: unexpected char: '_'

*** For large numbers of alternatives (>126) combined with syntactic 
predicates, there was a problem
whereby the syn pred testing code was not there.  2.7.1 introduced this 
problem. 2.7.2 has it right again.

*** Removed syn pred testing gates on ast construction code; returnAST 
is ignored while in
try block while guessing.  So, the tree construction in an invoked rule 
while guessing has no effect.
No need to test.

*** Char ranges with ! on the alternative or range itself did not have 
the code necessary to delete the matched character from the token text.

-------------------------------------------------------
12-22-01:

*** moved strip*(...) methods from Tool to StringUtils; updated mkjar 
accordingly.

*** added Version 2.1 of ANTLR-mode from Christoph.Wedler at sap.com

*** bug fix: a #(pippo) construct, which isn't allowed, caused a 
nullptr exception with kaffe.
It shouldn't get an exception.  It now shows: "unexpected token: pippo" 
instead.

*** a double ;; in antlr.g action and some stray semis were causing kjc 
to puke.

*** the constructors of antlr/CharQueue.java and antlr/TokenQueue.java 
didn't check
for int overflow. They try to set queue size to the next higher multiple

of 2, which is not possible for all inputs (Integer.MAX_VALUE == 
2^15-1). The

constructor loops forever for some inputs.  Checked for huge size 
requests.

*** Ric added code to not write output files if they have not changed 
(Java and C++).

*** Added default tab char handling; defaults to 8.  Added methods to 
CharScanner:
     public void setTabSize( int size );
     public int getTabSize();


*** The CharScanner.rewind(int) method did not rewind the column, just 
the input state. oops.
     It now reads:
     public void rewind(int pos) {
         inputState.input.rewind(pos);
         setColumn(inputState.tokenStartColumn); // ADDED
     }

*** reformatted all code minus Cpp*.java using Intellij IDEA.


-------------------------------------------------------
12-15-01:

*** Made doEverything return a code instead of System.exit.  made a 
wrapper
to do the exit for cmd-line tools.

*** ["Bryan O'Sullivan" <bryan at bea.com> submitted] Used to generic lots 
of static arrays, but bugs in java impl restricts number/size of static 
arrays.  Answer was to put array init into static methods and then 
return it.  Changed:

         private static final long _tokenSet_0_data_[] = { 
-2305803976550907904L, 383L, 0L, 0L };
         public static final BitSet _tokenSet_0 = new 
BitSet(_tokenSet_0_data_);

to

         private static final long[] mk_tokenSet_0() { long[] data = { 
-2305803976550907904L, 383L, 0L, 0L }; return data; }
         public static final BitSet _tokenSet_0 = new 
BitSet(mk_tokenSet_0());

Seems to slow things down a tiny bit at start up not sure if it's 
significant; -Xprof showed nothing bizarre.

---------------------------------------------------------
*** Made antlr use buffered IO for reading grammars; for java grammar 
went from 11 seconds to 6 seconds.



*** added reset() to ParserSharedInputState and Lexer too.  init of 
CharQueue
is now public.


----------------------------------------------------------------------
Ric's Stuff:
General changes:

    - Added warnings for labeled subrules.
    - New Makefile setup. Everything can be build from toplevel up.
      Changes for autoconf enabled make. Added install rules, toplevel
      configure script.
    - Renamed $lookaheadSet to $FOLLOW. Added %FOLLOW to sather mode 
(hope it
      works) original patch by Ernest Passour
	- Clarified some error/warning messages.
    - Robustified action.g - if currentRule = 0 a fitting error message 
is
      printed.
    - Ported action.g fixes from C++ to Java action.g. Warnings and 
errors
      in actions are now correctly reported in java mode as well.
    - Added reset methods to the Queue type objects (java/C++).
    - Added reset methods to the Input/TokenBuffer objects (java/C++).
    - Added reset methods to the SharedInputState objects (java/C++).
    - Added -h/-help/--help options. Adapted the year range in the 
copyright in
      the help message.
    - Allow whitespace between $setxxx and following '('.
    - Give errors when target file/directory not writeable.
    - POSSIBLE INCOMPATABILITY: Changed the position of the init actions 
for
	  (..)* (..)+ to just inside the loop handling the closure. This way we
	  can check EOF conditions in the init action for each loop 
invocation. And
	  generate much better error messages. The uponEOF stuff is too 
limited.
	  (RK: did I port this to java??)
    - Changed the charName in C++ mode and NoViableAltForCharException 
in java
      mode so that only printable characters are printed and non 
printable ones
      get 'hexdumped'.
	  (RK: maybe this should be verified for java mode.. upon 
reconsideration)

C++ related:

    - Tested with 'Sun WorkShop 6 2000/08/30 C++ 5.1 Patch 109490-01'. A 
few
      small fixes.
    - Verified build with gcc 2.8.1 and gcc 2.95.3.
    - Fixed typo in config.hpp added fixes for 2.8.1.
    - Dropped dependency on sstream from ASTFactory
    - Misc fixes for 2.8.1
    - Added config for Digital Tru64 C++ compiler. (courtesy Andre Moll)
    - MetroWerks Codewarrior fixes from Ruslan Zasukhin.
    - Define ANTLR_CCTYPE_NEEDS_STD if isprint needs std:: (RZ)
    - Define ANTLR_CXX_SUPPORTS_UNCAUGHT_EXCEPTION if 
std::uncaught_exception
      is supported by compiler. (RZ)
    - Made XML support configurable with ANTLR_SUPPORT_XML define. (RZ)
    - Moved some methods back to header for better inlining. (RZ)
	- Added getASTFactory to treeparser. Marked setASTNodeFactory as
	  deprecated, added setASTFactory to Parser (improve consistency).
    - Removed down and right initializers from BaseAST copy constructor, 
they
      wreak havoc in relation to dupTree. (forgot who reported this)
    - Added missing initializer for factory in TreeParser constructor.
    - Added the possiblity to escape # characters. Added more 
preprocessor stuff
      to be skipped. Changed error for ## into a warning.
    - Some heterogeneous AST fixes.
    - Made optimization of AST declarations constructions a little bit 
less
      aggressive.
    - Tightened up the generation of declarations for AST's.
    - Updated a lot of #include "antlr/xx" to #include <antlr/xx>. Also
    - Small addition for MSVC. (Jean-Daniel Fekete)
    - Fixed missing 0 check in astfactory code.
    - Also preprocess preheader actions and preambles for treegeneration 
code.
    - Added to the C++ LexerSharedInputState an initialize function that
      reinitializes the thing with a new stream.
    - Bugfix: Initialized attribute filename a little bit earlier so 
error
	  message shows the filename in stead of 'null'.
    - tokenNames vector is now a simple array not a vector.
    - Optimizations in Tracer classes (dumped string's). Removed 
setTokenNames
      from the support library. Switched tokenNames to use a char* array.
    - Generate NUM_TOKENS attribute in parsers. Added getNumTokens 
methods to
      parsers.
	- Changes in MismatchedTokenException to reflect the previous.
    - More fixes for XML I/O (xml-ish actually). It's a bit tidier now. 
Some
	  too advanced things removed (ios_base::failure). Embedding custom XML
	  elements in the stream should be possible now.
    - Bugfix: in case of a certain order of header actions 
(pre_include_xx etc.)
      one header action might overwrite another. Probably only affects 
C++.
    - Fix from Emir Uner for KAI C++ cast string literal to 'const
      char*' for make_pair.
    - Improved exception handling in trace routines of parser. Patch 
submitted
      by John Fremlin. Tracer class now catch exceptions from lexer. 
Fixed
      forgotten message in BitSet.cpp.
    - Added implementations for getLAChars and getMarkedChars.

Java related:

    - Reinstated fileline formatter code that got dropped in exception 
rework.
    - Added column information to FileLineFormatter. Changed stuff all 
over the
      place to support it.
    - Added ASTPair construction optimization suggested by Sander Mägi.
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco

 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 



More information about the antlr-interest mailing list