[antlr-interest] Re: antlr-interest Digest, Vol 11, Issue 48

Thu Oct 27 03:47:13 PDT 2005

On Thu, 27 Oct 2005 antlr-interest-request at antlr.org wrote:

> Send antlr-interest mailing list submissions to
> 	antlr-interest at antlr.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://www.antlr.org/mailman/listinfo/antlr-interest
> or, via email, send a message with subject or body 'help' to
> 	antlr-interest-request at antlr.org
>
> You can reach the person managing the list at
> 	antlr-interest-owner at antlr.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of antlr-interest digest..."
>
>
> Today's Topics:
>
>   1. RE: ASTPair handling in C# runtime for 2.7.6 (Luis Leal)
>   2. Re: Checking for expression end in Javascript parser (Tech)
>   3. Re: Re: Checking for expression end in Javascript parser (Tech)
>   4. Re: ASTPair handling in C# runtime for 2.7.6 (Martin Probst)
>   5. RE: ASTPair handling in C# runtime for 2.7.6 (Micheal J)
>   6. RE: ASTPair handling in C# runtime for 2.7.6 (Micheal J)
>   7. Changes for stream offset determination (Jim Crafton)
>   8. Re: Re: thank you sir (Sebastian Kaliszewski)
>   9. Re: Re: thank you sir (Paul Johnson)
>  10. Re: Help to make an iteration (somehing go wrong with
>      previous) (Bryan Ewbank)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 26 Oct 2005 22:18:06 +0200
> From: "Luis Leal" <luisl at scarab.co.za>
> Subject: RE: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: "Micheal J" <open.zone at virgin.net>,	"''antlr-interest' Interest'"
> 	<antlr-interest at antlr.org>
> Message-ID: <MPEMLILMNEOKCPPBGEMKAEJFCIAA.luisl at scarab.co.za>
> Content-Type: text/plain;	charset="us-ascii"
>
> Hi,
>
> I vote for option 3 as it will have the least impact on client code and will
> probably be reasonably efficient given the generational garbage collector. I
> think it's probably better to spend effort on the run-time design for Antlr
> v3 - which I'm very keen to help with BTW. Has work started on it? Please
> let me know how I can help.
>
> There is also a bug in the _saveIndex optimization which causes compile
> errors in generated lexers. I'd be happy to contribute a patch if this
> hasn't already been fixed.
>
> Regards
>
> Luis
>
> -----Original Message-----
> From: Micheal J [mailto:open.zone at virgin.net]
> Sent: 26 October 2005 08:28 PM
> To: ''antlr-interest' Interest'
> Subject: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
>
>
> Hi,
>
> For background info, pls see this thread:
>
> http://www.antlr.org/pipermail/antlr-interest/2005-April/011838.html
>
>
> The options currently being considered are (in order of attractiveness):
>
> 1. Rewrite ASTPair as a struct.
>   Pros: no heap allocations, no GC impact/churn
>   Cons: breaking changes to ASTFactory methods (and client code):
>         - makeASTRoot() and
>         - addASTChild()
>
> 2. Provide a per-instance object pool
>   Pros: fixes multi-threading issue, reduces ASTPair allocations
>   Cons: future GCs may be more efficient, GC impact
>
> 3. Return to the pre-2.7.5 scheme and allow the GC to clean up.
>   Pros: fixes multi-threading issue, leverages GC improvements
>   Cons: more ASTPair allocations, GC impact
>
> Any opinions?
>
> Cheers,
>
> Micheal
> ANTLR/C#
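Here is a rough C++ analogue of option 1 that makes the trade-off concrete. Node, NodePtr and the two helpers are hypothetical stand-ins, loosely modelled on the ANTLR 2 first-child/next-sibling tree logic. They are not the ANTLR/C# runtime API. The sketch only shows that the pair becomes a value living in the caller's stack frame, so the helpers have to take it by reference.

```cpp
#include <cassert>
#include <memory>

// Hypothetical AST node in first-child / next-sibling form.
struct Node {
    int type = 0;
    std::shared_ptr<Node> down;   // first child
    std::shared_ptr<Node> right;  // next sibling
};
using NodePtr = std::shared_ptr<Node>;

// Option 1 sketch: the pair is a plain value type. A rule keeps one in its
// own stack frame, so no pair object is allocated on the heap.
struct ASTPair {
    NodePtr root;   // head of the (sub)tree built so far
    NodePtr child;  // last node added at the current level
};

// Move 'child' to the last sibling in its list.
static void advanceChildToEnd(ASTPair& cur) {
    if (cur.child)
        while (cur.child->right) cur.child = cur.child->right;
}

// Analogue of addASTChild(): the pair is taken by reference.
void addASTChild(ASTPair& cur, const NodePtr& child) {
    if (!child) return;
    if (!cur.root) cur.root = child;              // first node heads the list
    else if (!cur.child) cur.root->down = child;  // first child under a root
    else cur.child->right = child;                // append after the last node
    cur.child = child;
    advanceChildToEnd(cur);
}

// Analogue of makeASTRoot(): the new root adopts the current list as children.
void makeASTRoot(ASTPair& cur, const NodePtr& root) {
    if (!root) return;
    if (cur.root) {
        if (!root->down) {
            root->down = cur.root;
        } else {
            NodePtr last = root->down;
            while (last->right) last = last->right;
            last->right = cur.root;
        }
    }
    cur.child = cur.root;  // old root becomes the current child
    advanceChildToEnd(cur);
    cur.root = root;
}
```

In C# the equivalent would be to declare ASTPair as a struct and pass it with `ref`. That is why makeASTRoot() and addASTChild(), and any client code calling them, would have to change.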
>
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 26 Oct 2005 21:29:43 +0100
> From: Tech <tech at swingkid.fsnet.co.uk>
> Subject: Re: [antlr-interest] Checking for expression end in
> 	Javascript parser
> To: shmuel siegel <antlr at shmuelhome.mine.nu>
> Cc: antlr-interest at antlr.org
> Message-ID: <435FE737.8090007 at swingkid.fsnet.co.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Absolutely, poor wording on my part. I guess I should have said
> 'Javascript statements end at either a semi colon, or the earliest new
> line that makes a valid statement.'
>
> Given this, should
>
>    a=3
>
> count as a complete statement on its own line? My parser won't treat it
> as such because the expressions are nested, so I hope not!
>
> Mark
>
> shmuel siegel wrote:
>
>> Tech wrote:
>>
>>> One aspect that is different is that Javascript expressions end
>>> either at a semi colon, or at the earliest new line that makes a
>>> valid expression.
>>
>>
>> As far as I know, this is not a valid definition for javascript
>> statements. It is true for control statements like "return" or "break"
>> but not for arithmetic statements.
>>
>> Consider,
>>     <script>
>>         a=3
>>         +4
>>         alert(a);
>>     </script>
>>
>> It is legal and will result in an alert with the value 7. In general,
>> a new line only marks the end of a statement if the next token cannot
>> be part of the previous pattern.
>>
>>
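Here is a toy model of the rule shmuel states, built on simplifying assumptions: a line break ends a statement only when the next token cannot continue it. The real ECMAScript automatic semicolon insertion rules have more cases, for example the restricted productions around return and break. The Tok type and the continuation test below are hypothetical. They cover just enough to reproduce the a=3 / +4 / alert(a); example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: its text and the source line it starts on.
struct Tok {
    std::string text;
    int line;
};

// Simplified operator test: a few single-character binary operators only.
static bool isBinaryOp(const std::string& t) {
    return t == "+" || t == "-" || t == "*" || t == "/" || t == "=";
}

// Can 'next' extend a statement whose last token is 'prev'?
// After an operator the expression is unfinished, so it must continue.
// After an operand, only tokens that can follow an operand continue it.
static bool canContinue(const Tok& prev, const Tok& next) {
    if (isBinaryOp(prev.text)) return true;
    return isBinaryOp(next.text) || next.text == "(" || next.text == "[" ||
           next.text == ".";
}

// Split tokens into statements: an explicit ";" always ends one, and a
// line break ends one only when the next token cannot continue it.
std::vector<std::vector<std::string>> splitStatements(const std::vector<Tok>& toks) {
    std::vector<std::vector<std::string>> stmts;
    std::vector<std::string> cur;
    for (std::size_t i = 0; i < toks.size(); ++i) {
        const Tok& t = toks[i];
        if (t.text == ";") {
            if (!cur.empty()) stmts.push_back(cur);
            cur.clear();
            continue;
        }
        // cur is non-empty only if toks[i - 1] exists and was not ";".
        if (!cur.empty() && t.line > toks[i - 1].line && !canContinue(toks[i - 1], t)) {
            stmts.push_back(cur);  // the newline acts as a terminator
            cur.clear();
        }
        cur.push_back(t.text);
    }
    if (!cur.empty()) stmts.push_back(cur);
    return stmts;
}
```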
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 26 Oct 2005 21:38:00 +0100
> From: Tech <tech at swingkid.fsnet.co.uk>
> Subject: Re: [antlr-interest] Re: Checking for expression end in
> 	Javascript	parser
> To: Terence Parr <parrt at cs.usfca.edu>
> Cc: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID: <435FE928.20406 at swingkid.fsnet.co.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Terence,
>
> I wanted to say 'if the next character is on the same line it has to be
> a semi colon, otherwise the semi colon is optional.' I'm not sure if I
> need the 'semi colon is optional' bit, because we could always treat it
> as a separate (empty) statement, but am I right in thinking I have to
> put something as an alternative for the semantic predicate to make sense?
>
> Mark
>
> Terence Parr wrote:
>
>>
>> On Oct 26, 2005, at 1:33 AM, Tech wrote:
>>
>>> I have overridden consume() in my parser to store the line number of
>>> the last token consumed:
>>>
>>>    int currentLine = 0;
>>>
>>>    public override void consume()
>>>    {
>>>        // record the line of the token being consumed (LA(1) is its type, not its line)
>>>        currentLine = LT(1).getLine();
>>>        base.consume();
>>>    }
>>>
>>> I have then defined semi:
>>>
>>>    semi
>>>        :    {currentLine == LT(1).getLine()}? SEMI
>>>        |    (SEMI)?
>>>        ;
>>>
>>> This seems to work, but gives me lots of disambiguation warnings.
>>> What do you think?
>>
>>
>> Hi Mark,
>>
>> remove the (...)? around the SEMI; with it, semi can match nothing at all,
>> which is probably not what you want.
>> Ter
>>
>>
>>
>
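Mark's intended rule ("a SEMI is required if the next token is on the same line, otherwise it is optional") can be checked outside ANTLR. This is a hypothetical standalone model, not generated parser code. TokenStream, LT1(), consume() and semi() only mimic the roles of LT(1), the overridden consume() and the semi rule. Note that consume() records the consumed token's line. LA(1) would give the token's type, which cannot be compared with getLine().

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds and token record (type plus source line).
enum TokType { ID, SEMI, EOF_TOK };
struct Token {
    TokType type;
    int line;
};

// Minimal stand-in for a parser's token stream; assumes it ends with EOF_TOK.
class TokenStream {
public:
    explicit TokenStream(std::vector<Token> toks) : toks_(std::move(toks)) {}

    const Token& LT1() const { return toks_[pos_]; }  // peek at next token

    // Mirrors the overridden consume(): remember the *line* of the token
    // being consumed, then advance (staying on EOF at the end).
    void consume() {
        currentLine_ = toks_[pos_].line;
        if (pos_ + 1 < toks_.size()) ++pos_;
    }

    // Mirrors the intended semi rule; returns false on a syntax error.
    bool semi() {
        if (LT1().type == SEMI) { consume(); return true; }  // explicit ';'
        // No ';': acceptable only at end of input or if the next token
        // starts on a different line than the last consumed one.
        return LT1().type == EOF_TOK || LT1().line != currentLine_;
    }

private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
    int currentLine_ = 0;
};
```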
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 26 Oct 2005 23:54:48 +0200
> From: Martin Probst <mail at martin-probst.com>
> Subject: Re: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: antlr-interest at antlr.org
> Message-ID: <1130363688.9728.3.camel at localhost.localdomain>
> Content-Type: text/plain
>
> Hi,
>
>> For background info, pls see this thread:
>>
>> http://www.antlr.org/pipermail/antlr-interest/2005-April/011838.html
>
> Has anyone actually tested how big the performance impact is? This is of
> course application-dependent, but in my experience doing such
> optimisations without a good benchmark is not a good idea, especially in
> managed code (in my case: Java code, but probably applies to .NET, too).
>
> Martin
>
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 27 Oct 2005 00:05:30 +0100
> From: "Micheal J" <open.zone at virgin.net>
> Subject: RE: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: "''antlr-interest' Interest'" <antlr-interest at antlr.org>
> Message-ID: <000401c5da81$c3f225b0$6902a8c0 at hercules>
> Content-Type: text/plain;	charset="us-ascii"
>
>> Hi,
>>
>> I vote for option 3 as it will have the least impact on
>> client code and will probably be reasonably efficient given
>> the generational garbage collector.
>
> Noted. I describe the reasons for moving away from option 3 in 2.7.5 in
> another message in this thread.
>
>> I think it's probably
>> better to spend effort on the run-time design for Antlr v3 -
>> which I'm very keen to help with BTW. Has work started on it?
>> Please let me know how I can help.
>
> ;-)
>
> Will do.
>
>> There is also a bug in the _saveIndex optimization which
>> causes compile errors in generated lexers. I'd be happy to
>> contribute a patch if this hasn't already been fixed.
>
> Details of the bug?
>
>
> Cheers,
>
> Micheal
> ANTLR/C#
>
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 27 Oct 2005 00:05:30 +0100
> From: "Micheal J" <open.zone at virgin.net>
> Subject: RE: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: <antlr-interest at antlr.org>
> Message-ID: <000501c5da81$c4b2b690$6902a8c0 at hercules>
> Content-Type: text/plain;	charset="us-ascii"
>
>> Has anyone actually tested how big the performance impact is?
>> This is of course application-dependent, but in my experience
>> doing such optimisations without a good benchmark is not a
>> good idea, especially in managed code (in my case: Java code,
>> but probably applies to .NET, too).
>
> Yes. I did, with a couple of in-production grammars and input files
> typically over 3MB in size. Hundreds of thousands of the damned things were
> allocated even though the longest call chain was only a few thousand deep.
> With larger files, those allocations and the resulting GC churn were an
> issue. That is what led to the original change to use an object pool in
> 2.7.5.
>
> Jim Crozman did as well; he reports on this in the thread.
>
> Option 1 seems to have it all IMO.
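Micheal's numbers (hundreds of thousands of allocations, but only a few thousand pairs alive at any moment) are the usual argument for pooling. Below is a minimal C++ sketch of option 2's per-instance pool. It assumes one pool per parser object, so parsers running on different threads never share a free list. Pair and PairPool are hypothetical names, not ANTLR API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical pooled object standing in for ASTPair.
struct Pair {
    void* root = nullptr;
    void* child = nullptr;
};

// Per-instance pool: owned by one parser, so no locking is needed as long
// as each parser object is used from a single thread.
class PairPool {
public:
    Pair* acquire() {
        if (free_.empty()) {
            owned_.push_back(std::make_unique<Pair>());  // grow only when empty
            ++allocations_;
            return owned_.back().get();
        }
        Pair* p = free_.back();
        free_.pop_back();
        *p = Pair{};  // clear state left by the previous user
        return p;
    }

    void release(Pair* p) { free_.push_back(p); }

    std::size_t allocations() const { return allocations_; }

private:
    std::vector<std::unique_ptr<Pair>> owned_;  // owns every Pair created
    std::vector<Pair*> free_;                   // Pairs ready for reuse
    std::size_t allocations_ = 0;
};
```

With this design, total allocations track the peak number of live pairs rather than the number of rule invocations. The cost is that every caller must release its pairs reliably.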
>
> Cheers,
>
> Micheal
>
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 26 Oct 2005 19:53:12 -0400
> From: Jim Crafton <jim.crafton at gmail.com>
> Subject: [antlr-interest] Changes for stream offset determination
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID:
> 	<e88138500510261653w6b016f6oc687d5bf3f89f37c at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Here are the changes I made to allow a custom AST node to determine
> the current offset of the lexer.
>
> In CharScanner.hpp, class antlr::CharScanner
>
> protected:
> unsigned int offset_;
>
> public:
>  unsigned int offset() const {
>    return offset_;
>  }
>
>
> 	virtual void consume()
> 	{
> 		if (inputState->guessing == 0)
> 		{
> 			int c = LA(1);
> 			if (caseSensitive)
> 			{
> 				append(c);
> 			}
> 			else
> 			{
> 				// use input.LA(), not LA(), to get original case
> 				// CharScanner.LA() would toLower it.
> 				append(inputState->getInput().LA(1));
> 			}
> //*************************************************************
> 			offset_ ++;
> //*************************************************************
>
> 			// RK: in a sense I don't like this automatic handling.
> 			if (c == '\t')
> 				tab();
> 			else
> 				inputState->column++;
> 		}
> 		inputState->getInput().consume();
> 	}
>
> I increment the offset_ member in the consume() method.
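The increment sits inside the guessing == 0 test for a reason. While a syntactic predicate is evaluated, the lexer consumes characters speculatively and then rewinds, and it later consumes the same characters again for real. The toy model below (a hypothetical ToyScanner, not the ANTLR runtime) shows that counting only non-speculative consumes gives the true offset. Counting every call would over-count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Toy scanner mimicking the mark/consume/rewind pattern around predicates.
class ToyScanner {
public:
    explicit ToyScanner(std::string in) : input_(std::move(in)) {}

    void consume() {
        if (guessing_ == 0) ++offset_;  // real consumption: advance offset
        ++rawConsumes_;                 // counts speculative calls too
        ++pos_;
    }

    // Speculatively consume up to n chars, then rewind (like a predicate).
    void guessAhead(std::size_t n) {
        std::size_t mark = pos_;
        ++guessing_;
        for (std::size_t i = 0; i < n && pos_ < input_.size(); ++i) consume();
        --guessing_;
        pos_ = mark;  // input rewound; offset_ was never advanced
    }

    std::size_t offset() const { return offset_; }
    std::size_t rawConsumes() const { return rawConsumes_; }

private:
    std::string input_;
    std::size_t pos_ = 0;
    std::size_t offset_ = 0;
    std::size_t rawConsumes_ = 0;
    int guessing_ = 0;
};
```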
>
>
> In CharScanner.cpp
>
> CharScanner::CharScanner(InputBuffer& cb, bool case_sensitive )
> 	: saveConsumedInput(true) //, caseSensitiveLiterals(true)
> 	, offset_(0) // added: initialize offset_ to 0
> 	, caseSensitive(case_sensitive)
> 	, literals(CharScannerLiteralsLess(this))
> 	, inputState(new LexerInputState(cb))
> 	, commitToPath(false)
> 	, tabsize(8)
> 	, traceDepth(0)
> {
> 	setTokenObjectFactory(&CommonToken::factory);
> }
>
> CharScanner::CharScanner(InputBuffer* cb, bool case_sensitive )
> 	: saveConsumedInput(true) //, caseSensitiveLiterals(true)
> 	, offset_(0) // added: initialize offset_ to 0
> 	, caseSensitive(case_sensitive)
> 	, literals(CharScannerLiteralsLess(this))
> 	, inputState(new LexerInputState(cb))
> 	, commitToPath(false)
> 	, tabsize(8)
> 	, traceDepth(0)
> {
> 	setTokenObjectFactory(&CommonToken::factory);
> }
>
> CharScanner::CharScanner( const LexerSharedInputState& state, bool case_sensitive )
> 	: saveConsumedInput(true) //, caseSensitiveLiterals(true)
> 	, offset_(0) // added: initialize offset_ to 0
> 	, caseSensitive(case_sensitive)
> 	, literals(CharScannerLiteralsLess(this))
> 	, inputState(state)
> 	, commitToPath(false)
> 	, tabsize(8)
> 	, traceDepth(0)
> {
> 	setTokenObjectFactory(&CommonToken::factory);
> }
>
>
> In Token.hpp, class antlr::Token
>
> public:
> virtual void setOffset( unsigned int offset ){
>
> }
>
> virtual unsigned int getOffset() const{
>   return 0;
> }
>
>
> In CommonToken.hpp class antlr::CommonToken
>
> protected:
> unsigned int offset_;
>
> public:
> virtual void setOffset( unsigned int offset ) {
> 	offset_ = offset;
> }
>
> virtual unsigned int getOffset() const {
> 	return offset_;
> }
>
>
>
> In CommonToken.cpp
>
> CommonToken::CommonToken() : Token(), line(1), col(1), offset_(0), text("")
> {}
>
> CommonToken::CommonToken(int t, const ANTLR_USE_NAMESPACE(std)string& txt)
> : Token(t)
> , line(1)
> , col(1)
> , offset_(0)
> , text(txt)
> {}
>
> CommonToken::CommonToken(const ANTLR_USE_NAMESPACE(std)string& s)
> : Token()
> , line(1)
> , col(1)
> , offset_(0)
> , text(s)
> {}
>
>
> Note that the offset_ member is initialized to 0.
>
> Then in my custom AST class I do something like this:
>
>
> class CppASTNode : public CommonAST {
> public:
>
> 	CppASTNode(): line_(0), column_(0), offset_(0) {}
>
>
> 	CppASTNode( antlr::RefToken t ): line_(0), column_(0), offset_(0) {
> 		CommonAST::setType(t->getType() );
> 		CommonAST::setText(t->getText() );
>
> 		line_ = t->getLine();
> 		column_ = t->getColumn();
> 		offset_ = t->getOffset() - t->getText().size();
> 	}
>
> 	void initialize(int t, const std::string& txt) {
> 		CommonAST::setType(t);
> 		CommonAST::setText(txt);
>
> 		line_ = 0; // no source token here, so no position information
> 		column_ = 0;
> 		offset_ = 0;
> 	}
>
> 	void initialize( RefCppASTNode t ) {
> 		CommonAST::setType(t->getType() );
> 		CommonAST::setText(t->getText() );
>
> 		line_ = t->line_;
> 		column_ = t->column_;
> 		offset_ = t->offset_;
> 	}
>
> 	void initialize( RefAST t ) {
> 		CommonAST::initialize(t);
> 	}
>
> 	void initialize( antlr::RefToken t )  {
> 		CommonAST::initialize(t);
>
> 		line_ = t->getLine();
> 		column_ = t->getColumn();
> 		offset_ = t->getOffset() - t->getText().size();
> 	}
>
> 	void setText(const std::string& txt) {
> 		CommonAST::setText(txt);
> 	}
>
> 	void setType(int type) {
> 		CommonAST::setType(type);
> 	}
>
> 	static antlr::RefAST factory( void ) {
> 		antlr::RefAST ret =
> 			static_cast<antlr::RefAST>(RefCppASTNode(new CppASTNode()));
>
> 		return ret;
> 	}
>
> 	int getLineNumber() const {
> 		return line_;
> 	}
>
> 	int getColumnNumber() const {
> 		return column_;
> 	}
>
> 	unsigned int getOffset() const {
> 		return offset_;
> 	}
> protected:
> 	int line_;
> 	int column_;
> 	unsigned int offset_;
> };
>
>
> Cheers, and hope this proves useful to others.
>
> Jim
>
>
> ------------------------------
>
> Message: 8
> Date: Thu, 27 Oct 2005 09:32:57 +0200
> From: Sebastian Kaliszewski <Sebastian.Kaliszewski at softax.com.pl>
> Subject: Re: [antlr-interest] Re: thank you sir
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID: <436082A9.4050208 at softax.com.pl>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Terence Parr wrote:
>> C++ is C so I'm not sure you have a problem with anything except the  io
>> libraries and such; still the Clib should still work with C++ right?
>
> Well, this is not true. There are many C constructs which are illegal in
> C++, and some constructs have different meanings (e.g. my_type my_fun() vs
> my_type my_fun(void)).
>
> rgds
> Sebastian Kaliszewski
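The my_fun example can be made concrete. In C (before C23), my_type my_fun(); declares a function with unspecified parameters, whereas my_type my_fun(void); declares one that takes none. In C++ both mean "no parameters", and that can be checked at compile time. my_type and my_fun are just the placeholder names from the message.

```cpp
#include <cassert>
#include <type_traits>

using my_type = int;  // placeholder type from the example

my_type my_fun();      // C++: takes no arguments
my_type my_fun(void);  // redeclares the same function, not an overload

my_type my_fun() { return 42; }

// Both spellings denote the identical function type in C++.
static_assert(std::is_same<my_type(), my_type(void)>::value,
              "() and (void) are the same parameter list in C++");

// A C compiler (before C23) compiles a call such as my_fun(1, 2) against
// the first declaration; a C++ compiler rejects it.
```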
>
>
> ------------------------------
>
> Message: 9
> Date: Thu, 27 Oct 2005 10:20:10 +0100
> From: Paul Johnson <gt54-antlr at cyconix.com>
> Subject: Re: [antlr-interest] Re: thank you sir
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID: <43609BCA.1090407 at cyconix.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sebastian Kaliszewski wrote:
>> Terence Parr wrote:
>>
>>> C++ is C so I'm not sure you have a problem with anything except the
>>> io libraries and such; still the Clib should still work with C++ right?
>>
>>
>> Well, this is not true. There are many C constructs which are illegal in
>> C++, and some constructs have different meanings (e.g. my_type my_fun()
>> vs my_type my_fun(void)).
>
> As Stroustrup says in Appx B of his book, "with minor exceptions, C++ is
> a superset of [C89] C... Well-written C programs tend to be C++ programs
> as well". Sure, you can write bad C that won't compile as C++, but you
> can't fix bad C programs automatically, which appears to be what the OP
> wanted.
>
> Anyway, it's not at all obvious what the OP wanted to do. g++ is just a
> driver for GCC: it assumes that the default language is C++, and that
> you need to link against the C++ libraries. It has no problem compiling
> C programs, even bad ones, and certainly ones which aren't valid C++.

g++ has problems compiling many C programs. As I said previously, in C an
explicit typecast is not needed in places where C++ requires one. There are
many other kinds of statements for which g++ reports an error but gcc
compiles without any error. By the way, what's the OP?

Thanks for your reply.
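The typecast case people hit most often is malloc(). C converts void* to any object pointer implicitly, so int *p = malloc(n * sizeof *p); is ordinary C, but C++ rejects the same line. Here is a small C++ illustration; alloc_ints is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

// In C++, void* does not convert implicitly to int*; in C it does.
static_assert(!std::is_convertible<void*, int*>::value,
              "C++ needs an explicit cast from void*");
// The opposite direction is implicit in both languages.
static_assert(std::is_convertible<int*, void*>::value,
              "any object pointer converts to void*");

// Hypothetical helper: allocate n ints the way C code ported to C++ must.
int* alloc_ints(std::size_t n) {
    // C:   int *p = malloc(n * sizeof(int));   compiles as C, rejected by C++
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    return p;
}
```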
>
> Paul
>
>
>
> ------------------------------
>
> Message: 10
> Date: Thu, 27 Oct 2005 05:26:58 -0400
> From: Bryan Ewbank <ewbank at gmail.com>
> Subject: Re: [antlr-interest] Help to make an iteration (somehing go
> 	wrong	with previous)
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID:
> 	<dd3a065f0510270226x3def51dco3c305831222298de at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Let me try again with more explanation...
>
> [01] eval:
> [02]  ... ... ... | while_stmt | ... ... ...
> [03] ;
> [04]
> [05] while_stmt
> [06] :
> [07] 	#( WHILE e:expr s:stmt )
> [08] 	{
> [09] 		while (eval(#e) == true)
> [10] 		{
> [11] 			eval(#s);
> [12] 		}
> [13] 	}
> [14] ;
>
> Line [07] matches the assumed tree for a while node.  When this matches, the
> action (lines [08-13]) is executed.  The while-statement at [09] is executed in
> the native language, which means that the "eval(#s)" at [11] will be executed
> each time that the condition in the while at [09] evaluates to "true".
>
> Note that [07] could probably be rewritten as follows, because it's likely
> wasteful to traverse those trees to recognize them:
>
> 	[07] 	#( WHILE e:. s:. )
>
> So, yes, stmt is another tree. The tree grammar walks (and captures) the expr
> and stmt subtrees once, while matching the WHILE node; after that, your action
> must walk (and evaluate) each captured tree again, as many times as needed.
>
> Hope this is a bit more clear in what I was saying,
>
> - Bryan
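The same idea works outside ANTLR. Once the WHILE node's subtrees are captured (the e and s labels), the action just calls the evaluator on them again on every iteration. Below is a hypothetical mini-interpreter in C++ with its own node kinds and a map of variables. It is not generated tree-parser code; eval plays the role of the eval rule.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds for a tiny language.
enum Kind { NUM, VAR, LESS, ASSIGN_ADD, SEQ, WHILE };

struct Node {
    Kind kind = NUM;
    int value = 0;      // NUM: literal value
    std::string name;   // VAR / ASSIGN_ADD: variable name
    std::vector<std::shared_ptr<Node>> kids;
};
using NodeP = std::shared_ptr<Node>;
using Env = std::map<std::string, int>;

// Builder for test trees.
NodeP mk(Kind k, std::vector<NodeP> kids = {}, int v = 0, std::string name = "") {
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->kids = std::move(kids);
    n->value = v;
    n->name = std::move(name);
    return n;
}

// Evaluates a subtree; re-walking the same subtree is just another call.
int eval(const NodeP& n, Env& env) {
    switch (n->kind) {
    case NUM: return n->value;
    case VAR: return env[n->name];
    case LESS: return eval(n->kids[0], env) < eval(n->kids[1], env);
    case ASSIGN_ADD: return env[n->name] += eval(n->kids[0], env);  // name += expr
    case SEQ:
        for (const NodeP& k : n->kids) eval(k, env);
        return 0;
    case WHILE: {
        const NodeP& e = n->kids[0];  // like the e:expr label
        const NodeP& s = n->kids[1];  // like the s:stmt label
        while (eval(e, env)) eval(s, env);  // walk both subtrees repeatedly
        return 0;
    }
    }
    return 0;
}
```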
>
> On 10/26/05, gil_loureiro at iol.pt <gil_loureiro at iol.pt> wrote:
>> But the problem is stmt is another tree ... how can I walk this tree
>> (with eval(#s)) to run the contained set of statements multiple times?
>
>
> ------------------------------
>
>
>
> End of antlr-interest Digest, Vol 11, Issue 48
> **********************************************
>

-- 
Nishit Desai
M.Tech II year 
Computer Science & Engg.
IIT Bombay