[antlr-interest] Re: antlr-interest Digest, Vol 11, Issue 48
Desai Nishitkumar Ashokkumar
nadesai at cse.iitb.ac.in
Thu Oct 27 03:47:13 PDT 2005
On Thu, 27 Oct 2005 antlr-interest-request at antlr.org wrote:
> Today's Topics:
>
> 1. RE: ASTPair handling in C# runtime for 2.7.6 (Luis Leal)
> 2. Re: Checking for expression end in Javascript parser (Tech)
> 3. Re: Re: Checking for expression end in Javascript parser (Tech)
> 4. Re: ASTPair handling in C# runtime for 2.7.6 (Martin Probst)
> 5. RE: ASTPair handling in C# runtime for 2.7.6 (Micheal J)
> 6. RE: ASTPair handling in C# runtime for 2.7.6 (Micheal J)
> 7. Changes for stream offset determination (Jim Crafton)
> 8. Re: Re: thank you sir (Sebastian Kaliszewski)
> 9. Re: Re: thank you sir (Paul Johnson)
> 10. Re: Help to make an iteration (somehing go wrong with
> previous) (Bryan Ewbank)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 26 Oct 2005 22:18:06 +0200
> From: "Luis Leal" <luisl at scarab.co.za>
> Subject: RE: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: "Micheal J" <open.zone at virgin.net>, "''antlr-interest' Interest'"
> <antlr-interest at antlr.org>
> Message-ID: <MPEMLILMNEOKCPPBGEMKAEJFCIAA.luisl at scarab.co.za>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I vote for option 3 as it will have the least impact on client code and will
> probably be reasonably efficient given the generational garbage collector. I
> think it's probably better to spend effort on the run-time design for Antlr
> v3 - which I'm very keen to help with BTW. Has work started on it? Please
> let me know how I can help.
>
> There is also a bug in the _saveIndex optimization which causes compile
> errors in generated lexers. I'd be happy to contribute a patch if this
> hasn't already been fixed.
>
> Regards
>
> Luis
>
> -----Original Message-----
> From: Micheal J [mailto:open.zone at virgin.net]
> Sent: 26 October 2005 08:28 PM
> To: ''antlr-interest' Interest'
> Subject: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
>
>
> Hi,
>
> For background info, pls see this thread:
>
> http://www.antlr.org/pipermail/antlr-interest/2005-April/011838.html
>
>
> The options currently being considered are (in order of attractiveness):
>
> 1. Rewrite ASTPair as a struct.
> Pros: no heap allocations, no GC impact/churn
> Cons: breaking changes to ASTFactory methods (and client code):
> - makeASTRoot() and
> - addASTChild()
>
> 2. Provide a per-instance object pool
> Pros: fixes multi-threading issue, reduces ASTPair allocations
> Cons: future GCs may be more efficient, GC impact
>
> 3. Return to the pre-2.7.5 scheme and allow the GC to clean up.
> Pros: fixes multi-threading issue, leverages GC improvements
> Cons: more ASTPair allocations, GC impact
>
> Any opinions?
>
> Cheers,
>
> Micheal
> ANTLR/C#
>
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 26 Oct 2005 21:29:43 +0100
> From: Tech <tech at swingkid.fsnet.co.uk>
> Subject: Re: [antlr-interest] Checking for expression end in
> Javascript parser
> To: shmuel siegel <antlr at shmuelhome.mine.nu>
> Cc: antlr-interest at antlr.org
> Message-ID: <435FE737.8090007 at swingkid.fsnet.co.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Absolutely, poor wording on my part. I guess I should have said
> 'Javascript statements end at either a semi colon, or the earliest new
> line that makes a valid statement.'
>
> Given this, should
>
> a=3
>
> count as a complete statement on its own line? My parser won't treat it
> as such because the expressions are nested, so I hope not!
>
> Mark
>
> shmuel siegel wrote:
>
>> Tech wrote:
>>
>>> One aspect that is different is that Javascript expressions end
>>> either at a semi colon, or at the earliest new line that makes a
>>> valid expression.
>>
>>
>> As far as I know, this is not a valid definition for javascript
>> statements. It is true for control statements like "return" or "break"
>> but not for arithmetic statements.
>>
>> Consider,
>> <script>
>> a=3
>> +4
>> alert(a);
>> </script>
>>
>> It is legal and will result in an alert with the value 7. In general,
>> a new line only marks the end of a statement if the next token cannot
>> be part of the previous pattern.
>>
>>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 26 Oct 2005 21:38:00 +0100
> From: Tech <tech at swingkid.fsnet.co.uk>
> Subject: Re: [antlr-interest] Re: Checking for expression end in
> Javascript parser
> To: Terence Parr <parrt at cs.usfca.edu>
> Cc: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID: <435FE928.20406 at swingkid.fsnet.co.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Terence,
>
> I wanted to say 'if the next character is on the same line it has to be
> a semi colon, otherwise the semi colon is optional.' I'm not sure if I
> need the 'semi colon is optional' bit, because we could always treat it
> as a separate (empty) statement, but am I right in thinking I have to
> put something as an alternative for the semantic predicate to make sense?
>
> Mark
>
> Terence Parr wrote:
>
>>
>> On Oct 26, 2005, at 1:33 AM, Tech wrote:
>>
>>> I have overridden consume() in my parser to store the line number of
>>> the last token consumed:
>>>
>>> int currentLine = 0;
>>>
>>> public override void consume()
>>> {
>>>     // LA(1) returns the token type; the line comes from LT(1)
>>>     currentLine = LT(1).getLine();
>>>     base.consume();
>>> }
>>>
>>> I have then defined semi:
>>>
>>> semi
>>> : {currentLine == LT(1).getLine()}? SEMI
>>> | (SEMI)?
>>> ;
>>>
>>> This seems to work, but gives me lots of disambiguation warnings.
>>> What do you think?
>>
>>
>> Hi Mark,
>>
>> remove the (...)? around the SEMI as semi is optional...probably not
>> what you want.
>> Ter
>>
>>
>>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 26 Oct 2005 23:54:48 +0200
> From: Martin Probst <mail at martin-probst.com>
> Subject: Re: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: antlr-interest at antlr.org
> Message-ID: <1130363688.9728.3.camel at localhost.localdomain>
> Content-Type: text/plain
>
> Hi,
>
>> For background info, pls see this thread:
>>
>> http://www.antlr.org/pipermail/antlr-interest/2005-April/011838.html
>
> Has anyone actually tested how big the performance impact is? This is of
> course application dependent, but in my experience doing such
> optimisations without a good benchmark is not a good idea, especially in
> managed code (in my case: Java code, but probably applies to .NET, too).
>
> Martin
>
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 27 Oct 2005 00:05:30 +0100
> From: "Micheal J" <open.zone at virgin.net>
> Subject: RE: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: "''antlr-interest' Interest'" <antlr-interest at antlr.org>
> Message-ID: <000401c5da81$c3f225b0$6902a8c0 at hercules>
> Content-Type: text/plain; charset="us-ascii"
>
>> Hi,
>>
>> I vote for option 3 as it will have the least impact on
>> client code and will probably be reasonably efficient given
>> the generational garbage collector.
>
> Noted. I describe the reasons for moving away from option 3 in 2.7.5 in
> another message in this thread.
>
>> I think it's probably
>> better to spend effort on the run-time design for Antlr v3 -
>> which I'm very keen to help with BTW. Has work started on it?
>> Please let me know how I can help.
>
> ;-)
>
> Will do.
>
>> There is also a bug in the _saveIndex optimization which
>> causes compile errors in generated lexers. I'd be happy to
>> contribute a patch if this hasn't already been fixed.
>
> Details of the bug?
>
>
> Cheers,
>
> Micheal
> ANTLR/C#
>
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 27 Oct 2005 00:05:30 +0100
> From: "Micheal J" <open.zone at virgin.net>
> Subject: RE: [antlr-interest] ASTPair handling in C# runtime for 2.7.6
> To: <antlr-interest at antlr.org>
> Message-ID: <000501c5da81$c4b2b690$6902a8c0 at hercules>
> Content-Type: text/plain; charset="us-ascii"
>
>> Has anyone actually tested how big the performance impact is?
>> This is of course application dependent, but in my experience
>> doing such optimisations without a good benchmark is not a
>> good idea, especially in managed code (in my case: Java code,
>> but probably applies to .NET, too).
>
> Yes. I did with a couple of in-production grammars with input files
> typically over 3MB in size. Hundreds of thousands of the damned things were
> alloc'ed even though the longest call chain was only a few thousand deep.
> With larger files, that alloc'ed resource and the resulting GC churn was an
> issue. That is what led to the original change to use an object pool in
> 2.7.5.
>
> Jim Crozman did as well. He reports about this in the thread.
>
> Option 1 seems to have it all IMO.
>
> Cheers,
>
> Micheal
>
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 26 Oct 2005 19:53:12 -0400
> From: Jim Crafton <jim.crafton at gmail.com>
> Subject: [antlr-interest] Changes for stream offset determination
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID:
> <e88138500510261653w6b016f6oc687d5bf3f89f37c at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Here are the changes I made to allow a custom AST node to determine
> the current offset of the lexer.
>
> In CharScanner.hpp, class antlr::CharScanner
>
> protected:
> unsigned int offset_;
>
> public:
> unsigned int offset() const {
> return offset_;
> }
>
>
> virtual void consume()
> {
> if (inputState->guessing == 0)
> {
> int c = LA(1);
> if (caseSensitive)
> {
> append(c);
> }
> else
> {
> // use input.LA(), not LA(), to get original case
> // CharScanner.LA() would toLower it.
> append(inputState->getInput().LA(1));
> }
> //*************************************************************
> offset_ ++;
> //*************************************************************
>
> // RK: in a sense I don't like this automatic handling.
> if (c == '\t')
> tab();
> else
> inputState->column++;
> }
> inputState->getInput().consume();
> }
>
> I increment the offset_ member in the consume() method.
>
>
> In CharScanner.cpp
>
> CharScanner::CharScanner(InputBuffer& cb, bool case_sensitive )
> : saveConsumedInput(true) //, caseSensitiveLiterals(true)
> , offset_(0) // added: initialize offset_ to 0
> , caseSensitive(case_sensitive)
> , literals(CharScannerLiteralsLess(this))
> , inputState(new LexerInputState(cb))
> , commitToPath(false)
> , tabsize(8)
> , traceDepth(0)
> {
> setTokenObjectFactory(&CommonToken::factory);
> }
>
> CharScanner::CharScanner(InputBuffer* cb, bool case_sensitive )
> : saveConsumedInput(true) //, caseSensitiveLiterals(true)
> , offset_(0) // added: initialize offset_ to 0
> , caseSensitive(case_sensitive)
> , literals(CharScannerLiteralsLess(this))
> , inputState(new LexerInputState(cb))
> , commitToPath(false)
> , tabsize(8)
> , traceDepth(0)
> {
> setTokenObjectFactory(&CommonToken::factory);
> }
>
> CharScanner::CharScanner( const LexerSharedInputState& state, bool
> case_sensitive )
> : saveConsumedInput(true) //, caseSensitiveLiterals(true)
> , offset_(0) // added: initialize offset_ to 0
> , caseSensitive(case_sensitive)
> , literals(CharScannerLiteralsLess(this))
> , inputState(state)
> , commitToPath(false)
> , tabsize(8)
> , traceDepth(0)
> {
> setTokenObjectFactory(&CommonToken::factory);
> }
>
>
> In Token.hpp, class antlr::Token
>
> public:
> virtual void setOffset( unsigned int offset ){
>
> }
>
> virtual unsigned int getOffset() const{
> return 0;
> }
>
>
> In CommonToken.hpp class antlr::CommonToken
>
> protected:
> unsigned int offset_;
>
> public:
> virtual void setOffset( unsigned int offset ) {
> offset_ = offset;
> }
>
> virtual unsigned int getOffset() const {
> return offset_;
> }
>
>
>
> In CommonToken.cpp
>
> CommonToken::CommonToken() : Token(), line(1), col(1), offset_(0), text("")
> {}
>
> CommonToken::CommonToken(int t, const ANTLR_USE_NAMESPACE(std)string& txt)
> : Token(t)
> , line(1)
> , col(1)
> , offset_(0)
> , text(txt)
> {}
>
> CommonToken::CommonToken(const ANTLR_USE_NAMESPACE(std)string& s)
> : Token()
> , line(1)
> , col(1)
> , offset_(0)
> , text(s)
> {}
>
>
> Note that the offset_ member is initialized to 0.
>
> Then in my custom AST class I do something like this:
>
>
> class CppASTNode : public CommonAST {
> public:
>
> CppASTNode(): line_(0), column_(0), offset_(0) {}
>
>
> CppASTNode( antlr::RefToken t ): line_(0), column_(0), offset_(0) {
> CommonAST::setType(t->getType() );
> CommonAST::setText(t->getText() );
>
> line_ = t->getLine();
> column_ = t->getColumn();
> offset_ = t->getOffset() - t->getText().size();
> }
>
> void initialize(int t, const std::string& txt) {
> CommonAST::setType(t);
> CommonAST::setText(txt);
>
> line_ = 0; // to be noticed !
> column_ = 0;
> offset_ = 0; // reset the offset too, matching the constructors
> }
>
> void initialize( RefCppASTNode t ) {
> CommonAST::setType(t->getType() );
> CommonAST::setText(t->getText() );
>
> line_ = t->line_;
> column_ = t->column_;
>
> offset_ = t->offset_;
> }
>
> void initialize( RefAST t ) {
> CommonAST::initialize(t);
> }
>
> void initialize( antlr::RefToken t ) {
> CommonAST::initialize(t);
>
> line_ = t->getLine();
> column_ = t->getColumn();
> offset_ = t->getOffset() - t->getText().size();
> }
>
> void setText(const std::string& txt) {
> CommonAST::setText(txt);
> }
>
> void setType(int type) {
> CommonAST::setType(type);
> }
>
> static antlr::RefAST factory( void ) {
> antlr::RefAST ret =
> static_cast<antlr::RefAST>(RefCppASTNode(new CppASTNode()));
>
> return ret;
> }
>
> int getLineNumber() const {
> return line_;
> }
>
> int getColumnNumber() const {
> return column_;
> }
>
> unsigned int getOffset() const {
> return offset_;
> }
> protected:
> int line_;
> int column_;
> unsigned int offset_;
> };
>
>
> Cheers, and hope this proves useful to others.
>
> Jim
>
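The start-offset arithmetic used in CppASTNode above (t->getOffset() - t->getText().size()) can be sketched on its own. This is only a sketch, assuming the token is created right after its last character has been consumed, so the running offset points just past the token. TokenSpan and scanWords are hypothetical names for illustration and are not part of the ANTLR runtime.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: a token's text and the offset of its first character.
struct TokenSpan {
    std::string text;
    unsigned int start;
};

// Splits input on whitespace while keeping a running count of consumed
// characters, in the spirit of the offset_ counter bumped in consume().
std::vector<TokenSpan> scanWords(const std::string& input) {
    std::vector<TokenSpan> out;
    unsigned int offset = 0;   // characters consumed so far
    std::size_t i = 0;
    while (i < input.size()) {
        // Skip whitespace; every skipped character still counts as consumed.
        while (i < input.size() &&
               std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;
            ++offset;
        }
        // Collect one token's worth of non-whitespace characters.
        std::string text;
        while (i < input.size() &&
               !std::isspace(static_cast<unsigned char>(input[i]))) {
            text += input[i];
            ++i;
            ++offset;
        }
        if (!text.empty()) {
            // offset is now just past the token, so subtract its length.
            unsigned int len = static_cast<unsigned int>(text.size());
            out.push_back({text, offset - len});
        }
    }
    return out;
}
```

The subtraction is only valid if nothing else is consumed between the end of the token and the moment the offset is read; a lexer that consumes lookahead or trailing characters before creating the token would need to record the start offset separately.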
>
> ------------------------------
>
> Message: 8
> Date: Thu, 27 Oct 2005 09:32:57 +0200
> From: Sebastian Kaliszewski <Sebastian.Kaliszewski at softax.com.pl>
> Subject: Re: [antlr-interest] Re: thank you sir
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID: <436082A9.4050208 at softax.com.pl>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Terence Parr wrote:
>> C++ is C so I'm not sure you have a problem with anything except the io
>> libraries and such; still the Clib should still work with C++ right?
>
> Well, this is not true. There are many C constructs which are illegal in
> C++, and some constructs have a different meaning (e.g. my_type my_fun() vs
> my_type my_fun(void)).
>
> rgds
> Sebastian Kaliszewski
>
>
> ------------------------------
>
> Message: 9
> Date: Thu, 27 Oct 2005 10:20:10 +0100
> From: Paul Johnson <gt54-antlr at cyconix.com>
> Subject: Re: [antlr-interest] Re: thank you sir
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID: <43609BCA.1090407 at cyconix.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sebastian Kaliszewski wrote:
>> Terence Parr wrote:
>>
>>> C++ is C so I'm not sure you have a problem with anything except the
>>> io libraries and such; still the Clib should still work with C++ right?
>>
>>
>> Well, this is not true. There are many C constructs which are illegal in
>> C++, and some constructs have a different meaning (e.g. my_type my_fun()
>> vs my_type my_fun(void)).
>
> As Stroustrup says in Appx B of his book, "with minor exceptions, C++ is
> a superset of [C89] C... Well-written C programs tend to be C++ programs
> as well". Sure, you can write bad C that won't compile as C++, but you
> can't fix bad C programs automatically, which appears to be what the OP
> wanted.
>
> Anyway, it's not at all obvious what the OP wanted to do. g++ is just a
> driver for GCC: it assumes that the default language is C++, and that
> you need to link against the C++ libraries. It has no problem compiling
> C programs, even bad ones, and certainly ones which aren't valid C++.
g++ does have problems compiling many C programs. As I said previously,
C performs some type conversions implicitly where C++ requires an explicit
cast. There are many other kinds of statements for which g++ reports an
error but gcc compiles without any error. By the way, what does "OP" mean?
Thanks for your reply.
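
A minimal sketch of one such difference, assuming the cast in question is the conversion from void* to a typed pointer (the usual malloc case). make_buffer is a hypothetical helper used only for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Accepted by a C compiler, rejected by a C++ compiler:
//
//     char *buf = malloc(16);   /* C converts void* implicitly */
//
// C++ has no implicit conversion from void* to char*, so the cast
// has to be written out explicitly.
char* make_buffer(std::size_t n) {
    char* buf = static_cast<char*>(std::malloc(n));
    if (buf != nullptr) {
        std::memset(buf, 0, n);   // zero-fill so the contents are defined
    }
    return buf;                   // caller releases it with std::free
}
```

Adding the cast keeps the code valid in both languages if a C-style cast, (char *)malloc(n), is used instead of static_cast.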
>
> Paul
>
>
>
> ------------------------------
>
> Message: 10
> Date: Thu, 27 Oct 2005 05:26:58 -0400
> From: Bryan Ewbank <ewbank at gmail.com>
> Subject: Re: [antlr-interest] Help to make an iteration (somehing go
> wrong with previous)
> To: ANTLR Interest <antlr-interest at antlr.org>
> Message-ID:
> <dd3a065f0510270226x3def51dco3c305831222298de at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Let me try again with more explanation...
>
> [01] eval:
> [02] ... ... ... | while_stmt | ... ... ...
> [03] ;
> [04]
> [05] while_stmt
> [06] :
> [07] #( WHILE e:expr s:stmt )
> [08] {
> [09] while (eval(#e) == true)
> [10] {
> [11] eval(#s);
> [12] }
> [13] }
> [14] ;
>
> Line [07] matches the assumed tree for a while node. When this matches, the
> action (lines [08-13]) is executed. The while-statement at [09] is executed in
> the native language, which means that the "eval(#s)" at [11] will be executed
> each time that the condition in the while at [09] evaluates to "true".
>
> Note that [07] could probably be rewritten as follows, because it's likely
> wasteful to traverse those trees to recognize them:
>
> [07] #( WHILE e:. s:. )
>
> So, yes, stmt is another tree. ANTLR allows you to walk (and capture) the expr
> and stmt trees once; after that, you must walk (and evaluate) each tree
> multiple times.
>
> Hope this is a bit more clear in what I was saying,
>
> - Bryan
>
> On 10/26/05, gil_loureiro at iol.pt <gil_loureiro at iol.pt> wrote:
>> But the problem is stmt is another tree ... how can I walk this tree
>> (with eval(#s)) to run the contained set of statements multiple times?
>
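The point that the condition and body subtrees are captured once and then evaluated many times can be sketched outside ANTLR. This is only an illustration under stated assumptions: Node, make, eval and runCountingLoop are hypothetical names, not ANTLR classes, and the tiny node set exists only to make the loop observable.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;
using Env = std::map<std::string, int>;

// Hypothetical AST node: just enough kinds to express "while (i < N) i += 1".
struct Node {
    enum Kind { NUM, VAR, LESS, ADD_ASSIGN, WHILE } kind;
    int value;                  // used by NUM
    std::string name;           // used by VAR and ADD_ASSIGN
    std::vector<NodePtr> kids;  // child subtrees
};

NodePtr make(Node::Kind k, int value, const std::string& name,
             std::vector<NodePtr> kids = {}) {
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->value = value;
    n->name = name;
    n->kids = std::move(kids);
    return n;
}

// Recursive evaluator. The WHILE case re-walks the same two subtrees on
// every iteration, which is what eval(#e) and eval(#s) do in the action.
int eval(const NodePtr& n, Env& env) {
    switch (n->kind) {
    case Node::NUM:  return n->value;
    case Node::VAR:  return env[n->name];
    case Node::LESS: return eval(n->kids[0], env) < eval(n->kids[1], env);
    case Node::ADD_ASSIGN:
        env[n->name] += eval(n->kids[0], env);
        return env[n->name];
    case Node::WHILE:
        while (eval(n->kids[0], env) != 0) {   // condition subtree
            eval(n->kids[1], env);             // body subtree
        }
        return 0;
    }
    return 0;
}

// Builds "while (i < limit) i += 1" once, runs it, and returns the final i.
int runCountingLoop(int limit) {
    NodePtr cond = make(Node::LESS, 0, "",
                        {make(Node::VAR, 0, "i"), make(Node::NUM, limit, "")});
    NodePtr body = make(Node::ADD_ASSIGN, 0, "i", {make(Node::NUM, 1, "")});
    NodePtr loop = make(Node::WHILE, 0, "", {cond, body});
    Env env;
    eval(loop, env);
    return env["i"];
}
```

The tree is built once in runCountingLoop; only the evaluation repeats, and each pass sees the environment left behind by the previous one.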
>
> ------------------------------
>
--
Nishit Desai
M.Tech II year
Computer Science & Engg.
IIT Bombay