[antlr-interest] philosophy about translation

Andy Tripp antlr at jazillian.com
Thu Nov 2 07:35:36 PST 2006


Micheal J wrote:

>I should have been clearer: Popularity isn't a measure of greatness.
>  
>
No, but popularity is a common result of greatness. If Terence wants to
build "the Smalltalk80 of compiler-compilers", that's fine. But I think
it's reasonable for someone to request that he build "the Java of
compiler-compilers".

>
>  
>
>>Then you should get out more. Talk to 10 co-workers about Java vs. C++,
>>or go to a conference.
>>I'd say that less than 5% of those who've actually used both Java and
>>C++ prefer C++.
>>That's from my experience of talking to perhaps a few hundred
>>developers about it.
>>    
>>
>
>A few hundred (even a few thousand) developers doesn't equate to the "vast
>majority".
>  
>
You don't need to sample the whole population to know what they think.

Let's just agree to disagree. I think Java is generally better than C++.
The Java designers chose to leave out all the C++ nastiness, and I find
myself several times more productive in Java. My personal experience has
been that most other developers prefer Java over C++, and I do
extrapolate that to all programmers.

>>>      
>>>
>>LL(*) is brand new to V3, so that has nothing to do with it. 
>>    
>>
>
>I disagree. V3 is the reason many ANTLR'ers aren't using some other tool.
>  
>
I know you'll disagree, but I'd venture to guess that over 95% of ANTLR
users started using it before V3.

>Once I learned the syntax/semantics and prevailing idioms, javacc was easy
>enough.
>  
>
I'm not happy with "easy enough" - I'd prefer "as easy as possible". 
Just my personality I guess.

>
>It isn't a circular argument. It is perfectly possible to "understand the
>value" of a feature and yet not want it. I "understand the value" of MI for
>instance and I'm not calling for [standard] Java to include it because it
>makes dynamic class loading far more difficult to implement. I'd rather
>not be introduced to another slew of Java bugs.
>  
>
OK. With that definition, I'd then say that I do believe that the vast
majority of those who "understand the value" of C++ features do prefer
Java.

>
>  
>
>>>How would you change in ANTLR to make it easier?
>>> 
>>>
>>>      
>>>
>>Short answer: hide all the details from me. Make it so that I have no
>>idea that there is code being generated to do lexing and parsing. Let
>>me just give it a C grammar and a Java grammar, and then dive in and
>>start writing translation logic without any generated code or even ASTs
>>in sight. How to do that is left as an exercise for the reader.
>>    
>>
>
>Interesting idea. Don't know if it is possible but, interesting nonetheless.
>;-)
>  
>
The more I think about it, the easier it seems to me. I must be missing
something.

>  
>
>>>Quite often just getting "something that works" is all that is
>>>required. Getting the best output from a compiler requires knowing more
>>>about what goes on under the hood.
>>> 
>>>
>>>      
>>>
>>Yeah, I know. You can do a better job at garbage collection than Java's
>>GC. You can write better bytecode than javac because you've studied
>>javac and bytecode.
>>
>>The Java JIT guys say the first rule of performance optimization is to
>>STOP doing whatever it is you're doing that you think is producing
>>better bytecode. And what did Terence find out about performance when
>>he tried generating his own bytecode?
>>    
>>
>
>Regardless of what Ter experienced while generating DFAs as bytecode, what
>the Java JIT guys may have said or indeed whether I can beat javac's GC
>strategy, what I actually said above remains a fact.
>  
>
That you can write better code by knowing the details of bytecode
generation? I doubt it. My earlier response was my way of saying
"I seriously doubt it". I guess that's a separate discussion.

>
>>>      
>>>
>>If that's your definition of "power", I don't see how it relates to 
>>anything.
>>    
>>
>
>I defined "power" in terms of performance, flexibility and expressivity
>(it's still visible above).
>
>Performance: Given equivalent programs written in Java and asm/C/C++, the
>Java version would be slower (or it is always possible to optimize the
>asm/C/C++ version so it outperforms the Java version).
>  
>
Wrong. The 1990s called and they want their "Java is interpreted"
thinking back ;)

>Flexibility: Any program that can be written in Java can be written in
>asm/C/C++ (although one might not want to). The reverse is not true.
>  
>
That's technically true, but I'd also add that I think that 99.9% of
real-world applications can be written in Java, but perhaps 20% of the
developers out there don't think that their app can be written in Java.
I'm just making up numbers, but you get the idea.

>Expressivity: Java is less expressive than C++ (even without macros). With
>[really!] clever use of macros, the same can be said of C and perhaps asm
>too.
>  
>
Then I'd say "expressivity" is a negative attribute: C++ with macros or
COBOL with lots of preprocessors are "most expressive" and also "least
maintainable".

>  
>
>>>- For some problems, Java/C# is more productive than assembler, C or 
>>>C++.
>>> 
>>>      
>>>
>>I'd say "for almost all problems" but OK.
>>    
>>
>
>Depends on what sort of programming problems you have to solve. A
>Windows/Linux device driver developer wouldn't use Java for instance.
>  
>
Right.

>  
>
>>>I disagree. He is working with code generated by ANTLR. He isn't using
>>>ANTLR.
>>> 
>>>      
>>>
>>Ah, come on. When someone is using a lexer built using ANTLR, you won't
>>consider that to be "using ANTLR?" As in "He's using ANTLR without ever
>>seeing the input grammar". That's like saying I'm not "using javac",
>>I'm just using the bytecode that it generates.
>>    
>>
>
>Which is precisely what many ANTLR users do when they download the binary
>distribution. They aren't using javac (some probably don't even know what
>javac is). They are just "using bytecode generated by javac".
>  
>
I consider myself to be "using javac" when doing Java development.
I think maybe you're trying to "talk past me" on purpose, but it
worked... I don't remember or care what the point was here ;)

>
>  
>
>>>ANTLR *is* a compiler.
>>> 
>>>
>>>      
>>>
>>Right, and as such, I believe it can do what "traditional" compilers do:
>>hide all the underlying stuff from the users.
>>    
>>
>
>It does. That's why your guy can use the code it generated without knowing
>or caring about ANTLR.
>  
>
The lexer part of ANTLR does hide the details well. It's the parser part
that doesn't do so well, forcing me to really remember the original
grammar and the shape of the ASTs that I'm creating.
And treewalkers have the same problem.

>  
>
>>>>Compiler designers take it as a given that users need only know the
>>>>syntax/semantics of the input language. If Ter took it as a given
>>>>that ANTLR4 users need only know the syntax/semantics of the input
>>>>language, he'd end up with a very different tool.
>>>>   
>>>>        
>>>>
>>>When using ANTLR, that is all one needs to know.
>>>
>>>      
>>>
>>No. To use ANTLR, you not only need to know the input language (say, C)
>>syntax & semantics, you also need to know:
>>* The ANTLR syntax & semantics
>>* How to hook in actions: where do they make sense? What language are
>>they in?
>>    
>>
>
>ANTLR's input language is a customized variant of EBNF that can include
>embedded "action" code written in one of a few general programming
>languages. It is used to describe the syntactic structure of other languages
>e.g. your ANTLR grammar for the C language.
>  
>
Yes, I'm familiar with ANTLR.

>Learning where actions can be "hooked in" is part of learning about the
>syntax/semantics of ANTLR's input language.
>  
>
Not really. It requires that you know something about the code being
generated.

>  
>
>>* You often need to know details about the code that's generated to 
>>resolve ambiguities
>>    
>>
>
>A test suite mitigates this. I agree that approximate lookahead
>generates spurious warnings.
>  
>
A test suite is just an organized way to produce the ambiguities; it
doesn't help you avoid them in the first place, or even help you
eliminate them.

>  
>
>>* You need to know how the grammar maps to an AST structure. It's not 
>>enough to have a mental
>>   picture of the input grammar, you need to be able to form a mental 
>>picture of the AST each time
>>   you see a chunk of code.
>>    
>>
>
>ASTs are optional. You don't use them for instance. In any case, the user
>designs an AST not ANTLR. ANTLR simply provides a language for specifying
>AST construction.
>  
>
Yeah, well, ASTs being optional doesn't change anything. Of course you're
going to generate ASTs if you're using ANTLR for language translation;
the alternative is even worse.

>  
>
>>>A compiler designer can't determine the best code to generate for every
>>>possible situation in advance.
>>>
>>>      
>>>
>>He doesn't need to always generate the best code. It's good enough that
>>he just generally do better than humans do.
>>    
>>
>
>For some users/projects, that is enough. Not for everyone or every project.
>  
>
There may be some projects out there that can't use a standard compiler,
yes. But probably 99.999% can. So I do think the days of people feeling
that a standard compiler is not good enough for them are over.

>  
>
>>>This feature makes the tool more useful - for
>>>those who care to acquire the knowledge required to use it effectively.
>>>It empowers knowledgeable users to tailor the output for any given
>>>situation.
>>>
>>>      
>>>
>>And yet, there is no equivalent in Java - no bytecode tweaking. And no
>>one seems to mind.
>>    
>>
>
>Actually, there is. Not just with javac. Javassist, BCEL, etc. do just that.
>  
>
I say "there are no pink elephants", and you find one. OK, I stand
corrected. There are apps for which the standard "javac" is not good
enough, and there are apps that tweak bytecode with BCEL.

I think we're losing sight of the point: Javac would not be better by
becoming "more powerful" and allowing you to tweak bytecodes. ANTLR is
not better than it could be because it's "more powerful" and lets you
tweak the underlying code it generates.

>  
>
>>And there is an equivalent in C/C++ - embedded asm code. That was
>>popular 20 years ago, but today's programmers realize that the compiler
>>is probably better at producing good code, and they don't need every
>>last 1% of performance anyway.
>>    
>>
>
>Not all the time. When they do, it is reassuring to know that gcc/vc++ still
>support it...  ;-)
>  
>
It's also reassuring to know that those people who claim to need a 2%
performance boost and so use gcc instead of Java are simply wrong. For
one thing, gcc is not one of the better compilers out there for
performance.

I recently converted a JPEG image processing app that runs on a mobile
phone from C to Java. I'm sure you can guess what I'm going to say about
performance ;)

>
>Micheal
>
>
>-----------------------
>The best way to contact me is via the list/forum. My time is very limited.
>
>
>  
>


