[antlr-interest] "An Introduction to ANTLR" presentation slides

Mon Mar 3 08:48:28 PST 2008

>
>> The "meaning" or "semantics" for a lexer is the sequence of output 
>> tokens.
>> The "meaning" or "semantics" for a parser is the output AST.
>> The "meaning" or "semantics" for a treewalker is whatever it outputs 
>> (some modified AST or whatever).
>
> No.  Those are the output syntax forms of each (what I referred to as 
> "sentences" above) -- they do *not* represent semantics or meaning.
>
> If you take an ANTLR grammar and remove all action code from it, then 
> it will still take in input syntax and generate output syntax, but no 
> inherent meaning is associated with it.  Thus left to itself ANTLR is 
> a pure syntax recogniser/generator.  In addition to this it also 
> permits semantic validation and constructs to be included, but this is 
> convenience and is not essential to operation (except possibly for 
> some syntactically ambiguous languages).

Hmmm. I disagree, but I'm not sure what to say.
A lexer takes letters 'c', 'a', and 't' as input and outputs the word "cat".
If the word "cat" isn't the "meaning" of those letters, then I'm 
completely lost.
If you're saying that the lexer's ability to accept those letters in 
that sequence is "meaning", well, I disagree.
>
>> We NEVER see an AST being referred to as a "syntax diagram" (or 
>> "syntax" anything) - we call it an AST.
>
> Yes, and what does AST actually stand for?  Abstract Syntax Tree.  Oh 
> look, it *is* referred to as "syntax".
Yeah, good point. However, it's referred to as "Abstract Syntax",
which has quite a different meaning than just "syntax":
http://en.wikipedia.org/wiki/Abstract_syntax
>
>
> Perhaps another more concrete example is in order here.  The input is:
>
>   int x = doCalculation(5);
>
> This is a character stream which the lexer might convert into the 
> token stream:
>
>   KEYWORD[int] IDENTIFIER[x] ASSIGN[=] IDENTIFIER[doCalculation]
>   OPAREN[(] NUMBER[5] CPAREN[)] SEMI[;]
>
> The parser takes that token stream and converts it into the following AST:
>
>   ( DECLARATION KEYWORD[int] IDENTIFIER[x] )
>   ( ASSIGN[=]
>     IDENTIFIER[x]
>     ( FUNCTIONCALL IDENTIFIER[doCalculation] ( NUMBER[5] ) )
>   )
>
> Everything we have done up to this point is still all just syntax.  
> This is a perfectly valid AST and thus the input is valid syntactically.
It doesn't seem odd to you that you're referring to the shape of the AST 
as "valid syntactically"?
Would you say that "x int;" is "syntactically invalid"?
I would say it's "syntactically valid", but "semantically invalid". When 
I say that, it's implied that I'm talking about a parser, not a lexer 
(for which it's syntactically and semantically valid) or a treewalker 
(for which it's a moot point, because the parser will not accept it).
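To make the lexer side of this concrete, here is a minimal Python sketch of a tokenizer. It is hand-written, not ANTLR output. The token names come from the quoted example; the regular expressions and the `tokenize` helper are my own assumptions. It reproduces the token stream quoted above, and it also accepts "x int;" without complaint, since a lexer only looks at one token at a time.

```python
import re

# Token names follow the quoted example; the patterns are assumptions.
TOKEN_SPEC = [
    ("KEYWORD",    r"\bint\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("ASSIGN",     r"="),
    ("OPAREN",     r"\("),
    ("CPAREN",     r"\)"),
    ("SEMI",       r";"),
    ("WS",         r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Turn a character stream into a list of (token_type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":  # whitespace is recognised but not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for source in ("int x = doCalculation(5);", "x int;"):
    print(" ".join(f"{kind}[{lexeme}]" for kind, lexeme in tokenize(source)))
```

Whether "x int;" is then rejected is entirely up to whatever consumes the token list.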
>
> But what happens when we start to verify the semantics?  What happens 
> if it turns out that "doCalculation" isn't actually a function, or 
> doesn't take a single numeric parameter, or doesn't return a type that 
> can be compatibly assigned to an integer variable?  What happens if 
> "x" had already been declared as a string variable?  Those are all 
> semantic tests and they are independent of the AST itself.  So what 
> was perfectly valid syntax may be semantically incorrect.
Yup.
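For what it's worth, the checks listed there can be sketched as a separate pass over the tree. This is only an illustration in Python, not how ANTLR does it. The tuple-based AST layout, the `functions` table, and the `check` and `type_of` helpers are all hypothetical names of my own. The same tree passes or fails depending only on what the tables say about "doCalculation" and "x".

```python
# Hypothetical AST shapes mirroring the quoted example.
DECL = ("DECLARATION", "int", "x")
ASSIGN = ("ASSIGN", "x", ("FUNCTIONCALL", "doCalculation", [("NUMBER", "5")]))

def type_of(node, functions, errors):
    """Return the type of an expression node, or None if it cannot be typed."""
    kind = node[0]
    if kind == "NUMBER":
        return "int"
    if kind == "FUNCTIONCALL":
        _, name, args = node
        if name not in functions:
            errors.append(f"'{name}' is not a function")
            return None
        param_types, return_type = functions[name]
        arg_types = [type_of(arg, functions, errors) for arg in args]
        if arg_types != list(param_types):
            errors.append(f"'{name}' expects {list(param_types)}, got {arg_types}")
        return return_type
    errors.append(f"unknown node kind {kind}")
    return None

def check(statements, functions):
    """Collect semantic errors; the tree itself is assumed to be well-formed."""
    variables = {}  # variable name -> declared type
    errors = []
    for node in statements:
        if node[0] == "DECLARATION":
            _, type_name, name = node
            if name in variables:
                errors.append(f"'{name}' already declared as {variables[name]}")
            else:
                variables[name] = type_name
        elif node[0] == "ASSIGN":
            _, name, value = node
            if name not in variables:
                errors.append(f"'{name}' is not declared")
                continue
            value_type = type_of(value, functions, errors)
            if value_type is not None and value_type != variables[name]:
                errors.append(
                    f"cannot assign {value_type} to {variables[name]} variable '{name}'"
                )
    return errors

# Same tree, different verdicts depending on the function table.
print(check([DECL, ASSIGN], {"doCalculation": (["int"], "int")}))
print(check([DECL, ASSIGN], {}))
```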
>
> ANTLR lets you choose whether you want to do the semantic checks right 
> at the end (in your own driver code, or in a tree walker), or whether 
> you want to do them inline at the lexing or parsing stages (either to 
> fail quickly or to resolve syntactical ambiguities).  
> But being able to insert them inline doesn't mean that they're 
> directly linked the way you seem to have been saying.  They remain 
> separate and distinct things.
I'm not sure what you're saying here. What two things are distinct?
>
> I wonder if this is a similar confusion to that caused by having a 
> combined grammar with literals in the parser rules -- it's permitted 
> for convenience but it doesn't change the fact that they're treated 
> separately (by modifying the lexer).
Again, I'm not sure what you mean. Who is confused about what?