[antlr-interest] "An Introduction to ANTLR" presentation slides

Andy Tripp antlr at jazillian.com
Fri Feb 29 10:25:51 PST 2008


Terence Parr wrote:
>
> On Feb 28, 2008, at 12:33 PM, Andy Tripp wrote:
>
>> Terence Parr wrote:
>>>
>>> syntax is the grammatical structure. Semantics deals with the 
>>> symbols (IDs).
>>>
>>> int x = "foo";
>>>
>>> is syntactically ok but semantically wrong.
>> That example illustrates the problem well.
>> You mean that to the *parser*, the input is invalid.
>
> Andy, you're killing me. ;) Syntax means the structure of the symbols. 
>  That is what the parser does.  The lexer only creates vocab symbols 
> for the parser to apply structure to.
Syntax, as defined by dictionary.com, Wikipedia, and (my understanding
of) common usage, means not the structure of just any symbols, but the
structure of written language symbols (i.e., "words" or "tokens"). Thus,
"syntax checking" is something a parser does, but not something that a
lexer or treewalker does.
>
>> To the *lexer*, it's semantically right.
>
> Semantics are all about the *meaning*, which is derived from 
> grammatical structure.  Forget the lexer, bro!
The "meaning" or "semantics" for a lexer is the sequence of output tokens.
The "meaning" or "semantics" for a parser is the output AST.
The "meaning" or "semantics" for a treewalker is whatever it outputs 
(some modified AST or whatever).

I can't tell from your "forget the lexer" comment whether you agree with
me or not, but I do maintain that the lexer deals with semantics, just as
a parser and treewalker do. For the lexer, the "meaning" is how the input
chars are mapped into an output token stream.
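To make the three levels concrete, here is a minimal Java sketch applied to Terence's int x = "foo"; example. A hand-written lexer maps characters to tokens, a parser accepts the token sequence, and a separate type check (standing in for what a parser action might do) rejects it. All names here (Stages, lex, parse, typeChecks, the "KIND:text" token format, and the tiny int/String declaration language) are hypothetical. This is not ANTLR's API or generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ANTLR's API: three separate checks on one input.
class Stages {
    // Lexer: map input characters to a flat list of "KIND:text" tokens.
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                boolean isType = word.equals("int") || word.equals("String");
                tokens.add((isType ? "TYPE:" : "ID:") + word);
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INT:" + input.substring(start, i));
            } else if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                tokens.add("STRING:" + input.substring(i + 1, end));
                i = end + 1;
            } else if (c == '=' || c == ';') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected char: " + c);
            }
        }
        return tokens;
    }

    // Token kind without its text, e.g. "ID:x" -> "ID", "=" -> "=".
    static String kind(String token) {
        int colon = token.indexOf(':');
        return colon < 0 ? token : token.substring(0, colon);
    }

    // Parser: is the token sequence shaped like  TYPE ID '=' (INT | STRING) ';' ?
    static boolean parse(List<String> t) {
        return t.size() == 5
            && kind(t.get(0)).equals("TYPE")
            && kind(t.get(1)).equals("ID")
            && t.get(2).equals("=")
            && (kind(t.get(3)).equals("INT") || kind(t.get(3)).equals("STRING"))
            && t.get(4).equals(";");
    }

    // Semantic check (what a parser action might do): literal type vs. declared type.
    // Assumes parse(t) already returned true.
    static boolean typeChecks(List<String> t) {
        String declared = t.get(0).substring("TYPE:".length());
        String literal = kind(t.get(3));
        return (declared.equals("int") && literal.equals("INT"))
            || (declared.equals("String") && literal.equals("STRING"));
    }

    public static void main(String[] args) {
        List<String> tokens = lex("int x = \"foo\";");
        System.out.println(tokens);             // [TYPE:int, ID:x, =, STRING:foo, ;]
        System.out.println(parse(tokens));      // true: syntactically OK
        System.out.println(typeChecks(tokens)); // false: semantically wrong
    }
}
```

Keeping the type check out of parse() mirrors the "no actions, it's all syntax" point. The parser alone accepts the declaration, and only the extra check looks at what the symbols mean.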
>
>> So I think the "syntactic"/"semantic" distinction is orthogonal to 
>> the issue
>> of whether input (to a lexer, parser, or treewalker) is valid.
>
> nope.  If you have no actions, it's all syntax. 
I agree, but I don't see how that relates.
> It's a 3-level recognition (i.e., syntax) problem.  All recognizers 
> are applying structure, but syntax shows how to make a valid sentence. 
>  You've heard of *syntax* diagrams no doubt...are you really saying 
> those are the lexer rules?
No, those are the parser rules: they say what's a valid input token
stream and how that input should be understood ("meaning"). We NEVER see
a list of valid input chars called a "syntax diagram" (or "syntax"
anything); it's just an "input character set". And yet an ANTLR lexer can
have "syntactic predicates". We NEVER see an AST referred to as a "syntax
diagram" (or "syntax" anything); we call it an AST.
It seems like we can talk about "syntax" in connection with lexers only
because we mean that the output of a lexer has a "syntax". But an ANTLR
"syntactic predicate" in a lexer has nothing to do with the lexer output.
The case of treewalkers is clearer still: treewalkers have nothing at all
to do with the common usage of the term "syntax".
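The mechanism behind a syntactic predicate can be sketched independently of the input type. It speculatively tries an alternative, always rewinds, and reports whether the alternative matched. In this minimal Java sketch (PredicateDemo, Speculator, matchIf, and speculate are hypothetical names, not ANTLR's implementation), the same class works over characters and over token types. The character case is a lexer-style check that tells a range like 1..2 from a float like 1.5. The token case is a parser-style check for whether an ID starts a call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo (declared first so a single-file launcher finds main).
class PredicateDemo {
    // Lexer level (chars): does the input start with DIGIT '.' '.', i.e. a range?
    static boolean looksLikeRange(String text) {
        List<Character> chars = new ArrayList<>();
        for (char c : text.toCharArray()) chars.add(c);
        Speculator<Character> in = new Speculator<>(chars);
        return in.speculate(() -> {
            in.matchIf(ch -> Character.isDigit(ch));
            in.match('.');
            in.match('.');
        });
    }

    // Parser level (token types): does the input start with ID '(', i.e. a call?
    static boolean looksLikeCall(List<String> tokenTypes) {
        Speculator<String> in = new Speculator<>(tokenTypes);
        return in.speculate(() -> {
            in.match("ID");
            in.match("(");
        });
    }

    public static void main(String[] args) {
        System.out.println(looksLikeRange("1..2"));                   // true
        System.out.println(looksLikeRange("1.5"));                    // false
        System.out.println(looksLikeCall(List.of("ID", "(", ")")));   // true
        System.out.println(looksLikeCall(List.of("ID", "=", "INT"))); // false
    }
}

// Hypothetical sketch, not ANTLR's implementation: a syntactic predicate as
// speculative matching over a list of input symbols of any type T.
class Speculator<T> {
    private final List<T> input;
    private int pos = 0;

    Speculator(List<T> input) { this.input = input; }

    // Consume one symbol if it satisfies the test; otherwise signal a mismatch.
    void matchIf(Predicate<T> test) {
        if (pos >= input.size() || !test.test(input.get(pos))) {
            throw new IllegalStateException("mismatch at position " + pos);
        }
        pos++;
    }

    // Consume one symbol if it equals the expected symbol.
    void match(T expected) { matchIf(expected::equals); }

    // The predicate itself: try the alternative, always rewind, report success.
    boolean speculate(Runnable alternative) {
        int mark = pos;
        try {
            alternative.run();
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            pos = mark; // the predicate never consumes input
        }
    }

    int position() { return pos; }
}
```

The finally block restores the position whether or not the alternative matched. The predicate therefore only inspects the recognizer's input and never changes what is consumed or produced.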
>
>> Whether the example you gave (or any example) is syntactically or
>> semantically valid all depends on the lexer, parser, or treewalker.
>
> no, the parser will say if it's valid syntactically...if you have 
> actions you can do the semantics.
If only a parser can say whether something's valid syntactically, then
why can I have a "syntactic predicate" for a treewalker?
>
>>>> Terence has this general mechanism that he's calling "predicates"
>>>> which checks the structure of the input. That input can be a stream
>>>> of characters (for lexer), tokens (for parser), or ASTs for 
>>>> treewalker.
>>>>
>>>> Now that I think about it, maybe a better name for "syntactic 
>>>> predicate"
>>>> would be "input pattern predicate" or something like that. The term
>>>> "syntactic", to me, is a bit misleading because it makes
>>>> me think of input characters.
>>>
>>> why?  i've never seen nor heard this way of thinking about it.
>> http://dictionary.reference.com/browse/syntax:
>> 4.Computers. the grammatical rules and structural patterns governing 
>> the ordered use of appropriate words and symbols for issuing 
>> commands, writing code, etc., in a particular software application or 
>> programming language.
>
> Yes, how did you get syntax == lexer out of that?  syntax defines set 
> of valid sentences; i.e., language.
It's probably clearer to just think of me as getting "treewalkers do not
deal with syntax" out of that. You can call a treewalker's matching of
AST input "syntactic" if you like, but I don't think anyone else uses
the term "syntax" for something like that.
>
>> http://en.wikipedia.org/wiki/Syntax:
>> ...study of the rules that govern the structure of sentences 
>> <http://en.wikipedia.org/wiki/Sentence>...
>>
>> Every time I've ever heard anyone talking about "syntax" they were 
>> talking
>> about the input string itself.
>
> Structure == syntax.
Structure == syntax *of sentences*
>
>>>> Saying "my treewalker has a
>>>> syntactic predicate, which of course checks the shape of the input
>>>> AST" seems a bit odd.
>>>
>>> Not sure why.
>> Because most people (including ANTLR users, I think) would not say that
>> a treewalker is doing any syntactic checking.
>
> sure it is: on the tree structure.
You're not making the distinction between what is actually happening
(yes, the treewalker is checking tree structure) and what terms people
actually use (no, no one refers to the treewalker's check of tree
structure as "syntax checking").
>
>> They'd say it's checking the structure
>> of the AST.
>
> structure == syntax
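enum Recognizer {
   LEXER("character set checking"),
   PARSER("syntax checking"),
   TREEWALKER("AST validation");

  // What people commonly call each recognizer's check of its input
  private final String englishTermForInputChecking;

  private Recognizer(String englishTermForInputChecking) {
     this.englishTermForInputChecking = englishTermForInputChecking;
  }

  public String getEnglishTermForInputChecking() {
     return englishTermForInputChecking;
  }
}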
>
>>>> I may just be stuck in an old way of thinking,
>>>> but I just checked dictionary.com and wikipedia, and they're agreeing
>>>> with me :)
>>>
>>> not possible.  syntax is grammatical structure.  i wrote the sem 
>>> pred wikiped things so they must agree with me ;)
>> Looks like you wrote the syntactic predicate wikipedia entry, but the
>> semantic predicate entry doesn't exist.
>> I guess you coined the term "syntactic predicate", so you can have it 
>> mean whatever you want it to.
>> I just think your definition goes way beyond the dictionary 
>> definition and common usage of "syntax".
>
> I'm pretty sure you'll find my usage is the common one; all my papers 
> get past the reviewers at least in that area.  Seriously, this is the 
> most clear thing in my mind and everybody else's in the formal language 
> community.
Perhaps you could refer me to any paper or email (other than this
discussion) where someone (besides you) refers to a treewalker's checking
of an AST as "syntax checking" or similar.
>
>> The sentence: "Go!" could cause either valid or invalid input to 
>> either a lexer, parser, or treewalker.
>> If you want to consider each one's input to be its "syntax", then we 
>> have:
>>
>> lexer syntax is whether the chars are valid (whether an output 
>> Tokenstream can be created)
>
> its syntax is whether it's a valid token.
Again, you're saying that "syntax" refers to the lexer output/parser
input; that's the common usage. And yet a "syntactic predicate" in a
lexer rule does not relate to "syntax": it relates to the lexer input,
not the lexer output.
>
> Ter
