[antlr-interest] "An Introduction to ANTLR" presentation slides
Gavin Lambert
antlr at mirality.co.nz
Fri Feb 29 19:08:34 PST 2008
At 07:25 1/03/2008, Andy Tripp wrote:
>Syntax, as defined by dictionary.com, wikipedia, and (my
>understanding of) common usage, means not the structure of just
>any symbols, but the structure of written language symbols (i.e.
>"words" or "tokens"). Thus, "syntax checking" is something a
>parser does, but not something that a lexer or treewalker does.
Syntax is the structure of any kind of distinct symbols. In this
case the "word" is the fundamental unit and the "sentence" is the
grouping of those units:
- in a lexer, a "word" is a character and a "sentence" is a
token
- in a parser, a "word" is a token and a "sentence" is a set of
tokens or an AST
- in a tree walker, a "word" is an AST and a "sentence" is a set
of ASTs
>The "meaning" or "semantics" for a lexer is the sequence of
>output tokens.
>The "meaning" or "semantics" for a parser is the output AST.
>The "meaning" or "semantics" for a treewalker is whatever it
>outputs (some modified AST or whatever).
No. Those are the output syntax forms of each (what I referred to
as "sentences" above) -- they do *not* represent semantics or
meaning.
If you take an ANTLR grammar and remove all action code from it,
then it will still take in input syntax and generate output
syntax, but no inherent meaning is associated with it. Thus left
to itself ANTLR is a pure syntax recogniser/generator. In
addition to this is also permits semantic validation and
constructs to be included, but this is convenience and is not
essential to operation (except possibly for some syntactically
ambiguous languages).
>We NEVER see an AST being referred to as a "syntax diagram" (or
>"syntax" anything) - we call it an AST.
Yes, and what does AST actually stand for? Abstract Syntax
Tree. Oh look, it *is* referred to as "syntax".
Perhaps another more concrete example is in order here. The input
is:
int x = doCalculation(5);
This is a character stream which the lexer might convert into the
token stream:
KEYWORD[int] IDENTIFIER[x] ASSIGN[=] IDENTIFIER[doCalculation]
OPAREN[(] NUMBER[5] CPAREN[)] SEMI[;]
The parser takes that token stream and converts it into the
following AST:
( DECLARATION KEYWORD[int] IDENTIFIER[x] )
( ASSIGN[=]
IDENTIFIER[x]
( FUNCTIONCALL IDENTIFIER[doCalculation] ( NUMBER[5] ) )
)
Everything we have done up to this point is still all just
syntax. This is a perfectly valid AST and thus the input is valid
syntactically.
But what happens when we start to verify the semantics? What
happens if it turns out that "doCalculation" isn't actually a
function, or doesn't take a single numeric parameter, or doesn't
return a type that can be compatibly assigned to an integer
variable? What happens if "x" had already been declared as a
string variable? Those are all semantic tests and they are
independent of the AST itself. So what was perfectly valid syntax
may be semantically incorrect.
ANTLR lets you choose whether you want to do the semantic checks
right at the end (in your own driver code, or in a tree walker),
or whether you want to do them inline at the lexing or parsing
stages (either to fail quickly or to resolve syntactical
ambiguities). But being able to insert them inline doesn't mean
that they're directly linked the way you seem to have been
saying. They remain separate and distinct things.
I wonder if this is a similar confusion to that caused by having a
combined grammar with literals in the parser rules -- it's
permitted for convenience but it doesn't change the fact that
they're treated separately (by modifying the lexer).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080301/d98a1049/attachment-0001.html
More information about the antlr-interest
mailing list