[antlr-interest] Expression parsing ideas for ANTLR v4
Sam Harwell
sharwell at pixelminegames.com
Tue Jan 19 17:31:16 PST 2010
Several expression parsers are limited to handling the binary operator
portion of the expression. In addition to the obvious limitations, it
poses an additional problem for languages like C++ where the assignment
operators are split (in precedence) from the rest of the binary
operators by the ternary operator (?:). My most complicated production
ANTLR grammar (parses the UnrealScript language) currently uses a
completely new expression parser that offers a great deal more
flexibility than the previous approaches I tried. I don't think it's the
end-all solution for integrating expression parsing into ANTLR for v4,
but I believe it's a worthwhile example to show what's possible. Here
are some pros and cons of the implementation:
Pros:
* The source code declaring the operator precedence and
associativity is very clean (see reference to
UnrealScriptParserHelper.cs below)
* Very fast execution
* Supports a great deal more operations than simply binary
operators
* Supports operator precedence and associativity in groups
* Directly supports changing the token type during AST
generation - for example if the token '-' is named MINUS, you could
produce an AST with AST_SUBTRACT when it appears as a binary operator
and AST_NEGATE when it appears as a unary operator.
Cons:
* Not currently integrated into the ANTLR language (executes in
code)
* No compile-time detection of ambiguous operator rules
* Not implemented as fully as is possible
General idea: Parse every component of an expression into a list - this
includes all operators and "atoms". The list is then passed to a
"precedence processor" to produce a tree for that expression.
Operator categories: This parser was built with the following categories
in mind, but the grouping operators are not implemented at this point.
With this as a starting place, it's clear how the list might be expanded
in the future:
* Unary operator: either prefix or postfix
* Binary operator
* Ternary operator
* Grouping operator: for example, the ( and ) in (expression)
* Postfix grouping operator: for example, the ( and ) in
methodName(args) or the [ and ] in var[index].
* Prefix grouping operator: for example, the ( and ) in
(TargetType)objectToCast.
Attached is:
* UnrealScriptParserHelper.cs: The complete code for declaring a
working precedence parser for UnrealScript.
* Antlr.Runtime.Expressions.zip: The current implementation of
this feature.
I'm very interested in any feedback y'all may have on this.
Thank you,
Sam Harwell
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UnrealScriptParserHelper.cs
Type: application/octet-stream
Size: 5227 bytes
Desc: UnrealScriptParserHelper.cs
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100119/97cae7d9/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Antlr.Runtime.Expressions.zip
Type: application/x-zip-compressed
Size: 6152 bytes
Desc: Antlr.Runtime.Expressions.zip
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100119/97cae7d9/attachment.bin
More information about the antlr-interest
mailing list