[antlr-interest] Expression parsing ideas for ANTLR v4

Sam Harwell sharwell at pixelminegames.com
Tue Jan 19 17:31:16 PST 2010


Several expression parsers are limited to handling the binary operator
portion of the expression. In addition to the obvious limitations, it
poses an additional problem for languages like C++ where the assignment
operators are split (in precedence) from the rest of the binary
operators by the ternary operator (?:). My most complicated production
ANTLR grammar (parses the UnrealScript language) currently uses a
completely new expression parser that offers a great deal more
flexibility than the previous approaches I tried. I don't think it's the
end-all solution for integrating expression parsing into ANTLR for v4,
but I believe it's a worthwhile example to show what's possible. Here
are some pros and cons of the implementation:

 

Pros:

*         The source code declaring the operator precedence and
associativity is very clean (see reference to
UnrealScriptParserHelper.cs below)

*         Very fast execution

*         Supports a great deal more operations than simply binary
operators

*         Supports operator precedence and associativity in groups

*         Directly supports changing the token type during AST
generation - for example if the token '-' is named MINUS, you could
produce an AST with AST_SUBTRACT when it appears as a binary operator
and AST_NEGATE when it appears as a unary operator.

 

Cons:

*         Not currently integrated into the ANTLR language (executes in
code)

*         No compile-time detection of ambiguous operator rules

*         Not implemented as fully as is possible

 

General idea: Parse every component of an expression into a list - this
includes all operators and "atoms". The list is then passed to a
"precedence processor" to produce a tree for that expression.

 

Operator categories: This parser was built with the following categories
in mind, but the grouping operators are not implemented at this point.
With this as a starting place, it's clear how the list might be expanded
in the future:

 

*         Unary operator: either prefix or postfix

*         Binary operator

*         Ternary operator

*         Grouping operator: for example, the ( and ) in (expression)

*         Postfix grouping operator: for example, the ( and ) in
methodName(args) or the [ and ] in var[index].

*         Prefix grouping operator: for example, the ( and ) in
(TargetType)objectToCast.

 

Attached is:

 

*         UnrealScriptParserHelper.cs: The complete code for declaring a
working precedence parser for UnrealScript.

*         Antlr.Runtime.Expressions.zip: The current implementation of
this feature.

 

I'm very interested in any feedback y'all may have on this.

 

Thank you,

Sam Harwell

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UnrealScriptParserHelper.cs
Type: application/octet-stream
Size: 5227 bytes
Desc: UnrealScriptParserHelper.cs
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100119/97cae7d9/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Antlr.Runtime.Expressions.zip
Type: application/x-zip-compressed
Size: 6152 bytes
Desc: Antlr.Runtime.Expressions.zip
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100119/97cae7d9/attachment.bin 


More information about the antlr-interest mailing list