[antlr-interest] virtual semicolons again: JavaScript, ECMAScript, ActionScript

Thu Dec 28 04:44:18 PST 2006

As a caveat: my javascript parser is not ANTLR based, hence I might have overlooked something. But I can tell you how I handle the virtual semicolon issue.

Most of the work is done in the tokenizer. In general, whitespace is merely a token delimiter, but EOL is a special token. The tokenizer emits one token at a time, but it remembers the previous token. If the next token would be an EOL, the tokenizer holds it back and reads a third token, the one after the EOL. If tokens one and three are incompatible, e.g. '(' followed by '}', the tokenizer emits a virtual semicolon instead of the EOL. Otherwise it skips the EOL.

To accomplish this in ANTLR you would need a wrapper around the tokenizer that remembers the previous token and is capable of keeping the next token in abeyance when the virtual semicolon is generated. In addition, the ANTLR grammar MUST generate the EOL.
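A minimal sketch of such a wrapper, written in Python for brevity rather than against the ANTLR API. The token markers `EOL` and `VSEMI`, the function names, and the toy `toy_compatible` rule are all assumptions made for illustration; a real compatibility test would have to come from the grammar.

```python
from typing import Callable, Iterable, Iterator, Optional

EOL = "<EOL>"    # hypothetical end-of-line token emitted by the lexer
VSEMI = "<;>"    # hypothetical virtual-semicolon token


def insert_virtual_semicolons(
    tokens: Iterable[str],
    compatible: Callable[[str, str], bool],
) -> Iterator[str]:
    """Wrap a token stream, replacing each EOL with VSEMI or dropping it."""
    it = iter(tokens)
    prev: Optional[str] = None          # token one: the last token emitted
    for tok in it:
        if tok != EOL:
            prev = tok
            yield tok
            continue
        # Token two is an EOL: fetch token three, the next non-EOL token,
        # and hold it back until the EOL has been dealt with.
        nxt = next((t for t in it if t != EOL), None)
        if nxt is None:
            # Assumption: trailing newlines are dropped and the parser
            # accepts end of input as a statement end.
            return
        if prev is not None and not compatible(prev, nxt):
            yield VSEMI                 # the EOL becomes a virtual semicolon
        # Otherwise the EOL is skipped. Either way, release the held token.
        prev = nxt
        yield nxt


def toy_compatible(prev: str, nxt: str) -> bool:
    """Toy rule (an assumption): two operands in a row cannot be one expression."""
    ends_operand = prev[0].isalnum() or prev in (")", "]")
    starts_operand = nxt[0].isalnum()
    return not (ends_operand and starts_operand)


if __name__ == "__main__":
    source = ["a", "=", "3", EOL, "+", "4", EOL, "alert", "(", "a", ")", ";", EOL]
    print(list(insert_virtual_semicolons(source, toy_compatible)))
```

With this toy rule, the line break before `+` is skipped because `3` followed by `+` can continue the expression, while the break before `alert` becomes a virtual semicolon.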

The parser also has to help. Although it is usually illegal for an identifier to follow a right paren, it is legal after the parenthesized condition of an if, for, or while statement. The parser will have to be willing to ignore virtual semicolons in these cases.
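As a rough illustration of that parser-side tolerance, here is a small Python sketch that consumes the header of a control statement and then skips any virtual semicolon sitting between the header and its body. The `VSEMI` marker, the flat token list, and the function name are hypothetical; a real parser would do this inside its grammar rules.

```python
from typing import List

VSEMI = "<;>"  # hypothetical marker for a virtual semicolon
CONTROL_KEYWORDS = {"if", "for", "while"}


def skip_control_header(tokens: List[str], pos: int) -> int:
    """Consume `if|for|while ( ... )` at pos; return the index of the body's first token."""
    if (
        pos + 1 >= len(tokens)
        or tokens[pos] not in CONTROL_KEYWORDS
        or tokens[pos + 1] != "("
    ):
        raise SyntaxError("expected a control keyword followed by '('")
    depth = 0
    i = pos + 1
    while i < len(tokens):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                i += 1          # step past the closing paren
                break
        i += 1
    else:
        raise SyntaxError("unbalanced parentheses in control header")
    # A virtual semicolon here only reflects a line break between the
    # header and its body, so it is ignored rather than ending a statement.
    while i < len(tokens) and tokens[i] == VSEMI:
        i += 1
    return i
```

For example, given the tokens for `if (x)` followed by a line break and `y = 1`, the function returns the index of `y` whether or not the tokenizer placed a virtual semicolon after the right paren.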

I hope that I have been of some help.

Shmuel

________________________________

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gyula László
Sent: Wednesday, December 27, 2006 9:16 PM
To: ANTLR Interest
Subject: [antlr-interest] virtual semicolons again: JavaScript, ECMAScript,ActionScript

Hello,

I know, this has been a topic on the list before (2005), however this keeps coming back at me:

http://www.antlr.org/pipermail/antlr-interest/2005-October/014116.html

http://www.antlr.org/pipermail/antlr-interest/2005-April/011916.html

shmuel siegel wrote:

>> One aspect that is different is that Javascript expressions end
>> either at a semi colon, or at the earliest new line that makes a
>> valid expression.
>
> As far as I know, this is not a valid definition for javascript
> statements. It is true for control statements like "return" or "break"
> but not for arithmetic statements.
>
> Consider,
>     <script>
>         a=3
>         +4
>         alert(a);
>     </script>
>
> It is legal and will result in an alert with the value 7. In general,
> a new line only marks the end of a statement if the next token cannot
> be part of the previous pattern.

This might be true for standardized JS; however, while building my AS3 parser and testing it on the Flex SDK, I've learnt my lesson.

Setup: My expression statements were like this

    level1expression
        :   level0expression
            (   (level1expressionOperator NL* level0expression)+
            |
            )
        ;

    statement
        :   expressionStatement statementEnd
        ;

    statementEnd
        :   ';' NL*
        |   NL+
        ;

(The last rule generates a warning that I don't really care about :)

This worked on 99% of the Flex SDK source code. However, the rest contains the weird Adobe vibe:

Operator after a newline, like:

    something.getHerProperty()
        .YetAnotherMethodAccessorAfterTheNewLine()

or:

    if ( (this.width > 0) && (this.height > 200)
        && (this.doesMrBunnyHaveAHat ) )
    {
        beatTheBunny()
    }

I really can't put the newline into the expressions themselves (FYI: the operator with the lowest precedence swallows the newline tokens instead of the real statement end, which is unknown to the parser!), because there are statement lists like the following:

    var showStopper:Object =
        {
            foreground:0xff0000,
            background:0x550000
        }
    var theParserNeverMakesItToHere:int = 0

I could exclude the newline and try virtual semicolons; however, I found that none of the examples for virtual semicolons (ASDT's AS2 & 3 grammar, ECMAScript grammar) could handle these cases AND the Flex SDK AND not be an ANTLR-freezer, so I feel like there's something here.

Any kind of help is really appreciated,

Thank you for your time,

Gyula László

email:gyula.laszlo AT profund.hu

http://profund.hu
