[antlr-interest] JavaScript grammar

shmuel siegel antlr at shmuelhome.mine.nu
Sat Mar 29 13:04:49 PDT 2008


Chris Lambrou wrote:
> Hi all,
>
> I couldn't get the ECMAScript by Greg Clemenson on the Grammar List 
> page to work. It's supposed to run in v3.0 without any issues, but I 
> ran into a whole host of problems. Since I'm fairly new to ANTLR, I 
> thought I'd work my way through Terence's book and have a stab at 
> writing a JavaScript grammar from scratch as a learning exercise. 
> Well, I've reached a point where the script may be useful to others, 
> so I've attached it - it compiles cleanly, without any warnings. I 
> could also do with some advice, though.
>
>    1. Unlike other whitespace characters, line separators (represented
>       by my LT token type) are important in JavaScript, as you're
>       allowed to use them to terminate statements instead of the usual
>       terminating semicolon character. As a result, I cannot 'hide'
>       line separators like other whitespace characters, and my grammar
>       is peppered with LT!* sequences. Is there a way to place the LT
>       tokens on the hidden channel, and then optionally reveal them
>       only in the few rules that require it?
>    2. The grammar doesn't include any ^ or ! modifiers to impose any
>       kind of useful structure to the generated AST. I can see how I
>       ought to do this in the simple cases (e.g. 'return'^
>       expression), but I'm not sure how far I ought to go with this
>       before relying on a subsequent tree grammar to finish the job.
>
> I haven't performed much in the way of formal testing, except that it 
> seems to work with everything I've thrown at it using the ANTLRWorks 
> debugger. I guess I ought to look into writing some gunit tests...
>
> Regards,
>
> Chris
Virtual semicolons are a hard concept. In a different environment (not 
ANTLR) I used a notion of token pairs: if a line feed occurred between 
certain token pairs, for instance two identifiers, I replaced the line 
feed with a virtual semicolon. This means that you have to track the 
last two tokens.
Two kinds of cases need special treatment:
    1) Control statements such as break and return, if not followed by an 
expression on the same line, always have the line feed turned into a 
virtual semicolon.
    2) A right paren followed by any token that could start an expression 
will generate a virtual semicolon, so the parser will need to accept 
virtual semicolons as whitespace in "for", "while", and "if" statements.
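
To make the token-pair idea concrete, here is a rough sketch in Python (not ANTLR) of a filter that sits between the lexer and the parser. Everything in it is an assumption made for illustration: the token kinds ("ID", "NUM", "KW", "OP", "RPAREN", "LT", "VSEMI"), the contents of the pair table, and the function name are hypothetical, and a real JavaScript grammar would need a much larger table.

```python
from typing import List, Optional, Tuple

# A token is (kind, text). The kinds used here are hypothetical.
Token = Tuple[str, str]

VSEMI: Token = ("VSEMI", ";")

# Control keywords: a line feed right after one of these always becomes
# a virtual semicolon (special case 1).
CONTROL_KEYWORDS = {"break", "continue", "return"}

# Illustrative (previous kind, next kind) pairs that turn an intervening
# line feed into a virtual semicolon. The RPAREN entries are special
# case 2; the parser must then tolerate VSEMI after the closing paren
# of "for", "while", and "if".
BREAKING_PAIRS = {
    ("ID", "ID"), ("ID", "NUM"), ("NUM", "ID"), ("NUM", "NUM"),
    ("RPAREN", "ID"), ("RPAREN", "NUM"),
}


def insert_virtual_semicolons(tokens: List[Token]) -> List[Token]:
    """Drop LT tokens, emitting VSEMI where the token-pair rules apply."""
    out: List[Token] = []
    prev: Optional[Token] = None   # last token emitted (never an LT)
    pending_lt = False             # saw a line feed since prev
    for tok in tokens:
        kind, _text = tok
        if kind == "LT":
            if prev is not None and prev[0] == "KW" and prev[1] in CONTROL_KEYWORDS:
                # Case 1: control keyword with nothing after it on the line.
                out.append(VSEMI)
                prev = VSEMI
                pending_lt = False
            elif prev is not None:
                pending_lt = True
            continue
        # General rule: a line feed between a listed pair of tokens.
        if pending_lt and prev is not None and (prev[0], kind) in BREAKING_PAIRS:
            out.append(VSEMI)
        out.append(tok)
        prev = tok
        pending_lt = False
    return out
```

With this in place the parser never sees LT at all. It sees either nothing or a VSEMI token, which statement rules can accept wherever a real semicolon is allowed.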

I can discuss details with you off-line if you need further guidance.


