[antlr-interest] JavaScript grammar

Sat Mar 29 21:15:35 PDT 2008

Chris Lambrou wrote:
> Hi all,
>
> I couldn't get the ECMAScript grammar by Greg Clemenson on the Grammar
> List page to work. It's supposed to run in v3.0 without any issues, but I
> ran into a whole host of problems. Since I'm fairly new to ANTLR, I 
> thought I'd work my way through Terence's book and have a stab at 
> writing a JavaScript grammar from scratch as a learning exercise. 
> Well, I've reached a point where the script may be useful to others, 
> so I've attached it - it compiles cleanly, without any warnings. I 
> could also do with some advice, though.
>
>    1. Unlike other whitespace characters, line separators (represented
>       by my LT token type) are important in JavaScript, as you're
>       allowed to use them to terminate statements instead of the usual
>       terminating semicolon character. As a result, I cannot 'hide'
>       line separators like other whitespace characters, and my grammar
>       is peppered with LT!* sequences. Is there a way to place the LT
>       tokens on the hidden channel, and then optionally reveal them
>       only in the few rules that require it?
>    2. The grammar doesn't include any ^ or ! modifiers to impose any
>       kind of useful structure on the generated AST. I can see how I
>       ought to do this in the simple cases (e.g. 'return'^
>       expression), but I'm not sure how far I ought to go with this
>       before relying on a subsequent tree grammar to finish the job.
>
> I haven't performed much in the way of formal testing, except that it 
> seems to work with everything I've thrown at it using the ANTLRWorks 
> debugger. I guess I ought to look into writing some gunit tests...
>
> Regards,
>
> Chris

It is most likely not kosher, but if you can look at an LT in a sequence
of tokens and tell whether it is a virtual semicolon (knowing nothing
but the adjoining tokens), then some sort of preprocessor might be able
to do the conversion for you. I'm thinking: lex, filter the tokens into
a new token stream, then parse. You might call the filter a
TokenSedStream or something like that. I did something similar (but on
the text rather than the tokens) to deal with indentation sensitivity in
my only attempt with ANTLR. As I said, not kosher, but if all else
fails, "You gotta go with what works." (Law #37)
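
To make the idea concrete, here is a rough sketch of such a filter. It is
written in Python rather than against the ANTLR runtime so that it stands
alone, and everything in it is an assumption for illustration: the Token
class, the token type names other than LT, the CAN_END and CAN_START sets,
and the virtual_semicolons function are all hypothetical. The rule it
applies (an LT run becomes a semicolon when the token before it can end an
expression and the token after it can start a statement) is a deliberately
crude stand-in, not the real ECMAScript automatic semicolon insertion
rules.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Token:
    """Hypothetical token: a type name plus the matched text."""
    type: str
    text: str


# Assumed token type names; a real lexer would define its own.
CAN_END = {"ID", "NUM", "STRING", "RPAREN", "RBRACK"}      # may end an expression
CAN_START = {"ID", "NUM", "STRING", "KEYWORD", "LBRACE"}   # may start a statement


def virtual_semicolons(tokens: Iterable[Token]) -> List[Token]:
    """Drop LT tokens, replacing a run of them with one SEMI token when
    the adjoining tokens suggest the line break ended a statement."""
    toks = list(tokens)
    out: List[Token] = []
    i = 0
    while i < len(toks):
        if toks[i].type != "LT":
            out.append(toks[i])
            i += 1
            continue
        # Skip over the whole run of consecutive line terminators.
        j = i
        while j < len(toks) and toks[j].type == "LT":
            j += 1
        prev = out[-1] if out else None
        nxt = toks[j] if j < len(toks) else None
        # Look only at the neighbours: end of input counts as a statement end.
        if prev is not None and prev.type in CAN_END and (
            nxt is None or nxt.type in CAN_START
        ):
            out.append(Token("SEMI", ";"))
        i = j
    return out
```

With a filter like this sitting between the lexer and the parser, the
parser rules would only ever see SEMI and would not need LT!* sprinkled
through them. A line break in the middle of an expression (say, after an
operator) is simply dropped, because the token before it cannot end an
expression.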