[antlr-interest] JavaScript grammar

Chris Lambrou chris at lambrou.net
Sun Apr 6 04:11:43 PDT 2008


The spec is aware of the ambiguity (see the start of section 7). Since
compliant parsers aren't required to deal with the problem, the advice to
script writers is to avoid any confusion by enclosing any problematic regular
expression literals in parentheses. Unless anyone is looking to write a really
water-tight script parser, I think the regular expression literal problem can,
for the most part, be ignored.
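
For anyone who does want to handle it in the lexer, here is a minimal sketch
of one common heuristic: decide whether a '/' starts a regular expression
literal by looking at the previous significant token. This is an assumption
for illustration only. It is not what the spec prescribes, and it is not the
parser-controlled approach described on the ANTLR wiki page David linked. The
names tokenize and slash_starts_regex and the token patterns are hypothetical,
comments are not handled, and the code is Python rather than an ANTLR grammar
purely for brevity.

```python
import re

# Hypothetical token patterns for illustration; far from a full ECMAScript lexer.
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
_NUMBER = re.compile(r"\d+(?:\.\d+)?")
# Regex literal body: escapes, character classes, or any char except / \ [ newline.
_REGEX = re.compile(r"/(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^/\\\n\[])+/[A-Za-z]*")
_SPACE = re.compile(r"\s+")

# Keywords after which an expression (and so a regex literal) may start.
_KEYWORDS_BEFORE_EXPR = {
    "return", "typeof", "case", "in", "new", "delete", "void", "throw",
}


def slash_starts_regex(prev):
    """Guess whether '/' begins a regex literal, given the previous token."""
    if prev is None:
        return True  # start of input: nothing to divide
    kind, text = prev
    if kind == "ident":
        return text in _KEYWORDS_BEFORE_EXPR
    if kind in ("number", "regex"):
        return False
    # After a closing bracket the slash is assumed to be division.
    return text not in (")", "]", "}")


def tokenize(source):
    """Split source into (kind, text) pairs, skipping white-space."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = _SPACE.match(source, pos)
        if m:
            pos = m.end()
            continue
        prev = tokens[-1] if tokens else None
        if source[pos] == "/" and slash_starts_regex(prev):
            m = _REGEX.match(source, pos)
            if m:
                tokens.append(("regex", m.group()))
                pos = m.end()
                continue
        for kind, pattern in (("ident", _IDENT), ("number", _NUMBER)):
            m = pattern.match(source, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # Anything else is emitted as a single punctuation character.
            tokens.append(("punct", source[pos]))
            pos += 1
    return tokens
```

With this sketch, tokenize("a / b / c") yields two division tokens, while
tokenize("x = /b/g") yields a single regex token for /b/g. The heuristic still
guesses wrong in some cases, for example a regular expression literal directly
after the closing parenthesis of an if condition, which is why a fully
water-tight solution needs help from the parser.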

Chris

On 02/04/2008, Chris Lambrou <chris at lambrou.net> wrote:
>
> Ah, I see the problem now. Having re-read the spec, I think one implied
> solution is to incorporate the parsing of RE literals as part of the parsing
> of the entire script, rather than as tokens. On the other hand, it does also
> explicitly suggest enclosing RE literals in parentheses to avoid parsing
> ambiguities, though this isn't mandatory (i.e. a compliant parser needs to
> deal with ambiguities).
>
> I'm not sure where that leaves me... I'll probably have a stab at
> including rules to fully parse RE literals.
>
> Chris
>
>
> P.S. On a side note, I'm finding that every time I go back through the
> spec, lots of edge cases start to creep out of the woodwork. For example,
> /*...*/ style comments that contain line-terminators should be treated as
> line-terminator tokens, whilst those that don't should be treated like
> white-space. The further I delve into this, the more I think that allowing
> semicolon statement terminators to be optional was a poor choice on the part
> of the spec designers. It just leads to unnecessary complication.
>
>
>
> On 01/04/2008, David Holroyd <dave at badgers-in-foil.co.uk> wrote:
> >
> > On Tue, Apr 01, 2008 at 03:41:27PM +0100, Chris Lambrou wrote:
> >
> > > As for regular expression literals, I'm inclined to simply treat them
> > > as separate Regex tokens without any further treatment, and leave
> > > their analysis to a separate grammar. Interestingly enough, whilst the
> > > ECMAScript spec has a whole section on the composition of regular
> > > expression literals, it doesn't appear to incorporate them into the
> > > rest of the grammar - not that I could see, anyway. I think they can
> > > be included as an alternative in the literal rule, which then becomes
> > >
> > > literal : 'null' | 'true' | 'false' | StringLiteral | NumericLiteral | Regex
> >
> >
> > Regular expression literals are ambiguous with '/' (division) unless you
> > give ANTLR a hand to work out what's what.  Some discussion here,
> >
> >
> > http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control
> >
> >
> > ta,
> > dave
> >
> >
> > --
> > http://david.holroyd.me.uk/
> >
>
>