[antlr-interest] Re: Automatic semicolon insertion
lgc at mail1.jpl.nasa.gov
Wed Dec 17 11:46:41 PST 2003
Henry--
I can think of a couple of refactorings that might help. First, move semicolon out of statement, so that
(statement)* becomes ( (statement semicolon) | block )*
Then change semicolon to
semicolon
:
{ newline_test() }?
| SEMICOLON
;
and define an appropriate newline_test().
You'll then have to do some more work to let block handle the optional semicolon for the last statement.
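For what it's worth, here is a rough Java sketch of the kind of check newline_test() could perform. Everything in it is an assumption made for illustration: the Tok class, the Kind enum and the explicit token list are hypothetical stand-ins for what the lexer and ANTLR's LT()/LA() would supply, and none of it is ANTLR API. It accepts a missing semicolon when the next token is an RBRACE, is the end of input, or sits on a different line from the previous token:

```java
import java.util.Arrays;
import java.util.List;

class NewlineTest {
    // Hypothetical token kinds; a real grammar would use its generated token-type constants.
    enum Kind { IDENTIFIER, SEMICOLON, LBRACE, RBRACE, EOF }

    // Minimal stand-in for a lexer token: its kind and the source line it starts on.
    static final class Tok {
        final Kind kind;
        final int line;
        Tok(Kind kind, int line) { this.kind = kind; this.line = line; }
    }

    /**
     * Returns true when a statement that ends just before tokens.get(pos)
     * may be closed without an explicit SEMICOLON.
     */
    static boolean newlineTest(List<Tok> tokens, int pos) {
        Tok next = tokens.get(pos);
        // A closing brace or the end of input terminates the statement implicitly.
        if (next.kind == Kind.RBRACE || next.kind == Kind.EOF) {
            return true;
        }
        // Nothing has been consumed yet, so there is no statement to terminate.
        if (pos == 0) {
            return false;
        }
        // Otherwise a line break between the previous and the next token is required.
        Tok prev = tokens.get(pos - 1);
        return next.line != prev.line;
    }

    public static void main(String[] args) {
        // Token stream for:  a b
        //                    c }
        List<Tok> tokens = Arrays.asList(
            new Tok(Kind.IDENTIFIER, 1),
            new Tok(Kind.IDENTIFIER, 1),
            new Tok(Kind.IDENTIFIER, 2),
            new Tok(Kind.RBRACE, 2),
            new Tok(Kind.EOF, 3));
        for (int pos = 1; pos < tokens.size(); pos++) {
            System.out.println("before token " + pos + ": " + newlineTest(tokens, pos));
        }
    }
}
```

Folding the RBRACE case into the test is only one option; it covers the last statement of a block inside the predicate instead of in the block rule. In the grammar itself the same comparison would presumably use LT(1).getLine() and a remembered line number for the last consumed token.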
--Loring
--- In antlr-interest at yahoogroups.com, whittaker55 at y... wrote:
> Hi,
> I'm writing a parser for JavaScript but I've encountered a major
> stumbling block with the concept of 'automatic semicolon insertion' -
> in short, that statements need not be terminated by semicolons
> before an RBRACE or after a newline.
> Initially, to get around warning messages, it was implemented as a
> Token filter on the input to the parser but to cover all common
> eventualities, this is not possible as it requires the full grammar
> context. In the grammar, where I used to have a rule:
>
> semicolon: SEMICOLON;
>
> I replaced this with:
>
> auto_semicolon:
> (SEMICOLON
> | { LA(1) == RBRACE || LT(1).getLine() != lastTokenLine }?
> )
> ;
> This does appear to work but Antlr generates lots of nondeterminism
> warnings that I cannot seem to turn off with any of the options.
> Firstly, is this the only approach I can take? And how can I
> determine whether my grammar parses correctly given these warnings?
> Are there any resources that deal with this kind of ambiguity?
>
> Here is a small subset of the grammar:
>
> primary_expression:
> IDENTIFIER
> | literal
> | LPAREN expression RPAREN
> ;
>
>
> arguments:
> LPAREN (argument_list)? RPAREN
> ;
>
> argument_list:
> (primary_expression
> (COMMA primary_expression
> )*
> )
> ;
>
>
>
> function_call_expression:
> primary_expression
> ( arguments
> | LBRACKET expression RBRACKET
> | DOT IDENTIFIER
> )*
> ;
>
> expression :
> function_call_expression
> ;
>
> statement:
> expression_statement
> | empty_statement
> | block
> ;
>
>
>
> block :
> LBRACE (statement)* RBRACE
> ;
>
>
> expression_statement:
> expression auto_semicolon
> ;
>
>
> empty_statement :
> SEMICOLON;
>
> Problem statements are:
> // Here y is assumed to be a function call and the z assignment
> // becomes a second statement i.e. x = y(a+b); z = q;
> x = y
> (a + b)
> z = q;
> // Here we have an assignment and an addition expression
> x = y;
> (a + b );
>
>
> The default (greedy) nature of the generated parser seems to ensure
> that both of the above statements are parsed correctly, though there
> is an ambiguity between the rules for 'arguments'
> and 'primary_expression'. I'd be very grateful for any light people
> can shed on this.
>
> Many thanks,
>
> Henry