[antlr-interest] Re: Automatic semicolon insertion

Wed Dec 17 11:46:41 PST 2003

Henry--

I can think of a couple of refactorings that might help.  First, move the semicolon out of statement, so that

(statement)* becomes ( (statement semicolon) | block )*

Then change semicolon to

semicolon
     :     { newline_test() }?
     |     SEMICOLON
     ;

and define an appropriate newline_test().
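
As a rough illustration of what newline_test() could check, here is a minimal Java sketch. It does not use the ANTLR runtime: Tok and TokenCursor are hypothetical stand-ins for the parser's token and lookahead machinery, and lastTokenLine borrows the name from Henry's predicate. The assumption is that the parser records the line of each token as it is consumed, so that the test can compare it with the line of the next token.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a lexer token: just its text and line number. */
final class Tok {
    final String text;
    final int line;

    Tok(String text, int line) {
        this.text = text;
        this.line = line;
    }
}

/** Hypothetical stand-in for the parser's lookahead buffer. */
final class TokenCursor {
    private final List<Tok> tokens;
    private int pos = 0;
    private int lastTokenLine = -1; // line of the most recently consumed token

    TokenCursor(List<Tok> tokens) {
        this.tokens = tokens;
    }

    /** Consume one token, remembering which line it was on. */
    void consume() {
        lastTokenLine = tokens.get(pos).line;
        pos++;
    }

    /** True when the next token starts on a different line than the last consumed one. */
    boolean newlineTest() {
        if (lastTokenLine == -1 || pos >= tokens.size()) {
            return false; // nothing consumed yet, or no next token to compare against
        }
        return tokens.get(pos).line != lastTokenLine;
    }
}
```

In a real grammar the same comparison would presumably be made against LT(1).getLine(), as in Henry's predicate, with the bookkeeping done wherever the parser consumes tokens.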

You'll then have to do some more work to let block handle the optional semicolon for the last statement.
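
One possible shape for that extra work, written as a small hand-coded Java recognizer rather than ANTLR-generated code. Everything here is an assumption made for illustration: the token kinds, the BlockSketch class, and the simplification that a statement is a single IDENT token. The point is only the control flow: inside block, a statement directly followed by RBRACE needs no terminator, and otherwise either an explicit SEMICOLON or a line break is required.

```java
final class BlockSketch {
    // Hypothetical token kinds, for illustration only.
    static final int IDENT = 0, SEMI = 1, LBRACE = 2, RBRACE = 3;

    private final int[] types; // kind of each token
    private final int[] lines; // line number of each token
    private int pos = 0;

    BlockSketch(int[] types, int[] lines) {
        this.types = types;
        this.lines = lines;
    }

    /** block : LBRACE ( statement semicolon | block )* RBRACE, last semicolon optional. */
    boolean block() {
        if (!match(LBRACE)) return false;
        while (pos < types.length && types[pos] != RBRACE) {
            if (types[pos] == LBRACE) {
                if (!block()) return false;
            } else {
                if (!match(IDENT)) return false; // a "statement" is one IDENT here
                // Last statement: '}' follows, so no terminator is required.
                if (pos < types.length && types[pos] == RBRACE) break;
                if (!semicolon()) return false;
            }
        }
        return match(RBRACE);
    }

    /** semicolon : explicit SEMI, or nothing when the next token is on a new line. */
    private boolean semicolon() {
        if (pos >= types.length) return false; // input ended inside the block
        if (types[pos] == SEMI) {
            pos++;
            return true;
        }
        // Only called right after a successful match, so pos - 1 is valid.
        return lines[pos] != lines[pos - 1];
    }

    private boolean match(int type) {
        if (pos < types.length && types[pos] == type) {
            pos++;
            return true;
        }
        return false;
    }
}
```

With this arrangement "{ a; b }" is accepted and "{ a b }" on a single line is rejected, while the same two statements split across lines are accepted.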

--Loring

--- In antlr-interest at yahoogroups.com, whittaker55 at y... wrote:
> Hi,
> I'm writing a parser for JavaScript but I've encountered a major 
> stumbling block with the concept of 'automatic semicolon insertion' -
>  in short, that statements need not be terminated by semicolons 
> before an RBRACE or after a newline.
> Initially, to get around warning messages, it was implemented as a 
> Token filter on the input to the parser but to cover all common 
> eventualities, this is not possible as it requires the full grammar 
> context. In the grammar, where I used to have a rule:
> 
> semicolon: SEMICOLON;
> 
> I replaced this with:
> 
> auto_semicolon:
> 	(SEMICOLON
>           | { LA(1) == RBRACE || LT(1).getLine() != lastTokenLine }? 
> 	)
> 	;
> This does appear to work but Antlr generates lots of nondeterminism 
> warnings that I cannot seem to turn off with any of the options. 
> Firstly, is this the only approach I can take? And how can I 
> determine whether my grammar parses correctly given these warnings? 
> Are there any resources that deal with this kind of ambiguity?
> 
> Here is a small subset of the grammar:
> 
>     primary_expression:
>         IDENTIFIER 
>         | literal 
>         | LPAREN expression RPAREN
>         ;
> 
>    
>     arguments:
>         LPAREN  (argument_list)? RPAREN
>        ;
> 
>     argument_list:
>         (primary_expression  
>             (COMMA   primary_expression
>                 )*
>             )
>         ;
>              
> 
>         
>     function_call_expression:
>         primary_expression 
>         (  arguments
>             | LBRACKET expression RBRACKET
>             | DOT IDENTIFIER
>         )*
>         ;
> 
>     expression :
>          function_call_expression 
> 	;
> 
> statement:
>    expression_statement
>    | empty_statement
>    | block
>     ;
> 
> 
> 	
> block :
>     LBRACE (statement)* RBRACE
>     ;
> 
>     
> expression_statement:    
>        expression auto_semicolon
>     ;     
> 
> 	
> empty_statement :
>     SEMICOLON;
> 
> Problem statements are:
> // Here y is assumed to be a function call and the z assignment 
> // becomes a second statement i.e. x = y(a+b); z = q;
> x = y
> (a + b)
> z = q;
> // Here we have an assignment and an addition expression
> x = y;
> (a + b );
> 
> 
> The default (greedy) nature of the generated parser seems to ensure 
> that both of the above statements are parsed correctly though there 
> is an ambiguity between the rules for 'arguments' 
> and 'primary_expression'.  I'd be very grateful for any light people 
> can shed on this.
> 
> Many thanks,
> 
> Henry
