[antlr-interest] Automatic semicolon insertion

Wed Dec 17 08:56:09 PST 2003

Hi,
I'm writing a parser for JavaScript but I've encountered a major 
stumbling block with the concept of 'automatic semicolon insertion': 
in short, statements need not be terminated by a semicolon 
before an RBRACE or after a newline.
Initially, to get around warning messages, I implemented this as a 
token filter on the input to the parser, but a filter cannot cover all 
common cases because the decision requires the full grammar 
context. In the grammar, where I used to have a rule:

semicolon: SEMICOLON;

I replaced this with:

auto_semicolon:
	(SEMICOLON
	// empty alternative: allowed only before '}' or at a line break
	| { LA(1) == RBRACE || LT(1).getLine() != lastTokenLine }?
	)
	;

This does appear to work, but ANTLR generates lots of nondeterminism 
warnings that I cannot seem to turn off with any of the options. 
Firstly, is this the only approach I can take? And how can I 
determine whether my grammar parses correctly given these warnings? 
Are there any resources that deal with this kind of ambiguity?
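
The predicate itself can be checked in isolation. Below is a minimal Java sketch, assuming a hypothetical `Tok` class in place of ANTLR's `Token` and passing `lastTokenLine` in explicitly (in the real parser that field would have to be updated each time a token is consumed, which is not shown here):

```java
// Stand-in for an ANTLR token: only the fields the predicate needs (hypothetical class).
final class Tok {
    static final int SEMICOLON = 1, RBRACE = 2, IDENTIFIER = 3;
    final int type;
    final int line;
    Tok(int type, int line) { this.type = type; this.line = line; }
}

final class AutoSemicolon {
    // Mirrors the predicate in auto_semicolon: a missing semicolon is tolerated
    // when the next token is '}' or starts on a different line than the
    // last consumed token.
    static boolean mayOmit(Tok next, int lastTokenLine) {
        return next.type == Tok.RBRACE || next.line != lastTokenLine;
    }
}
```

For example, `AutoSemicolon.mayOmit(new Tok(Tok.IDENTIFIER, 2), 1)` is true (the next token starts a new line), while the same token on line 1 gives false.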

Here is a small subset of the grammar:

    primary_expression:
        IDENTIFIER 
        | literal 
        | LPAREN expression RPAREN
        ;

    arguments:
        LPAREN (argument_list)? RPAREN
        ;

    argument_list:
        primary_expression (COMMA primary_expression)*
        ;

    function_call_expression:
        primary_expression
        (   arguments
            | LBRACKET expression RBRACKET
            | DOT IDENTIFIER
        )*
        ;

    expression:
        function_call_expression
        ;

    statement:
        expression_statement
        | empty_statement
        | block
        ;

    block:
        LBRACE (statement)* RBRACE
        ;

    expression_statement:
        expression auto_semicolon
        ;

    empty_statement:
        SEMICOLON
        ;

The problematic statements are:
// Here y is treated as a function call and the z assignment 
// becomes a second statement, i.e. x = y(a+b); z = q;
x = y
(a + b)
z = q;
// Here we have an assignment and an addition expression
x = y;
(a + b );
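
The first case shows why a token filter without grammar context falls short. Below is a minimal Java sketch of such a filter; the token class `T`, the `lex` helper and the rule "insert a semicolon at every line break" are simplifying assumptions for illustration, not the filter I actually used and not ANTLR's API:

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveSemicolonFilter {
    // Hypothetical token: its text plus the source line it starts on.
    static final class T {
        final String text;
        final int line;
        T(String text, int line) { this.text = text; this.line = line; }
    }

    // Inserts ";" before "}" and at every line break, unless the previous
    // token is already ";". A real filter would need more special cases
    // (for example directly after "{"); this sketch ignores them.
    static List<String> filter(List<T> in) {
        List<String> out = new ArrayList<>();
        T prev = null;
        for (T t : in) {
            boolean boundary = prev != null
                    && (t.text.equals("}") || t.line != prev.line);
            if (boundary && !prev.text.equals(";")) out.add(";");
            out.add(t.text);
            prev = t;
        }
        return out;
    }

    // Helper: turns non-empty, whitespace-separated source lines into tokens.
    static List<T> lex(String... lines) {
        List<T> toks = new ArrayList<>();
        for (int i = 0; i < lines.length; i++)
            for (String s : lines[i].trim().split("\\s+")) toks.add(new T(s, i + 1));
        return toks;
    }
}
```

Feeding it the three lines `x = y`, `( a + b )`, `z = q ;` produces the token sequence `x = y ; ( a + b ) ; z = q ;`, i.e. three statements, whereas the intended reading is the two statements x = y(a+b); z = q;.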

The default (greedy) nature of the generated parser seems to ensure 
that both of the above cases are parsed correctly, though there 
is an ambiguity between the rules for 'arguments' 
and 'primary_expression': a leading LPAREN can either continue the 
current call expression or start a new statement. I'd be very 
grateful for any light people can shed on this.
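
The greedy behaviour can be modelled with a small hand-written recursive-descent sketch in Java. Everything here is an assumption made for illustration: the class `GreedySketch`, the whitespace-based lexing, the simplified "=" and "+" operators (which the grammar subset above does not include), and the extra end-of-input case in the predicate. It is not the code ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;

final class GreedySketch {
    private final String[] text;    // token texts
    private final int[] line;       // source line of each token
    private int p = 0;              // index of the next token
    private int lastTokenLine = 0;  // line of the last consumed token

    GreedySketch(String... lines) {
        List<String> t = new ArrayList<>();
        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < lines.length; i++)
            for (String s : lines[i].trim().split("\\s+")) { t.add(s); l.add(i + 1); }
        text = t.toArray(new String[0]);
        line = new int[l.size()];
        for (int i = 0; i < line.length; i++) line[i] = l.get(i);
    }

    private boolean at(String s) { return p < text.length && text[p].equals(s); }
    private String consume() { lastTokenLine = line[p]; return text[p++]; }
    private void expect(String s) {
        if (!at(s)) throw new IllegalStateException("expected " + s + " at token " + p);
        consume();
    }

    // primary: IDENTIFIER | "(" expression ")"  (any other token counts as an identifier)
    private String primary() {
        if (p >= text.length) throw new IllegalStateException("unexpected end of input");
        if (at("(")) { consume(); String e = expression(); expect(")"); return "(" + e + ")"; }
        return consume();
    }

    // call: primary ( "(" expression ")" )*  -- the loop is greedy: a following
    // "(" is always taken as an argument list, even across a line break
    private String call() {
        String r = primary();
        while (at("(")) { consume(); r += "(" + expression() + ")"; expect(")"); }
        return r;
    }

    // expression: call ( ("=" | "+") call )*  -- simplified stand-in operators
    private String expression() {
        String r = call();
        while (at("=") || at("+")) { String op = consume(); r += " " + op + " " + call(); }
        return r;
    }

    // auto_semicolon: ";" | { next is "}" or on a new line (or end of input) }?
    private void autoSemicolon() {
        if (at(";")) { consume(); return; }
        boolean ok = p >= text.length || at("}") || line[p] != lastTokenLine;
        if (!ok) throw new IllegalStateException("missing semicolon at token " + p);
    }

    // Parses expression statements until the input is exhausted.
    List<String> statements() {
        List<String> out = new ArrayList<>();
        while (p < text.length) { out.add(expression()); autoSemicolon(); }
        return out;
    }
}
```

With this sketch, `new GreedySketch("x = y", "( a + b )", "z = q ;").statements()` yields the two statements `x = y(a + b)` and `z = q`, and `new GreedySketch("x = y ;", "( a + b ) ;").statements()` yields `x = y` and `(a + b)`, matching the two readings described above.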

Many thanks,

Henry
