[antlr-interest] distinguish "3 + 4" from "3 +4"

Jim Idle jimi at temporal-wave.com
Wed Oct 5 11:28:09 PDT 2011


Ambiguities based on whitespace are a bit silly because you cannot catch
errors. How do you know, when the input is:

2 -2

that the user didn't actually make a typo and forget the space after the '-'?
You don't, and it will parse as two numbers. If you have any control over the
input format, you should rethink it: use a ',' to separate expressions, and
then the whitespace can be ignored.

However, there are two solutions. The first is lexer-based and not really
recommended, though sometimes it might be the better choice, and it might help
others do similar things. The second is parser-based and is the one I
recommend if you cannot change the input format.


Lexer solution:

grammar test;

a : exp* EOF;

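exp : atom ((PLUS | MINUS) atom)*
    ;

atom
    : NUMBER
    ;

NUMBER : '0'..'9'+
    ;

// A '+' immediately followed by a digit is lexed as part of a NUMBER
// (e.g. "+4"); otherwise the empty alternative leaves it as PLUS.
PLUS : '+'
       (
           ('0'..'9')=> '0'..'9'+ { $type = NUMBER; }
         |
       )
     ;

// Same trick for '-': "-2" becomes a single NUMBER token.
MINUS : '-'
        (
            ('0'..'9')=> '0'..'9'+ { $type = NUMBER; }
          |
        )
      ;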


WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;


But then you don't have unary precedence 'correct': the sign is glued onto
the number by the lexer, so the parser never sees a unary operator.
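As a rough illustration of what the PLUS and MINUS lexer rules above do, here
is a hypothetical hand-written Java tokenizer (SignLexer, lex and the Token
class are made up for this sketch; it is not ANTLR-generated code). It applies
the same one-character lookahead: a '+' or '-' immediately followed by a digit
is folded into a NUMBER token, otherwise it stays an operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written analogue of the PLUS/MINUS lexer rules:
// a sign immediately followed by a digit is folded into a NUMBER token.
public class SignLexer {
    enum Type { NUMBER, PLUS, MINUS }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Only '0'..'9' count as digits, matching the grammar's range.
    static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    static List<Token> lex(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                   // WS: dropped, like the HIDDEN channel
            } else if (isDigit(s, i)) {
                int start = i;
                while (isDigit(s, i)) i++;
                out.add(new Token(Type.NUMBER, s.substring(start, i)));
            } else if (c == '+' || c == '-') {
                int start = i++;
                if (isDigit(s, i)) {                   // like ('0'..'9')=> : peek one char
                    while (isDigit(s, i)) i++;
                    out.add(new Token(Type.NUMBER, s.substring(start, i)));
                } else {                               // the empty alternative
                    out.add(new Token(c == '+' ? Type.PLUS : Type.MINUS, String.valueOf(c)));
                }
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("3 + 4"));  // [NUMBER(3), PLUS(+), NUMBER(4)]
        System.out.println(lex("3 +4"));   // [NUMBER(3), NUMBER(+4)]
        System.out.println(lex("2 -2"));   // [NUMBER(2), NUMBER(-2)]
    }
}
```

Note that "3+4", with no spaces at all, also comes out as two NUMBER tokens,
which is exactly the kind of silent misreading described at the top.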

- - - - - - - - - --

Parser solution (with precedence correct):

grammar test;

options { output=AST; }

tokens { UMINUS; UPLUS; }

a : exp* EOF;

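exp : uatom
      (
          // Treat PLUS/MINUS as a binary operator only when the token
          // immediately after it in the raw stream (hidden tokens included)
          // is WS.
          { ((TokenStream)input).get(input.index()+1).getType() == WS }?=>
          (PLUS|MINUS)^ uatom
      )*
    ;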

uatom
    : MINUS uatom 	-> ^(UMINUS[$MINUS] uatom)
    | PLUS uatom 	-> ^(UPLUS[$PLUS] uatom)
    | atom
    ;

atom
    : NUMBER
    ;

NUMBER
    : '0'..'9'+
    ;

PLUS  : '+' ;
MINUS : '-' ;

WS  :   ( ' '
        | '\t'
        ) {$channel=HIDDEN;}
    ;

NL  :   ( '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;
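The gated predicate in exp is what makes this work: it looks at the raw token
stream, hidden tokens included, and treats PLUS/MINUS as binary only when the
token right after the operator is WS. Below is a minimal Java sketch of the
same decision as a hand-written recursive-descent parser. It is not
ANTLR-generated code; the class name WsPredicateParser, the s-expression
output, and printing UPLUS/UMINUS as u+/u- are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent analogue of the parser solution. Whitespace
// tokens stay in the token list (standing in for the HIDDEN channel) so the
// predicate can peek at them; normal lookahead skips them.
public class WsPredicateParser {
    enum Type { NUMBER, PLUS, MINUS, WS, NL, EOF }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        boolean hidden() { return type == Type.WS || type == Type.NL; }
    }

    static List<Token> lex(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                int start = i;
                while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
                out.add(new Token(Type.NUMBER, s.substring(start, i)));
                continue;
            }
            Type t;
            if (c == '+') t = Type.PLUS;
            else if (c == '-') t = Type.MINUS;
            else if (c == ' ' || c == '\t') t = Type.WS;
            else if (c == '\r' || c == '\n') t = Type.NL;
            else throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            out.add(new Token(t, String.valueOf(c)));
            i++;
        }
        out.add(new Token(Type.EOF, "<EOF>"));
        return out;
    }

    private final List<Token> tokens;
    private int p = 0; // raw index of LT(1); always points at a non-hidden token

    private WsPredicateParser(List<Token> tokens) { this.tokens = tokens; skipHidden(); }

    private void skipHidden() { while (tokens.get(p).hidden()) p++; }
    private Token lt1() { return tokens.get(p); }
    private Token consume() { Token t = tokens.get(p++); skipHidden(); return t; }

    // a : exp* EOF ;
    private List<String> parseAll() {
        List<String> exps = new ArrayList<>();
        while (lt1().type != Type.EOF) exps.add(exp());
        return exps;
    }

    // exp : uatom ( {raw token after the operator is WS}?=> (PLUS|MINUS)^ uatom )* ;
    private String exp() {
        String left = uatom();
        while ((lt1().type == Type.PLUS || lt1().type == Type.MINUS)
                && tokens.get(p + 1).type == Type.WS) {   // same peek as input.get(input.index()+1)
            String op = consume().text;
            left = "(" + op + " " + left + " " + uatom() + ")";  // left-associative, like ^
        }
        return left;
    }

    // uatom : MINUS uatom | PLUS uatom | atom ;  (UMINUS/UPLUS printed as u-/u+)
    private String uatom() {
        Token t = lt1();
        if (t.type == Type.MINUS || t.type == Type.PLUS) {
            consume();
            return "(u" + t.text + " " + uatom() + ")";
        }
        if (t.type == Type.NUMBER) return consume().text;
        throw new IllegalStateException("expected NUMBER but found " + t.text);
    }

    static List<String> parse(String s) { return new WsPredicateParser(lex(s)).parseAll(); }

    public static void main(String[] args) {
        System.out.println(parse("3 + 4"));   // [(+ 3 4)]
        System.out.println(parse("3 +4"));    // [3, (u+ 4)]
        System.out.println(parse("3 - -4"));  // [(- 3 (u- 4))]
    }
}
```

One consequence worth knowing: with this predicate, "3+4" (no spaces) also
parses as two expressions, because the token after '+' is a NUMBER rather
than WS.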



> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andreas Liebig
> Sent: Wednesday, October 05, 2011 4:14 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] distinguish "3 + 4" from "3 +4"
>
> Hello,
> I am not very experienced with ANTLR, and I would like to ask for some
> ideas how to solve this task:
>
> I have to distinguish input streams like
> "3 + 4" (parsed as three tokens NUMBER PLUS NUMBER) from
> "3 +4" (parsed as NUMBER NUMBER, because the + is part of the number
> +4).
>
> I would like to ignore whitespace in general using the
> "$channel=HIDDEN;" syntax. But only in this situation whitespace does
> matter.
> Can you guide me to a good explanation of a possible solution?
>
> Thanks
> Andreas
>

