[antlr-interest] problems getting a simple grammar to accept its input

Thu Mar 24 08:31:47 PDT 2011

On 03/24/2011 11:08 AM, Florian Franzmann wrote:
> Hi,
> 
> I'm having problems getting a (so far) very simple grammar to accept its input:
> 
> -------------------------------------
> 
> grammar Simulink;
> 
> IDENTIFIER  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>     ;
> 
> INT :	'0'..'9'+
>     ;
> 
> FLOAT
>     :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>     |   '.' ('0'..'9')+ EXPONENT?
>     |   ('0'..'9')+ EXPONENT
>     ;
> 
> COMMENT
>     :   '#' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>     ;
> 
> WS  :   ( ' '
>         | '\t'
>         | '\r'
>         | '\n'
>         ) {$channel=HIDDEN;}
>     ;
> 
> STRING
>     :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
>     ;
> 
> fragment
> EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
> 
> fragment
> HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
> 
> fragment
> ESC_SEQ
>     :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
>     |   UNICODE_ESC
>     |   OCTAL_ESC
>     ;
> 
> fragment
> OCTAL_ESC
>     :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
>     |   '\\' ('0'..'7') ('0'..'7')
>     |   '\\' ('0'..'7')
>     ;
> 
> fragment
> UNICODE_ESC
>     :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
>     ;
> 
> fragment
> BLOCK_BEGIN
> 	:	'{'
> 	;
> 
> fragment
> BLOCK_END
> 	:	'}'
> 	;
> 
> file	:	block+
> 	;
> 
> block	:	IDENTIFIER BLOCK_BEGIN BLOCK_END
> 	;

Because you declared BLOCK_BEGIN and BLOCK_END as fragments, the lexer
never emits them as tokens.  Remove the "fragment" keyword from those
two lexer rules.
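
Something like this should do it (untested on my side; these are just
your own two rules with the keyword dropped, nothing else changed):

-------------------------------------

// ordinary lexer rules: the lexer now emits these as tokens
BLOCK_BEGIN
	:	'{'
	;

BLOCK_END
	:	'}'
	;

-------------------------------------

Your "file" and "block" parser rules should not need any changes.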

> -------------------------------------
> 
> This is the input:
> 
> -------------------------------------
> 
> # bla
> Model {
> }
> 
> -------------------------------------
> 
> And here is what happens when I try to feed it to the grammar:
> 
> -------------------------------------
> $ make smalltests
> antlr3 -verbose -trace -report Simulink.g
> ANTLR Parser Generator  Version 3.3 Nov 30, 2010 12:50:56
> Simulink.g
> Simulink.file:65:8 decision 1: k=1
> javac -classpath antlr/antlr-3.3-complete.jar:. SimulinkLexer.java
> javac -classpath antlr/antlr-3.3-complete.jar:. SimulinkParser.java
> javac -classpath antlr/antlr-3.3-complete.jar:. Test.java
> cat testdata/empty.mdl                | java -classpath antlr/antlr-3.3-complete.jar:. Test
> enter COMMENT # line=1:0
> exit COMMENT M line=2:0
> enter IDENTIFIER M line=2:0
> exit IDENTIFIER   line=2:5
> enter file [@1,6:10='Model',<4>,2:0]
> enter block [@1,6:10='Model',<4>,2:0]
> enter WS   line=2:5
> exit WS { line=2:6
> line 2:6 no viable alternative at character '{'
> enter WS 
>  line=2:7
> exit WS } line=3:0
> line 3:0 no viable alternative at character '}'
> enter WS 
>  line=3:1
> exit WS 
>  line=4:0
> enter WS 
>  line=4:0
> exit WS  line=5:0
> line 5:0 mismatched input '<EOF>' expecting BLOCK_BEGIN
> exit block [@6,17:17='<EOF>',<-1>,5:0]
> exit file [@6,17:17='<EOF>',<-1>,5:0]
> -------------------------------------
> 
> As I understand it the parser consumes 'Model' as IDENTIFIER and goes into
> state block. It ignores a WS, then finds a '{'. This should be recognized as
> BLOCK_BEGIN, which is the next token expected in block---any idea what I'm
> doing wrong?

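Fragment rules are only helpers: the lexer matches them only when
another lexer rule references them, and it never emits them as tokens
on their own.  That is why you get "no viable alternative at character
'{'": no non-fragment rule matches that character.  Since your
BLOCK_BEGIN and BLOCK_END are intended to be real tokens (you use them
in your parser's "block" rule), you should remove the "fragment" from
those token rules.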
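
For comparison, EXPONENT in your grammar is the kind of rule "fragment"
is meant for.  A trimmed sketch (I have cut FLOAT down to one
alternative just for illustration; keep your full rule):

-------------------------------------

// helper: matched only as part of FLOAT, never emitted by itself
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

// real token: references the helper, and is what the parser sees
FLOAT
    :   ('0'..'9')+ EXPONENT
    ;

-------------------------------------

The parser only ever sees FLOAT here; nothing outside the lexer
refers to EXPONENT, so marking it as a fragment is fine.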

> best regards
> Florian Franzmann
> 

-- 
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)