[antlr-interest] grammar problem with recursion?

Fri Aug 6 04:46:10 PDT 2010

Greetings!

you really need to left factor your rules.
also, lex_entry and rule_entry are identical rules, which is ambiguous: even
with backtracking turned on, how are we to know which one to recognize? and
perhaps worse, we cannot know that we have a type_def (as opposed to a
lex_entry or rule_entry) until the very end, when we see the '.'. this
forces us to parse the entire input just to find the '.' and then
backtrack and reparse the entire input as a type_def.
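
as a rough, untested sketch of what left factoring could look like for three
of your rules (assuming the rest of the grammar stays as you posted it), the
common prefix of the alternatives is pulled out so the parser never has to
guess or backtrack between them:

```
// was: term | (term '&' conjunction)
conjunction
       :       term ('&' term)*
       ;

// was: '[' ']' | ('[' attr_val_list ']')
feature_term
       :       '[' attr_val_list? ']'
       ;

// was: conjunction | (conjunction ',' conjunction_list)
conjunction_list
       :       conjunction (',' conjunction)*
       ;
```

each rewritten rule should accept the same input as your original; the only
difference is that the decision is made after the shared prefix has been
consumed rather than before. the same treatment applies to attr_val_list,
attr_list, diff_list and list.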

On Thu, 2010-08-05 at 23:41 -0400, Dervin Thunk wrote:
> Hello. I am new to ANTLR. Below is a grammar I wrote, and I'm trying to
> test it with the following string, but it just stops before it even
> consumes the n_pp... identifier, so it stops at the "conjunction"
> rule. Any idea about what I could be doing wrong?

i do not see any lexer rule to handle white-space, so you are probably
getting an error on the blank between the `s_n2` and `:=` tokens.

you do not include EOF in any of your rules, so when ANTLR encounters
the first character that cannot be parsed under your grammar rules, it
will simply stop the parse instead of reporting the leftover input.

> 
> <teststring below>
> s_n2 := n_pp_c-pl-crd_le & [ STEM < "100s" >,  SYNSEM [ LKEYS [
> --COMPKEY _of_p_sel_rel,KEYREL.CARG "100" ], PHON.ONSET con ] ].
> 
> 
> grammar tdl;
> 
> options
> {
>        language=Java;
>        backtrack=true;
> }
> 
> type_def
>        :       type avm_def '.'
>        ;

type_def : type avm_def '.' EOF ;

> avm_def
>        :       ':=' conjunction
>        ;
> 
> conjunction
>        :       term
>        | (term '&' conjunction)
>        ;
> 
> term
>        :       type
>        | STRING
>        | feature_term
>        | correference
>        | list
>        | diff_list
>        ;
> 
> type
>        :       ID
>        ;
> 
> feature_term
>        : '[' ']'
>        | ('[' attr_val_list ']')
>        ;
> 
> attr_val_list
>        :       attr_val | (attr_val ',' attr_val)
>        ;
> 
> 
> attr_list
>        : attribute
>        | (attribute'.'attr_val_list)
>        ;
> attr_val
>        : attr_list conjunction
>        ;
> 
> 
> attribute
>        : ID
>        ;
> 
> correference
>        : '#' ID
>        ;
> 
> diff_list
>        :       '<!' '!>'
>        | ('<!' conjunction_list '!>')
>        ;
> 
> conjunction_list
>        : conjunction
>        | (conjunction ',' conjunction_list)
>        ;
> 
> list
>        :       '<' '>' | ('<' conjunction_list '>')
>        | ('<' conjunction_list '>' ',' '...')
>        |       ('<' conjunction_list '.' conjunction '>')
>        ;
> 
> 
> lex_entry
>        :       lex_id avm_def
>        ;
lex_entry is not used anywhere; delete this rule and lex_id.

> 
> lex_id
>        : ID
>        ;

> 
> rule_entry
>        :       rule_id avm_def
>        ;
rule_entry is not used anywhere; delete this rule and rule_id.

> 
> rule_id
>        : ID
>        ;
> 
> ID  :   ('a'..'z'|'A'..'Z'|'_'|'-') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-')*
>    ;
> 
> COMMENT
>    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>    |   ';' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
>    | '#|' ( options {greedy=false;} : . )* '|#' {$channel=HIDDEN;}
>    ;
> 
> STRING
>    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"' |
>       '`' ( ESC_SEQ | ~('\\'|'"') )*
>    ;
> 
> fragment
> HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
> 
> fragment
> ESC_SEQ
>    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
>    |   UNICODE_ESC
>    |   OCTAL_ESC
>    ;
> 
> fragment
> UNICODE_ESC
>    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
>    ;
> fragment
> OCTAL_ESC
>    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
>    |   '\\' ('0'..'7') ('0'..'7')
>    |   '\\' ('0'..'7')
>    ;

add a lexer rule to ignore white-space characters:

WS : (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; } ;

hope this helps...
   -jbb