[antlr-interest] Error 206 Recursion woes

Wed Apr 11 15:28:05 PDT 2012

Thanks for the tips I figured out what the issue was, it was 'string   
  :    STRING|.*;'

I got that part working but now I'm having another issue... at times I 
need to pay attention to whitespace as they are used to delimit 
arguments at certain points in time.
But I can't seem to find a way to have the parser switch channels over 
to the hidden channel to check for whitespaces. The only option seems to 
be to take WS off the hidden channel but then for the parts where I 
don't care about whitespace I would have to add WS* all over the place 
to prevent 'UnwantedTokenExceptions'

Attached is an example grammar.

Kyle Ferrio wrote:
> Hi.
>
> Recursion (both in ANTLR grammars and in the target runtimes) are 
> topics of interest to me.  So I looked at what you wrote.  It appears 
> that your partial grammar spec includes some rules which are not 
> referenced and, more notably, omits some necessary rules. 
>  Consequently, whatever errors I might derive from your grammar may 
> not actually be interesting to you.  So I took a different view.
>
> I dashed out two grammars -- not identical but substantially similar 
> to each other and remotely similar to yours -- one for antlr v3 and 
> one for artlr v4 [1].  I've attached the grammars with the caveat that 
> neither is a model to emulate on the whole.  These are toys, 
> absolutely not production-worthy.  But thee toys do highlight a few 
> details which you might consider, such as:
>
>  - It is useful to ponder what tokens a parser really needs and hence 
> which lexer rules should be mere fragments.
>  - Using whitespace as a token may come to tears in your grammar.  And 
> WS does not appear to be semantic in your language.
>  - If all/some WS is truly semantic in your language (as in Python or 
> R) you might do well to filter before lexing, viz. "form fits function."
>
> Both toy grammars -- each processed with and run against its 
> respective version of antlr -- parsed the following input as expected 
> without error:
>
> *   ${${${${${${${${${${${${${${${${Test15}}}}}}}}}}}}}}}}*
>
> (I used the attached Main.java for antlr v3.4 and 
> org.antlr.v4.runtime.misc.TestRig for antlr v4.)
>
> I hope you get something out of this even if it does not answer your 
> question directly.  Sorry to post and run, but checking email may be 
> spotty for me the rest of the week.
>
> Good luck!
>
> Kyle
>
> [1] I probably could have written one grammar for both versions of the 
> antlr tool, but already I've been spoiled by how tolerant Honey Badger 
> (aka antlr v4) is of my misteps.  And it's been a long day.  :)
>
>
> On Tue, Apr 10, 2012 at 7:19 PM, Tangeleno <tangeleno at gmail.com 
> <mailto:tangeleno at gmail.com>> wrote:
>
>     The error is "Ln17:7 Alternative 1: after matching input such as '$'
>     '{''$' '{''$' '{''$' '{''$' '{' the decision cannot predict what comes
>     next due to recursion overflow to id from accessor and to object data
>     from id and to accessor from objectData"
>
>      From what I can gather from the book pg 281 is the closest I can find
>     to this problem it should be a warning and not an error. The
>     problem is
>     coming about when I add the rule objectData as an alternative to the
>     param rule, which I really don't understand as because it's
>     complaining
>     about id accessor and objectData...
>
>     So my question is what am I doing wrong and what can I do to fix it
>     while still allowing nested objectDatas?
>     http://privatepaste.com/6611be2380 is an example of what can happen in
>     this silly language and 6 levels is by no means the cap...
>
>     objectData
>         :    '$' '{'accessor objectMember*'}'
>         ;
>     objectMember
>         :    ('.'|':') accessor;
>
>     accessor:    id indexer* typeCast*;
>
>     id    :    (ID|objectData);
>
>     indexer    :    '[' commaParams ']'
>         ;
>     commaParams
>         :    param (',' param)*
>         ;
>
>     param    :    objectData|number|string
>         ;
>     spaceParams
>         :    param (WS param)* NewLine
>         ;
>     typeCast:    '(' id ')';
>     number    :    INT|FLOAT;
>     string    :    STRING|.*;
>
>     List: http://www.antlr.org/mailman/listinfo/antlr-interest
>     Unsubscribe:
>     http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>

-------------- next part --------------
grammar test;
options {
	backtrack=true;
}

//Test with "	[1,22,333,a b c,"A B C"]" minus the outer quotes and you'll get
//the unwantedtokenexception
//Uncomment WS* to get rid of the exception
indexer	:	/*WS**/LSquare commaArgs RSquare
	;

//Test with "Arg1 ThisIsArg2 1 22 333 "A B C"" minus the outer quotes and it works
// as desired. But if you uncomment {$channel=HIDDEN;} in NewLine and whitespace
// it no longer works properly
spaceArgs
	:	(value|newLineSpaceUnquotedString) (WS (value|newLineSpaceUnquotedString))*
	;

commaArgs
	:	(value|commaSquareUnquotedString) (Comma(value|commaSquareUnquotedString))*
	;

newLineSpaceUnquotedString
	:	( options {greedy=false;} : ~(NewLine|WS))+
	;

commaSquareUnquotedString
	:	( options {greedy=false;} : ~(Comma|RSquare|NewLine))+
	;

value	:	STRING|INT|FLOAT
	;

NewLine	:	(/*{$channel=HIDDEN;}*/'\r'? '\n')
	;

Comma	:	','
	;

RSquare	:	']'
	;

LSquare	:	'['
	;

INT :	'0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

WS  :   ( ' '
        | '\t'
        ) //{$channel=HIDDEN;}
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

Other 	:	.;