[antlr-interest] Re: trying to understand greedy option

Eric Mahurin eric_mahurin at yahoo.com
Tue Aug 3 16:16:36 PDT 2004


In this:

> parseFile
>     : (field COMA)* field (COMA)?
>     ;

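the (field COMA)* loop eats up everything if it is greedy (greedy=true
is the default; setting it explicitly just silences the warning).  With
k=5 the parser decides whether to enter the loop from the next five
tokens, but "field" can have any length, so the COMA that ends a field
may lie beyond that window, and the loop is entered whenever the visible
tokens don't rule it out.  You must remember, ANTLR doesn't have
backtracking like regular expressions: once the loop has taken a field,
it can't give it back.  With a trailing comma, the loop swallows the
last field too, and the mandatory "field" after it finds only end of
input.  With "value = lastone" present, that field is short enough for
the lookahead to see that no comma follows it, which is why the longer
file parses.  You can simulate the effect of backtracking with
syntactic predicates, like this: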

parseFile
    : ( (field COMA field) => field COMA )* field (COMA)?
    ;

You'll get other warnings saying these predicates are superfluous, but
those warnings are bogus.
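The symptom is easy to reproduce outside ANTLR. Below is a minimal Python sketch, not ANTLR output: a hand-written recursive-descent parser over token-type names, with the question's input tokenized by hand. The names LL, WITH_LAST, WITHOUT_LAST, FIELD_COMA and accepts are all hypothetical. The greedy loop's entry test is only a simplified model of a k=5 lookahead decision: it refuses to enter (field COMA)* only when the next five tokens contradict "field COMA". The predicate version instead trial-parses "field COMA field" and rewinds, which is what a syntactic predicate does.

```python
# Hypothetical, hand-tokenized stand-in for barlexer's output on the
# question's input: first WITH the last line "value = lastone", then without it.
WITH_LAST = ("NAME EQUAL NAME COMA "
             "NAME EQUAL STRING COMA "
             "NAME EQUAL STRING PLUS NAME PLUS STRING PLUS NAME COMA "
             "NAME EQUAL NAME").split()
WITHOUT_LAST = WITH_LAST[:-3]  # drop "value = lastone"; the trailing COMA stays

K = 5  # lookahead depth, as in barparser's k=5


class ParseError(Exception):
    pass


class LL:
    """Recursive-descent parser over token-type names. It never backtracks,
    except inside speculate(), which models a syntactic predicate."""

    # (state, token) -> next state while reading `field COMA`; 4 = saw the COMA
    FIELD_COMA = {(0, "NAME"): 1, (1, "EQUAL"): 2, (2, "STRING"): 3,
                  (2, "NAME"): 3, (3, "PLUS"): 2, (3, "COMA"): 4}

    def __init__(self, toks):
        self.toks, self.i = list(toks) + ["EOF"], 0

    def la(self, n):
        # LA(n): type of the n-th token ahead; EOF past the end of input
        j = self.i + n - 1
        return self.toks[j] if j < len(self.toks) else "EOF"

    def match(self, *types):
        if self.la(1) not in types:
            raise ParseError(f"expected {'|'.join(types)}, found {self.la(1)} at token {self.i}")
        self.i += 1

    def field(self):
        # field : id EQUAL fieldValue ;  fieldValue written as part (PLUS part)*,
        # which accepts the same tokens as (fieldValuePart PLUS)* fieldValuePart
        self.match("NAME")
        self.match("EQUAL")
        self.match("STRING", "NAME")
        while self.la(1) == "PLUS":
            self.match("PLUS")
            self.match("STRING", "NAME")

    def window_allows_field_coma(self):
        # Simplified model of the greedy loop-entry test (not ANTLR's generated code)
        state = 0
        for n in range(1, K + 1):
            state = self.FIELD_COMA.get((state, self.la(n)))
            if state is None:
                return False  # the visible tokens rule the loop out
            if state == 4:
                return True   # the COMA ending the field is inside the window
        return True           # window exhausted: a long field hides its COMA

    def last_field(self):
        # field (COMA)? followed by end of input
        self.field()
        if self.la(1) == "COMA":
            self.match("COMA")
        self.match("EOF")

    def parse_file_greedy(self):
        # parseFile : (field COMA)* field (COMA)? ;  committed once entered
        while self.window_allows_field_coma():
            self.field()
            self.match("COMA")
        self.last_field()

    def speculate(self, probe):
        # Syntactic predicate: try the probe, then always rewind the input
        saved = self.i
        try:
            probe()
            return True
        except ParseError:
            return False
        finally:
            self.i = saved

    def parse_file_pred(self):
        # parseFile : ( (field COMA field) => field COMA )* field (COMA)? ;
        def probe():
            self.field()
            self.match("COMA")
            self.field()
        while self.speculate(probe):
            self.field()
            self.match("COMA")
        self.last_field()


def accepts(toks, rule):
    try:
        rule(LL(toks))
        return True
    except ParseError:
        return False
```

With these inputs the greedy version accepts WITH_LAST but rejects WITHOUT_LAST, failing on the mandatory field after the loop, which matches the behaviour described in the question; the predicate version accepts both. In this model a single long field with no trailing comma, such as a = b + c + d, also defeats the greedy version, because the token that would tell it "no comma follows" lies outside the five-token window.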

The better and simpler solution is to refactor, so that each decision
only has to look at the token after a comma.  Try doing this type of
thing:

parseFile
    : field (COMA field)* (COMA)?
    ;
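For comparison, here is the refactored rule as a similar minimal Python sketch (hand-tokenized input; parse_file, accepts, WITH_LAST and WITHOUT_LAST are hypothetical names, not ANTLR output). Because the comma now comes before each extra field instead of after it, the parser never has to see past a field: after a COMA, a NAME means another field and anything else means the optional trailing comma.

```python
# Hypothetical, hand-tokenized versions of the question's input:
# with and without the last line "value = lastone".
WITH_LAST = ("NAME EQUAL NAME COMA "
             "NAME EQUAL STRING COMA "
             "NAME EQUAL STRING PLUS NAME PLUS STRING PLUS NAME COMA "
             "NAME EQUAL NAME").split()
WITHOUT_LAST = WITH_LAST[:-3]  # ends with the trailing COMA


class ParseError(Exception):
    pass


def parse_file(toks):
    """parseFile : field (COMA field)* (COMA)? ;  parsed without backtracking."""
    toks = list(toks) + ["EOF"]
    pos = 0

    def la(n):
        # LA(n): type of the n-th token ahead; EOF past the end of input
        j = pos + n - 1
        return toks[j] if j < len(toks) else "EOF"

    def match(*types):
        nonlocal pos
        if la(1) not in types:
            raise ParseError(f"expected {'|'.join(types)}, found {la(1)} at token {pos}")
        pos += 1

    def field():
        # field : id EQUAL fieldValue ;  fieldValue written as part (PLUS part)*
        match("NAME")
        match("EQUAL")
        match("STRING", "NAME")
        while la(1) == "PLUS":
            match("PLUS")
            match("STRING", "NAME")

    field()
    # COMA then NAME: another field follows.  COMA then anything else: the
    # optional trailing comma.  Two tokens decide it, however long a field is.
    while la(1) == "COMA" and la(2) == "NAME":
        match("COMA")
        field()
    if la(1) == "COMA":
        match("COMA")
    match("EOF")


def accepts(toks):
    try:
        parse_file(toks)
        return True
    except ParseError:
        return False
```

Both versions of the input are accepted here, while an input ending in two commas is still rejected, since the optional COMA is matched at most once before end of input.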

Eric

--- In antlr-interest at yahoogroups.com, "xdecoret" <xdecoret at f...> wrote:
> This post follows up on my earlier one about nondeterminism. It seems I
> can shut the warning up by using an option {greedy=true;}, but then I
> run into another problem. Here is a simple grammar:
> 
> header {
> }		
> options {
>     language="Cpp";
>     genHashLines = false;
> }
> 
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
> //%%%%%%        PARSER                              %%%%%%%%%%    
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> {
> }
> class barparser extends Parser;
> options {
>     k=5;
>     buildAST = false;
>     defaultErrorHandler=false;
> }
> {
> }
> parseFile
>     : (field COMA)* field (COMA)?
>     ;
> field
>     : id EQUAL fieldValue
>     ;
> fieldValue 
>     : (fieldValuePart PLUS)* fieldValuePart 
>     ;
> fieldValuePart 
>     : STRING
>     | NAME
>     ;
> id
>     : NAME
>     ;
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
> //%%%%%%        LEXER                               %%%%%%%%%%    
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
> //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> {
> }
> class barlexer extends Lexer;
> options {
>     k=3;
>     defaultErrorHandler=false;
>     caseSensitive=false;
>     charVocabulary='\3'..'\377';
> }
> {
> }
> PLUS : '+'
>     ;
> COMA : ','
>     ;
> EQUAL : '='
>     ;
> NAME
>     : ('a'..'z'|'0'..'9'|'_'|'-'|'\''|':'|'.')+ 
>     ;
> protected
> ESC
>     : '\\' ~('\n')
>     ;
> protected
> STRING_INTERNAL
>     : ( ('\\' ~('\n'))=> ESC
>         | ( '\r' { newline(); }
>             | '\n' { newline(); }
>             | '\\' '\n'   { newline(); }
>             )
>         | ~( '"' | '\r' | '\n' | '\\' )
>         )*
>     ;
> STRING: '"' t:STRING_INTERNAL '"'
>         {
>             $setText(t->getText());
>         }
>     ;
> // The \r\n below is to parse DOS file end of lines
> WS
>     : ( ' ' | '\t' | ('\n'| "\r\n") { newline(); })
>         {
>             $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP);
>         }
> 	;
> 
> Running ANTLR on it, I get a nondeterminism warning that I can get rid
> of with:
> 
> fieldValue 
>     : (options {greedy=true;} : fieldValuePart PLUS)* fieldValuePart 
>     ;
> 
> 
> But then, I can parse the following file:
> 
> value = toto,
> value = "toto",
> value = "toto" + tata + "titi" + tutu,
> value = lastone
> 
> But I cannot parse the same input if I remove the last line?!
> 
> Any explanations to help me understand?



 