[antlr-interest] Predicates in a lexer?

Tue Jul 23 11:02:24 PDT 2002

Can you say that the delimiter is limited to valid IDs?  If so, you can
dynamically add it as a literal to your literals table (from an action in a
lexer rule).  If not, the end delimiter could be very ambiguous.
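
To make the idea concrete, here is a rough plain-Java sketch of it, outside
of ANTLR.  It assumes the delimiter is a valid identifier, registers it in a
map that stands in for the literals table, and then treats the first line
made up of exactly that identifier (optionally followed by ';') as the end of
the here document.  The class name, the HERE_END constant and the scan()
method are made up for illustration; none of them is ANTLR API.

```java
import java.util.HashMap;
import java.util.Map;

class HereDocSketch {
    // Hypothetical token type for the closing delimiter (not an ANTLR constant).
    static final int HERE_END = 1;

    /** Returns the here-document body, or null if none is found or it is unterminated. */
    static String scan(String src) {
        // Stand-in for the lexer's literals table: literal text -> token type.
        Map<String, Integer> literals = new HashMap<>();

        int start = src.indexOf("<<<");
        if (start < 0) return null;

        // Read the delimiter; it must be a valid identifier.
        int idStart = start + 3;
        int i = idStart;
        while (i < src.length() && isIdChar(src.charAt(i), i == idStart)) i++;
        if (i == idStart) return null;

        // Register the delimiter dynamically, as one would add a literal.
        String delimiter = src.substring(idStart, i);
        literals.put(delimiter, HERE_END);

        // lines[0] is the rest of the opening line; the body starts at lines[1].
        String[] lines = src.substring(i).split("\r\n|\r|\n", -1);
        StringBuilder body = new StringBuilder();
        for (int n = 1; n < lines.length; n++) {
            String line = lines[n];
            // Allow an optional trailing ';' after the closing delimiter.
            String word = line.endsWith(";") ? line.substring(0, line.length() - 1) : line;
            if (literals.containsKey(word)) return body.toString();
            if (n > 1) body.append('\n');  // line separator between body lines
            body.append(line);
        }
        return null;  // closing delimiter never seen
    }

    static boolean isIdChar(char c, boolean first) {
        return c == '_' || Character.isLetter(c) || (!first && Character.isDigit(c));
    }
}
```

Calling HereDocSketch.scan() on the PHP example quoted below should return
the lines between "echo <<<EOT" and "EOT;".  Unlike the trim() comparison in
the quoted rule, this sketch requires the closing delimiter to start in
column one.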

Monty

> -----Original Message-----
> From: Andreas Rueckert [mailto:a_rueckert at gmx.net]
> Sent: Tuesday, July 23, 2002 8:19 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Predicates in a lexer?
> 
> 
> Hi!
> 
> Maybe I'm missing something obvious or I should read the manual again,
> but I have a question. I'm trying to scan a 'here' document in PHP.
> For those who haven't seen this:
> =======================================
> echo <<<EOT
> <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
> <meta http-equiv="expires" content="0">
> ...more text...
> EOT;
> =======================================
> is such a here document. The problem is that the author can use any
> delimiter he likes, so replacing EOT with MY_DELIMITER still yields a
> valid here document. So you cannot code the delimiter as a static lexer
> rule (since you don't know it at that point).
> First idea:
> =======================================
> // A here document
> HERE_DOCUMENT
> {
>   String here_delimiter = null;
>   StringBuffer here_document = new StringBuffer();
>   StringBuffer latestLine = new StringBuffer();
>   boolean isComplete = false;
> }
>     : "<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
>       ( options { generateAmbigWarnings=false; } : '\n' | '\r' | '\r' '\n' )
>       { newline(); }
>       (
>         {isComplete == false}?  // Is the document completely parsed?
>         (
>           // If it's the end of a line
>           ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" )
>           {
>             newline();
>             String line = latestLine.toString();
>
>             // Check if the last line is the delimiter of the here document
>             if( line.trim().equals( here_delimiter)) {
>                 isComplete = true;
>                 $setType( HERE_DOCUMENT);
>                 $setText( here_document.toString());
>             } else {  // Nope.
>                 // If it's not the first line add a newline as the line separator
>                 if( here_document.length() > 0) {
>                     here_document.append( "\n");
>                 }
>                 // Add the last line to the document
>                 here_document.append( latestLine.toString());
>                 // Create a new buffer for the latest line
>                 latestLine = new StringBuffer();
>             }
>           }
>           |
>           // Append any other character to the latest line.
>           character:~( '\r' | '\n' ) { latestLine.append( character); }
>         )
>       )*
>     ;
> =======================================
> Problem: the {isComplete == false}? predicate is not found in the
> generated lexer, so this solution doesn't work here (maybe my Antlr
> version is just too old?) ... :-(
> 
> Hack to work around the problem: the ( )* in the rule is translated into
> an (endless) do { } while(true); loop that can be exited with a break
> statement. So instead of setting the isComplete flag to true, simply exit
> the loop via break:
> =======================================
> // A here document
> HERE_DOCUMENT
> {
>   String here_delimiter = null;
>   StringBuffer here_document = new StringBuffer();
>   StringBuffer latestLine = new StringBuffer();
> }
>     : "<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
>       ( options { generateAmbigWarnings=false; } : '\n' | '\r' | '\r' '\n' )
>       { newline(); }
>       (
>         (
>           // If it's the end of a line
>           ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" )
>           {
>             newline();
>             String line = latestLine.toString();
>
>             // Check if the last line is the delimiter of the here document
>             if( line.trim().equals( here_delimiter)) {
>                 $setType( HERE_DOCUMENT);
>                 $setText( here_document.toString());
>                 break;  // <- end the loop for this token
>             } else {  // Nope.
>                 // If it's not the first line add a newline as the line separator
>                 if( here_document.length() > 0) {
>                     here_document.append( "\n");
>                 }
>                 // Add the last line to the document
>                 here_document.append( latestLine.toString());
>                 // Create a new buffer for the latest line
>                 latestLine = new StringBuffer();
>             }
>           }
>           |
>           // Append any other character to the latest line.
>           character:~( '\r' | '\n' ) { latestLine.append( character); }
>         )
>       )*
>     ;
> =======================================
> Since this is an ugly hack (imagine the Antlr lexer generator being
> modified), I'd like to ask if there's a better solution.
> 
> TIA,
> Andreas
> 
>  
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
