[antlr-interest] Predicates in a lexer?

Andreas Rueckert a_rueckert at gmx.net
Tue Jul 23 08:19:22 PDT 2002


Hi!

Maybe I'm missing something obvious, or I should read the manual again, but I
have a question. I'm trying to scan a 'here' document in PHP. For those who
haven't seen one:
=======================================
echo <<<EOT
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta http-equiv="expires" content="0">
...more text...
EOT;
=======================================
is such a here document. The problem is that the author can choose any
delimiter he likes, so replacing EOT with MY_DELIMITER still gives a valid here
document. So you cannot code the delimiter as a static lexer rule (since you
don't know it when the grammar is written).
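
To make the matching rule concrete, here is a minimal plain-Java sketch of the
logic outside of ANTLR: read the delimiter from the first line, then collect
lines until one equals the delimiter followed by ';'. The class name
HereDocSketch and the method scan are made up for illustration, and the sketch
assumes the whole input is already available as a String.

```java
// Plain-Java sketch of the matching logic only; not ANTLR-generated code.
// HereDocSketch and scan are hypothetical names used for illustration.
final class HereDocSketch {

    /**
     * Returns the body of a here document that starts with "<<<DELIM" on the
     * first line and ends at the first line equal to "DELIM;" (after trimming),
     * or null if the input is not a complete here document.
     */
    static String scan(String input) {
        if (!input.startsWith("<<<")) {
            return null;
        }
        // Split on \r\n, \r or \n; the -1 limit keeps trailing empty lines.
        String[] lines = input.split("\r\n|\r|\n", -1);
        // The delimiter is only known once the first line has been read.
        String closing = lines[0].substring(3).trim() + ";";
        StringBuilder body = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().equals(closing)) {
                return body.toString();   // closing line found
            }
            if (i > 1) {
                body.append('\n');        // separator between body lines
            }
            body.append(lines[i]);
        }
        return null;                      // input ended before the closing line
    }

    public static void main(String[] args) {
        System.out.println(scan("<<<MY_DELIMITER\nsome text\nMY_DELIMITER;\n"));
    }
}
```

The point is only that the closing line depends on text seen earlier in the
same token, which is what a fixed literal in a lexer rule cannot express.
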
First idea:
=======================================
// A here document
HERE_DOCUMENT
{ 
  String here_delimiter = null; 
  StringBuffer here_document = new StringBuffer();
  StringBuffer latestLine = new StringBuffer();
  boolean isComplete = false;
}
	:	"<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
                ( options { generateAmbigWarnings=false; } : '\n' | '\r' | '\r' '\n' ) { newline(); }
                ( 
                  {isComplete == false}?  // Is the document completely parsed?
		  (  
                    ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" )  // If it's the end of a line
                      { 
                        newline();
                        String line = latestLine.toString();

                        // Check if the last line is the delimiter of the here document
                        if( line.trim().equals( here_delimiter)) {
                            isComplete = true;
                            $setType( HERE_DOCUMENT);
                            $setText( here_document.toString());
                        } else {  // Nope.
                            if( here_document.length() > 0) {  // If it's not the first line add a newline as the line separator
                                here_document.append( "\n");
                            }
                            here_document.append( latestLine.toString());  // Add the last line to the document
                            latestLine = new StringBuffer();               // Create a new buffer for the latest line
                        }
                      }
                      | 
                    character:~( '\r' | '\n' ) { latestLine.append( character); }  // Append any other character to the latest line.
                  )
                )*
	;
=======================================
Problem: the {isComplete == false}? predicate does not appear anywhere in the
generated lexer, so this solution doesn't work here (maybe my ANTLR version is
just too old?) ... :-(

Hack to work around the problem: the ( )* in the rule is translated into an
(endless) do { } while(true); loop that can be exited with a break
statement. So instead of setting the isComplete flag to true, simply exit the
loop via break;
=======================================
// A here document
HERE_DOCUMENT
{ 
  String here_delimiter = null; 
  StringBuffer here_document = new StringBuffer();
  StringBuffer latestLine = new StringBuffer();
}
	:	"<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
                ( options { generateAmbigWarnings=false; } : '\n' | '\r' | '\r' '\n' ) { newline(); }
                ( 
		  (  
                    ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" )  // If it's the end of a line
                      { 
                        newline();
                        String line = latestLine.toString();

                        // Check if the last line is the delimiter of the here document
                        if( line.trim().equals( here_delimiter)) {
                            $setType( HERE_DOCUMENT);
                            $setText( here_document.toString());
                            break;  // <- end the loop for this token
                        } else {  // Nope.
                            if( here_document.length() > 0) {  // If it's not the first line add a newline as the line separator
                                here_document.append( "\n");
                            }
                            here_document.append( latestLine.toString());  // Add the last line to the document
                            latestLine = new StringBuffer();               // Create a new buffer for the latest line
                        }
                      }
                      | 
                    character:~( '\r' | '\n' ) { latestLine.append( character); }  // Append any other character to the latest line.
                  )
                )*
	;
=======================================
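
For illustration, the control flow the hack relies on looks roughly like the
following hand-written Java. This is a sketch of the loop shape only, not
actual ANTLR output; the names LoopExitSketch and scanBody are hypothetical,
and the input is assumed to start right after the "<<<IDENT" line.

```java
// Hand-written sketch of a do { } while (true) loop left via break.
// Not ANTLR output; LoopExitSketch and scanBody are hypothetical names.
final class LoopExitSketch {

    /**
     * Collects complete lines from text until one equals closing (after
     * trimming). If the input ends first, returns the lines seen so far.
     */
    static String scanBody(String text, String closing) {
        StringBuilder body = new StringBuilder();
        StringBuilder latestLine = new StringBuilder();
        boolean firstLine = true;
        int pos = 0;
        loop:
        do {
            if (pos >= text.length()) {
                break loop;               // end of input
            }
            char c = text.charAt(pos++);
            if (c == '\r' || c == '\n') {
                // Treat "\r\n" as a single line ending.
                if (c == '\r' && pos < text.length() && text.charAt(pos) == '\n') {
                    pos++;
                }
                if (latestLine.toString().trim().equals(closing)) {
                    break loop;           // the role played by break; in the rule's action
                }
                if (!firstLine) {
                    body.append('\n');    // separator between body lines
                }
                body.append(latestLine);
                firstLine = false;
                latestLine.setLength(0);  // start collecting the next line
            } else {
                latestLine.append(c);     // any other character extends the line
            }
        } while (true);
        return body.toString();
    }
}
```

The sketch uses a labeled break on purpose: in Java an unlabeled break only
leaves the innermost enclosing loop or switch, so whether a bare break; inside
an action really ends the ( )* loop depends on how the surrounding code is
generated.
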
Since this is an ugly hack (imagine the ANTLR lexer generator being modified),
I'd like to ask if there's a better solution.

TIA,
Andreas

 
