[antlr-interest] Predicates in a lexer?
Andreas Rueckert
a_rueckert at gmx.net
Tue Jul 23 08:19:22 PDT 2002
Hi!
Maybe I'm missing something obvious or I should read the manual again, but I
have a question. I'm trying to scan a 'here' document in PHP. For those who
haven't seen this:
=======================================
echo <<<EOT
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta http-equiv="expires" content="0">
...more text...
EOT;
=======================================
is such a here document. The problem is that the author can pick any delimiter
he likes, so replacing EOT with MY_DELIMITER still yields a valid here document.
So you cannot code the delimiter as a static lexer rule, since it isn't known
until the opening line has been scanned.
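To make the requirement concrete outside of Antlr, here is a minimal sketch in
plain Java of what the scanner has to do: read the delimiter from the opening
line, then collect lines until one equals that delimiter plus ";". The class
and method names (HereDocSketch, readBody) are made up for illustration, and
requiring the trailing ";" on the closing line is my own simplification.

```java
// Hypothetical helper for illustration only; not part of Antlr or PHP.
final class HereDocSketch {

    // Returns the body of a here document that starts with "<<<DELIM",
    // or null if the input does not start with "<<<" or no closing
    // "DELIM;" line is found.
    static String readBody(String src) {
        if (!src.startsWith("<<<")) {
            return null;
        }
        // Split on "\r\n", '\r' or '\n'; the limit -1 keeps trailing empty lines.
        String[] lines = src.split("\r\n|\r|\n", -1);
        // The delimiter is only known once the opening line has been read.
        String closing = lines[0].substring(3).trim() + ";";
        StringBuilder body = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().equals(closing)) {
                return body.toString(); // closing line reached
            }
            if (i > 1) {
                body.append('\n'); // separator between body lines
            }
            body.append(lines[i]);
        }
        return null; // ran out of input before the closing line
    }

    public static void main(String[] args) {
        System.out.println(readBody("<<<EOT\nline one\nline two\nEOT;\n"));
    }
}
```

The point is only that the comparison string is built at run time, which is
exactly what a fixed lexer literal cannot express.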
First idea:
=======================================
// A here document
HERE_DOCUMENT
{
    String here_delimiter = null;
    StringBuffer here_document = new StringBuffer();
    StringBuffer latestLine = new StringBuffer();
    boolean isComplete = false;
}
    : "<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
      ( options { generateAmbigWarnings=false; } : '\n' | '\r' | '\r' '\n' ) { newline(); }
      (
        {isComplete == false}? // Is the document completely parsed?
        (
          ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" ) // If it's the end of a line
          {
            newline();
            String line = latestLine.toString();
            // Check if the last line is the delimiter of the here document
            if( line.trim().equals( here_delimiter)) {
              isComplete = true;
              $setType( HERE_DOCUMENT);
              $setText( here_document.toString());
            } else { // Nope.
              if( here_document.length() > 0) { // If it's not the first line add a newline as the line separator
                here_document.append( "\n");
              }
              here_document.append( latestLine.toString()); // Add the last line to the document
              latestLine = new StringBuffer(); // Create a new buffer for the latest line
            }
          }
        |
          character:~( '\r' | '\n' ) { latestLine.append( character); } // Append any other character to the latest line.
        )
      )*
    ;
=======================================
Problem: the {isComplete == false}? predicate does not show up anywhere in the
generated lexer, so the loop never stops at the delimiter and this solution
doesn't work here (maybe my Antlr version is just too old?) ... :-(
Hack to work around the problem: the ( )* in the rule is translated into an
(endless) do { } while(true); loop, which can be exited with a break
statement. So instead of setting the isComplete flag to true, simply exit the
loop via break:
=======================================
// A here document
HERE_DOCUMENT
{
    String here_delimiter = null;
    StringBuffer here_document = new StringBuffer();
    StringBuffer latestLine = new StringBuffer();
}
    : "<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
      ( options { generateAmbigWarnings=false; } : '\n' | '\r' | '\r' '\n' ) { newline(); }
      (
        (
          ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" ) // If it's the end of a line
          {
            newline();
            String line = latestLine.toString();
            // Check if the last line is the delimiter of the here document
            if( line.trim().equals( here_delimiter)) {
              $setType( HERE_DOCUMENT);
              $setText( here_document.toString());
              break; // <- end the loop for this token
            } else { // Nope.
              if( here_document.length() > 0) { // If it's not the first line add a newline as the line separator
                here_document.append( "\n");
              }
              here_document.append( latestLine.toString()); // Add the last line to the document
              latestLine = new StringBuffer(); // Create a new buffer for the latest line
            }
          }
        |
          character:~( '\r' | '\n' ) { latestLine.append( character); } // Append any other character to the latest line.
        )
      )*
    ;
=======================================
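To show what the break relies on, here is a hand-written Java loop of the same
shape. This is only my sketch of the structure, not the code Antlr actually
generates; the names BreakLoopSketch and scan are made up, and treating "\r\n"
as a single line end is an assumption of the sketch.

```java
// Hypothetical stand-in for the loop structure; not Antlr-generated code.
final class BreakLoopSketch {

    // Scans input character by character and returns the text before the
    // line equal to closingLine, or null if that line never ends with a
    // newline inside the input.
    static String scan(String input, String closingLine) {
        StringBuilder document = new StringBuilder();
        StringBuilder latestLine = new StringBuilder();
        boolean found = false;
        int pos = 0;
        do {
            if (pos >= input.length()) {
                break; // no more input
            }
            char c = input.charAt(pos++);
            if (c == '\r' || c == '\n') {
                // Assumption: consume "\r\n" as one line end.
                if (c == '\r' && pos < input.length() && input.charAt(pos) == '\n') {
                    pos++;
                }
                if (latestLine.toString().trim().equals(closingLine)) {
                    found = true;
                    break; // the action code leaves the enclosing loop directly
                }
                if (document.length() > 0) {
                    document.append('\n');
                }
                document.append(latestLine);
                latestLine.setLength(0);
            } else {
                latestLine.append(c);
            }
        } while (true);
        return found ? document.toString() : null;
    }

    public static void main(String[] args) {
        System.out.println(scan("first\r\nsecond\nEOT;\n", "EOT;"));
    }
}
```

The break only works because the action ends up textually inside that loop. If
the generator wrapped the action in another loop or a switch, the same break
would silently exit the wrong construct.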
Since this is an ugly hack (it breaks as soon as the Antlr lexer generator is
modified to emit a different loop), I'd like to ask if there's a better solution.
TIA,
Andreas