[antlr-interest] Predicates in a lexer?
mzukowski at yci.com
Tue Jul 23 11:02:24 PDT 2002
Can you assume that the delimiter is always a valid ID? If so, then you can
dynamically add it as a literal to your literals table (from within a lexer
rule). If not, then the end delimiter could be very ambiguous.
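
A rough sketch of that idea in plain Java, outside ANTLR, in case it helps
to see the mechanics. Everything here is an assumption made for
illustration: the class HereDocSketch, the token type constants, and the
addLiteral/testLiteral helpers are hypothetical stand-ins for the lexer's
literals table, not the ANTLR API. The delimiter is registered when the
"<<<ID" line is seen, and each later line is checked against the table.

```java
import java.util.HashMap;
import java.util.Map;

class HereDocSketch {
    // Hypothetical token types, for illustration only.
    static final int IDENT = 1;
    static final int HERE_END = 2;

    // Stand-in for a lexer's literals table: text -> token type.
    private final Map<String, Integer> literals = new HashMap<>();

    void addLiteral(String text, int type) {
        literals.put(text, type);
    }

    // Returns the registered token type, or IDENT if the text is no literal.
    int testLiteral(String text) {
        Integer type = literals.get(text);
        return type != null ? type : IDENT;
    }

    // Scans input starting at "<<<" and returns the body of the here
    // document, with lines joined by "\n" and the delimiter line excluded.
    static String scan(String input) {
        if (!input.startsWith("<<<")) {
            throw new IllegalArgumentException("not a here document");
        }
        // Accept \r\n, \r or \n as line ends; keep trailing empty strings.
        String[] lines = input.split("\r\n|\r|\n", -1);
        String delimiter = lines[0].substring(3).trim();

        // Register the delimiter at run time, once it is known.
        HereDocSketch table = new HereDocSketch();
        table.addLiteral(delimiter, HERE_END);

        StringBuilder body = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            String word = lines[i].trim();
            if (word.endsWith(";")) {
                word = word.substring(0, word.length() - 1);
            }
            if (table.testLiteral(word) == HERE_END) {
                return body.toString(); // delimiter line ends the document
            }
            if (i > 1) {
                body.append('\n'); // separator between body lines
            }
            body.append(lines[i]);
        }
        throw new IllegalArgumentException("unterminated here document");
    }
}
```

Calling HereDocSketch.scan() on the text from "<<<EOT" through "EOT;" would
return only the lines in between. In a real lexer the table lookup would
happen in the identifier rule rather than line by line.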
Monty
> -----Original Message-----
> From: Andreas Rueckert [mailto:a_rueckert at gmx.net]
> Sent: Tuesday, July 23, 2002 8:19 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Predicates in a lexer?
>
>
> Hi!
>
> Maybe I'm missing something obvious, or I should read the manual again,
> but I have a question. I'm trying to scan a 'here' document in PHP. For
> those who haven't seen one:
> =======================================
> echo <<<EOT
> <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
> <meta http-equiv="expires" content="0">
> ...more text...
> EOT;
> =======================================
> is such a here document. The problem is that the author can use any
> delimiter he likes, so replacing EOT with MY_DELIMITER still yields a
> valid here document. You therefore cannot code the delimiter as a static
> lexer rule, since you don't know it when writing the grammar.
> First idea:
> =======================================
> // A here document
> HERE_DOCUMENT
> {
>     String here_delimiter = null;
>     StringBuffer here_document = new StringBuffer();
>     StringBuffer latestLine = new StringBuffer();
>     boolean isComplete = false;
> }
>     : "<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
>       ( options { generateAmbigWarnings=false; } :
>         '\n' | '\r' | '\r' '\n' ) { newline(); }
>       (
>         {isComplete == false}? // Is the document completely parsed?
>         (
>           // If it's the end of a line
>           ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" )
>           {
>             newline();
>             String line = latestLine.toString();
>
>             // Check if the last line is the delimiter of the here document
>             if( line.trim().equals( here_delimiter)) {
>                 isComplete = true;
>                 $setType( HERE_DOCUMENT);
>                 $setText( here_document.toString());
>             } else { // Nope.
>                 if( here_document.length() > 0) {
>                     // If it's not the first line add a newline as the line separator
>                     here_document.append( "\n");
>                 }
>                 // Add the last line to the document
>                 here_document.append( latestLine.toString());
>                 // Create a new buffer for the latest line
>                 latestLine = new StringBuffer();
>             }
>           }
>         |
>           // Append any other character to the latest line.
>           character:~( '\r' | '\n' ) { latestLine.append( character); }
>         )
>       )*
>     ;
> =======================================
> Problem: the {isComplete == false}? predicate does not appear in the
> generated lexer, so this solution doesn't work here (maybe my ANTLR
> version is just too old?) ... :-(
>
> Hack to work around the problem: the ( )* in the rule is translated into
> an (endless) do { } while(true); loop, which can be exited with a break
> statement. So instead of setting the isComplete flag to true, simply exit
> the loop via break:
> =======================================
> // A here document
> HERE_DOCUMENT
> {
>     String here_delimiter = null;
>     StringBuffer here_document = new StringBuffer();
>     StringBuffer latestLine = new StringBuffer();
> }
>     : "<<<" d:IDENT { here_delimiter = d.getText() + ";"; }
>       ( options { generateAmbigWarnings=false; } :
>         '\n' | '\r' | '\r' '\n' ) { newline(); }
>       (
>         (
>           // If it's the end of a line
>           ( options { generateAmbigWarnings=false; } : '\r' | '\n' | "\r\n" )
>           {
>             newline();
>             String line = latestLine.toString();
>
>             // Check if the last line is the delimiter of the here document
>             if( line.trim().equals( here_delimiter)) {
>                 $setType( HERE_DOCUMENT);
>                 $setText( here_document.toString());
>                 break; // <- end the loop for this token
>             } else { // Nope.
>                 if( here_document.length() > 0) {
>                     // If it's not the first line add a newline as the line separator
>                     here_document.append( "\n");
>                 }
>                 // Add the last line to the document
>                 here_document.append( latestLine.toString());
>                 // Create a new buffer for the latest line
>                 latestLine = new StringBuffer();
>             }
>           }
>         |
>           // Append any other character to the latest line.
>           character:~( '\r' | '\n' ) { latestLine.append( character); }
>         )
>       )*
>     ;
> =======================================
> Since this is an ugly hack (imagine the ANTLR lexer generator being
> modified), I'd like to ask whether there's a better solution.
>
> TIA,
> Andreas
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/