[antlr-interest] URGENT : HTML and SCRIPT tag

johnclarke72 johnclarke at hotmail.com
Sun Jun 23 14:35:40 PDT 2002


At the moment I am still working on an HTML parser.

Basically, I need the current tag definitions to keep working, but I 
also need the lexer to handle a script tag so that it keeps all of the 
data (including spaces, tabs, etc.) between the begin and end tags.  
For example:

<script>
  test code
  some lines of code
          some more       lines of code
</script>

The script tag could also contain attributes like any other tag.

I have included my lexer grammar below.  How can I modify this 
grammar so that it can handle the script tag?

John

// Import the required Classes
header
{
   import java.util.*;
   import antlr.*;
}

// Define the class
class HTMLLexer extends Lexer;

// Set the options for the Lexer
options
{
  k = 9;                           // Set the lookahead to 9 characters
  caseSensitive = false;           // Set case sensitivity to false
  charVocabulary = '\1' .. '\377'; // Set the lexer character vocabulary
  testLiterals = false;            // Don't test against the literals table
  exportVocab = HTMLLexer;         // The Grammar to export
}

// Text Data - This is used for Text, Tags and Attributes
TEXTDATA
  : (~(' ' | '\r' | '\n' | '\t' | '<' | '>' | '/' | '!' | '=' | '"' | '\''))+
  ;

// HTML Comments
HTMLCOMMENT : "<!--"! (options {greedy=false;} : .)* "-->"!;

// Document Type Definition
HTMLDTD : "<!doctype"! (options {greedy=false;} : .)* ">"!;

//
// Main HTML Tag Section
//

STARTTAG 
{
    Hashtable tagAttributes = null;
    TagToken returnToken = null;
}
  : "<"! tagName:TEXTDATA (WS (tagAttributes = ATTRIBUTES)?)?
    {
      returnToken = new TagToken(tagName.getText(),tagAttributes);
      $setToken(returnToken);
    }
    (">"!);

// Definition of an End Tag
ENDTAG   : "</"! TEXTDATA ">"!;

// For processing HTML Attributes

// TAGVALUE is used to define attribute values that have quotes
protected TAGVALUE
  : ('"'! | '\''!)
    (options {greedy=false;} : ~('"' | '\''))*
    ('"'! | '\''!)
  ;

// Definition for Attributes
protected ATTRIBUTES returns [Hashtable a = new Hashtable()]
: ( ATTRIBUTE[a] (WS ATTRIBUTE[a])* )
;

protected ATTRIBUTE [Hashtable h]
: key:TEXTDATA { h.put(key.getText(), ""); }
    ( '=' (WS)?
        ( v1: TEXTDATA          { h.put(key.getText(), v1.getText());}
        | v2: TAGVALUE          { h.put(key.getText(), v2.getText());}
        )
    )?
    ;

// Ignore all White Space
WS : ( ' '
     | '\t'
     | '\r' '\n' { newline(); }
     | '\n' { newline(); }
     )
     {$setType(Token.SKIP);} //ignore this token
;
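
The behaviour asked for above (keep everything between the opening 
script tag and its end tag verbatim) can be sketched in plain Java, 
independent of ANTLR.  This is only an illustration of the scanning 
step, not a grammar fix: ScriptBodyScanner and readScriptBody are 
hypothetical names that do not exist in ANTLR or in the grammar above, 
and the sketch assumes the whole document is already available as a 
String.

```java
/** Hypothetical helper; not part of ANTLR or of the grammar in this post. */
public class ScriptBodyScanner {

    // Start of the end tag we scan for; compared without regard to case.
    private static final String END_TAG = "</script";

    /**
     * Returns the raw text from index 'from' (assumed to be just past the
     * '>' of the opening script tag) up to the next "</script", keeping
     * spaces, tabs and newlines exactly as they appear in the input.
     * Returns null if no end tag is found.
     */
    public static String readScriptBody(String html, int from) {
        for (int i = from; i + END_TAG.length() <= html.length(); i++) {
            // regionMatches(true, ...) compares the region ignoring case
            if (html.regionMatches(true, i, END_TAG, 0, END_TAG.length())) {
                return html.substring(from, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<script language=\"javascript\">\n"
                    + "  test code\n"
                    + "          some more       lines of code\n"
                    + "</SCRIPT><p>";
        // Simplification: assumes no '>' occurs inside an attribute value.
        int from = html.indexOf('>') + 1;
        System.out.println("[" + readScriptBody(html, from) + "]");
    }
}
```

A lexer action that recognises the script start tag would need to do 
something equivalent against the lexer's own character input; how to 
wire that into the grammar is not shown here.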



 
