[antlr-interest] Grammar Problem

johnclarke72 johnclarke at hotmail.com
Tue Jun 4 04:20:58 PDT 2002


Hi,
   I am having problems with an HTML grammar that I am
writing.  The grammar is included at the end of this e-mail.

When I enter HTML comments (<!-- This is a Comment -->) or end tags
(</endTag>), they are handled correctly.

However, if I enter a start tag such as <test>, I get the error
message "line 1: unexpected token: test".

How can I get this to work?

I would be grateful for any advice on this.

John

HTMLTagLexer.g
==============

// Import the required Classes
header
{
   import java.util.*;
   import antlr.*;
}

// Define the class
class HTMLTagLexer extends Lexer;

// Set the options for the Lexer
options
{
  k=3;                             // Set the lookahead to 3 characters
  caseSensitive = false;           // Set case sensitivity to false
  charVocabulary = '\1' .. '\377'; // Set the lexer character vocabulary
  testLiterals = false;            // Don't test against the literals table
  exportVocab = HTMLTagLexer;      // The vocabulary to export
}

// The routines that will enable us to switch between lexer states
{
   // The Current Selector
   TokenStreamSelector selector;

   // The method that will enable us to switch between lexer states
   public void setSelector(TokenStreamSelector tokenStreamSelector)
   {
     selector = tokenStreamSelector;
   }
}

// Define the Tokens required for the Grammar

// Various HTML Marker Tags
INITSTARTTAG   : "<";
FINISHSTARTTAG : ">";
EQUALS         : "=";

// HTML Comments
HTMLCOMMENT : "!--"! (options {greedy=false;} : .)* "-->"!
              {selector.pop();}
              ;

// Main HTML Tags Section.  This defines the Tag names,
// attributes and attribute values

// TAGDATA is used to define the Tag Name and names of
// attributes used within the tag
TAGDATA : (~(' ' | '\r' | '\n' | '\t' | '<' | '>' | '/' | '!' | '='))+;

// TAGVALUE is used to define the values for attributes
// used within the tags


// Definition of an End Tag
ENDTAG   : '/'! ( 'a'..'z' )+ ">"! {selector.pop();};

// Ignore all White Space
WS : ( ' '
     | '\t'
     | '\r' '\n' { newline(); }
     | '\n' { newline(); }
     )
     {$setType(Token.SKIP);} //ignore this token
;
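For reference, the token rules in HTMLTagLexer.g can be traced by hand. The Java below is a rough sketch, not ANTLR-generated code: `TagLexerSketch` and `tokenize` are hypothetical names, it approximates caseSensitive = false only in the ENDTAG letter test, and it assumes its input is the text that follows a '<'. Tokens whose rule action calls selector.pop() are tagged "+pop"; as the grammar is posted, FINISHSTARTTAG is not one of them, so after a start tag nothing in this lexer would hand control back.

```java
import java.util.ArrayList;
import java.util.List;

// Rough, hand-written approximation of the posted HTMLTagLexer rules.
// Hypothetical helper for illustration only; it is not ANTLR-generated code.
final class TagLexerSketch {
    // Characters that end a TAGDATA run (the ~( ... ) set in the TAGDATA rule).
    private static final String TAGDATA_STOP = " \r\n\t<>/!=";

    private static boolean isAsciiLetter(char c) {
        char lower = Character.toLowerCase(c); // mimics caseSensitive = false
        return lower >= 'a' && lower <= 'z';
    }

    // Returns token names; "+pop" marks rules whose action calls selector.pop().
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                   // WS: skipped
            } else if (c == '<') {
                out.add("INITSTARTTAG");
                i++;
            } else if (c == '>') {
                out.add("FINISHSTARTTAG");             // note: no pop here
                i++;
            } else if (c == '=') {
                out.add("EQUALS");
                i++;
            } else if (in.startsWith("!--", i)) {      // HTMLCOMMENT, shortest match
                int end = in.indexOf("-->", i + 3);
                if (end < 0) {
                    throw new IllegalStateException("unterminated comment");
                }
                out.add("HTMLCOMMENT(" + in.substring(i + 3, end) + ")+pop");
                i = end + 3;
            } else if (c == '/') {                     // ENDTAG: '/' ('a'..'z')+ '>'
                int j = i + 1;
                while (j < in.length() && isAsciiLetter(in.charAt(j))) {
                    j++;
                }
                if (j == i + 1 || j >= in.length() || in.charAt(j) != '>') {
                    throw new IllegalStateException("malformed end tag at " + i);
                }
                out.add("ENDTAG(" + in.substring(i + 1, j) + ")+pop");
                i = j + 1;
            } else if (c == '!') {
                throw new IllegalStateException("unexpected char '!' at " + i);
            } else {                                   // TAGDATA: one or more chars
                int j = i;
                while (j < in.length() && TAGDATA_STOP.indexOf(in.charAt(j)) < 0) {
                    j++;
                }
                out.add("TAGDATA(" + in.substring(i, j) + ")");
                i = j;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("test>"));      // [TAGDATA(test), FINISHSTARTTAG]
        System.out.println(tokenize("/endtag>"));   // [ENDTAG(endtag)+pop]
    }
}
```

In this model, tokenize("test>") produces no INITSTARTTAG at all; that token only appears if the '<' character itself reaches this lexer.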

HTMLTagParser.g
===============

// Define the class
class HTMLTagParser extends Parser;

// Set the options for the Parser
options
{
  importVocab = HTMLTagLexer;     // The vocabulary to import
}


// The Parser Rules
processHTML:
   htmlComment:HTMLCOMMENT {System.out.println("COMMENT : "+htmlComment.getText().trim());}
   | startHTMLTag
   | endTag:ENDTAG {System.out.println("ENDTAG : "+endTag.getText());};

startHTMLTag : INITSTARTTAG tagName:TAGDATA
               {System.out.println("STARTTAG : "+tagName.getText());}
               FINISHSTARTTAG;
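To trace the parser rules by hand, here is a hypothetical hand-written Java model of processHTML and startHTMLTag. The class name, token representation and error text are illustrative only; the message merely imitates ANTLR's wording. It assumes, as the selector.pop() calls suggest, that an outer lexer not included in the post consumes the '<' and then pushes HTMLTagLexer. Under that assumption, the first token the parser sees for <test> is TAGDATA, which matches none of the three alternatives of processHTML. Comments and end tags arrive as HTMLCOMMENT and ENDTAG and parse normally.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written recursive-descent approximation of the posted parser rules.
// Hypothetical model for illustration; the error text only imitates ANTLR's.
final class TagParserSketch {
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    private final List<Tok> toks;
    private int pos = 0;

    TagParserSketch(List<Tok> toks) { this.toks = toks; }

    private Tok la() {
        if (pos >= toks.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return toks.get(pos);
    }

    private Tok match(String type) {
        Tok t = la();
        if (!t.type.equals(type)) {
            throw new IllegalStateException("expecting " + type + ", found '" + t.text + "'");
        }
        pos++;
        return t;
    }

    // processHTML : HTMLCOMMENT | startHTMLTag | ENDTAG ;
    String processHTML() {
        switch (la().type) {
            case "HTMLCOMMENT":  return "COMMENT : " + match("HTMLCOMMENT").text.trim();
            case "INITSTARTTAG": return startHTMLTag();
            case "ENDTAG":       return "ENDTAG : " + match("ENDTAG").text;
            default:             // no alternative starts with this token
                throw new IllegalStateException("unexpected token: " + la().text);
        }
    }

    // startHTMLTag : INITSTARTTAG TAGDATA FINISHSTARTTAG ;
    private String startHTMLTag() {
        match("INITSTARTTAG");
        Tok name = match("TAGDATA");
        match("FINISHSTARTTAG");
        return "STARTTAG : " + name.text;
    }

    public static void main(String[] args) {
        // Token stream as it would look if '<' had already been consumed elsewhere.
        List<Tok> noLt = Arrays.asList(new Tok("TAGDATA", "test"), new Tok("FINISHSTARTTAG", ">"));
        try {
            new TagParserSketch(noLt).processHTML();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // unexpected token: test
        }
        List<Tok> withLt = Arrays.asList(new Tok("INITSTARTTAG", "<"),
                new Tok("TAGDATA", "test"), new Tok("FINISHSTARTTAG", ">"));
        System.out.println(new TagParserSketch(withLt).processHTML()); // STARTTAG : test
    }
}
```

If this assumption holds, the start-tag alternative can only succeed when an INITSTARTTAG token actually reaches the parser.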