[antlr-interest] Re: Grammar Problem

Bogdan Mitu bogdan_mt at yahoo.com
Tue Jun 4 08:03:38 PDT 2002


--- johnclarke72 <johnclarke at hotmail.com> wrote:
> I put this in the tag parser because I want to go on to write the 
> rules that will allow it to process HTML attributes (which may or may 
> not exist).  It seems that putting the description of what a whole 
> tag looks like in the parser is the best approach.
> 
> The main lexer does switch to the tag lexer when it sees <.  

The switch to the tag lexer will be done *after* the consumption of '<' by
the main lexer. The tag lexer will never see it, so the token INITSTARTTAG
will never be generated. Modify the startHTMLTag rule to start after '<':

startHTMLTag : tagName:TAGDATA
               {System.out.println("STARTTAG : "+tagName.getText());}
               FINISHSTARTTAG;
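The hand-off can be shown with a small, ANTLR-free Java simulation. This is not ANTLR code: the class TagSwitchDemo and its methods are hypothetical stand-ins. The "main lexer" consumes '<' and passes only the rest of `<test>` to the "tag lexer". The resulting tokens are then checked against the original rule and against the modified one. The original rule fails on `test` in the same way as the reported "unexpected token: test" error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, ANTLR-free simulation of the '<' hand-off between two lexers.
public class TagSwitchDemo {

    // Token types named after those in HTMLTagLexer.g
    enum Type { INITSTARTTAG, TAGDATA, FINISHSTARTTAG }

    record Token(Type type, String text) {}

    // Stand-in for the main lexer: it consumes '<' itself, so the tag
    // lexer only ever receives the characters that follow it.
    static List<Token> lexStartTag(String input) {
        if (!input.startsWith("<")) {
            throw new IllegalArgumentException("expected '<'");
        }
        return tagLex(input.substring(1));
    }

    // Stand-in for the tag lexer: it would emit INITSTARTTAG for '<',
    // but never gets the chance because '<' was already consumed.
    static List<Token> tagLex(String rest) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < rest.length()) {
            char c = rest.charAt(i);
            if (c == '<') {
                out.add(new Token(Type.INITSTARTTAG, "<"));
                i++;
            } else if (c == '>') {
                out.add(new Token(Type.FINISHSTARTTAG, ">"));
                break; // end of the start tag
            } else if (Character.isWhitespace(c)) {
                i++; // skipped, like the WS rule
            } else {
                // TAGDATA: anything except whitespace and < > / ! =
                int start = i;
                while (i < rest.length() && "<>/!= \t\r\n".indexOf(rest.charAt(i)) < 0) {
                    i++;
                }
                if (start == i) {
                    throw new IllegalStateException("unhandled char: " + c);
                }
                out.add(new Token(Type.TAGDATA, rest.substring(start, i)));
            }
        }
        return out;
    }

    // Checks tokens against a rule given as a sequence of token types and
    // reports the first mismatch as "unexpected token: ...".
    static String match(List<Token> tokens, Type... rule) {
        for (int i = 0; i < rule.length; i++) {
            if (i >= tokens.size()) {
                return "unexpected end of input";
            }
            if (tokens.get(i).type() != rule[i]) {
                return "unexpected token: " + tokens.get(i).text();
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        List<Token> tokens = lexStartTag("<test>");
        // Original rule: INITSTARTTAG TAGDATA FINISHSTARTTAG
        System.out.println(match(tokens, Type.INITSTARTTAG, Type.TAGDATA, Type.FINISHSTARTTAG)); // unexpected token: test
        // Modified rule: TAGDATA FINISHSTARTTAG
        System.out.println(match(tokens, Type.TAGDATA, Type.FINISHSTARTTAG)); // ok
    }
}
```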
 
On the other hand, I'm not sure you really need a separate parser for tags
(although you probably do need the embedded lexer). If the parser reads its
tokens from the selector, it doesn't have to know about the lexer switches.
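The single-parser arrangement can also be sketched as an ANTLR-free Java simulation. All names are hypothetical. Both lexers share one input cursor. A small stack stands in for ANTLR 2's TokenStreamSelector; this is an illustration of the idea, not that class's implementation. One parser pulls tokens from the stack. It never sees '<' and never knows which lexer produced a token.

The sketch assumes the tag lexer pops itself on '>' as well as at the end of an end tag. In the quoted grammar, only HTMLCOMMENT and ENDTAG call selector.pop(). Comments are left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, ANTLR-free simulation: two lexers behind a selector, one parser.
public class SelectorDemo {

    enum Type { TEXT, TAGDATA, FINISHSTARTTAG, ENDTAG, EOF }

    record Token(Type type, String text) {}

    // Shared input cursor, so a switched-to lexer continues where the last one stopped.
    static final class Input {
        final String s;
        int pos;
        Input(String s) { this.s = s; }
        boolean done() { return pos >= s.length(); }
        char peek() { return s.charAt(pos); }
    }

    // A lexer returns the next token, or null after it has switched lexers.
    interface Lexer { Token next(Input in, Selector sel); }

    // Minimal stand-in for a token stream selector: a stack of lexers.
    static final class Selector {
        private final Deque<Lexer> stack = new ArrayDeque<>();
        private final Input in;
        Selector(Input in, Lexer initial) { this.in = in; stack.push(initial); }
        void push(Lexer lexer) { stack.push(lexer); }
        void pop() { stack.pop(); }
        Token nextToken() {
            while (true) {
                Token t = stack.peek().next(in, this);
                if (t != null) return t; // null means "switched, retry"
            }
        }
    }

    // Main lexer: plain text up to '<'; on '<' it consumes it and pushes the tag lexer.
    static Token mainLex(Input in, Selector sel) {
        if (in.done()) return new Token(Type.EOF, "");
        if (in.peek() == '<') {
            in.pos++;
            sel.push(SelectorDemo::tagLex);
            return null;
        }
        int start = in.pos;
        while (!in.done() && in.peek() != '<') in.pos++;
        return new Token(Type.TEXT, in.s.substring(start, in.pos));
    }

    // Tag lexer: TAGDATA, '>' (pops), or an end tag "/name>" (pops).
    static Token tagLex(Input in, Selector sel) {
        while (!in.done() && Character.isWhitespace(in.peek())) in.pos++;
        if (in.done()) return new Token(Type.EOF, "");
        char c = in.peek();
        if (c == '>') { in.pos++; sel.pop(); return new Token(Type.FINISHSTARTTAG, ">"); }
        if (c == '/') {
            int start = ++in.pos;
            while (!in.done() && in.peek() != '>') in.pos++;
            String name = in.s.substring(start, in.pos);
            if (!in.done()) in.pos++; // consume '>'
            sel.pop();
            return new Token(Type.ENDTAG, name);
        }
        int start = in.pos;
        while (!in.done() && "<>/!= \t\r\n".indexOf(in.peek()) < 0) in.pos++;
        if (start == in.pos) throw new IllegalStateException("unhandled char: " + c);
        return new Token(Type.TAGDATA, in.s.substring(start, in.pos));
    }

    // One parser over the merged token stream.
    static List<String> parse(String html) {
        Selector sel = new Selector(new Input(html), SelectorDemo::mainLex);
        List<String> events = new ArrayList<>();
        Token t = sel.nextToken();
        while (t.type() != Type.EOF) {
            switch (t.type()) {
                case TEXT -> { events.add("TEXT : " + t.text()); t = sel.nextToken(); }
                case ENDTAG -> { events.add("ENDTAG : " + t.text()); t = sel.nextToken(); }
                case TAGDATA -> { // startHTMLTag : TAGDATA FINISHSTARTTAG
                    String name = t.text();
                    t = sel.nextToken();
                    if (t.type() != Type.FINISHSTARTTAG) {
                        throw new IllegalStateException("unexpected token: " + t.text());
                    }
                    events.add("STARTTAG : " + name);
                    t = sel.nextToken();
                }
                default -> throw new IllegalStateException("unexpected token: " + t.text());
            }
        }
        return events;
    }

    public static void main(String[] args) {
        parse("<test>hello</test>").forEach(System.out::println);
    }
}
```

For ``, this prints STARTTAG, TEXT and ENDTAG events in that order. The parser's rule for a start tag begins at TAGDATA, as in the modified startHTMLTag.
<test>
assert SelectorDemo.parse("<test>hello</test>").equals(java.util.List.of("STARTTAG : test", "TEXT : hello", "ENDTAG : test"));
assert SelectorDemo.parse("plain").equals(java.util.List.of("TEXT : plain"));
assert SelectorDemo.parse("").isEmpty();
</test>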

Bogdan

> How can I get this to work correctly?
> 
> Thanks
> 
> John
> 
> --- In antlr-interest at y..., Bogdan Mitu <bogdan_mt at y...> wrote:
> > Hi,
> > 
> > Why does rule startHTMLTag start with INITSTARTTAG while the others
> > do not? It seems that you use an embedded lexer and parser for HTML
> > tags. You probably have a rule in the main lexer that recognizes '<'
> > and switches the lexer. The Tag Parser is connected to the second
> > lexer, and will never receive the INITSTARTTAG token it is expecting
> > in the rule startHTMLTag.
> > 
> > Try:
> > startHTMLTag : /* INITSTARTTAG removed */ tagName:TAGDATA
> >                 {System.out.println("STARTTAG : "+tagName.getText());}
> >                 FINISHSTARTTAG;
> >  
> > Bogdan
> > 
> > 
> > --- johnclarke72 <johnclarke at h...> wrote:
> > > Hi,
> > >    I am currently having problems with an HTML grammar that I am
> > > writing. The grammar has been added to the end of this e-mail.
> > > 
> > > When I enter HTML comments (<!-- This is a Comment -->) and end tags
> > > (</endTag>), it handles them correctly.
> > > 
> > > However, if I enter <test> or anything similar to it, I get a
> > > "line 1: unexpected token: test" error message.
> > > 
> > > How can I get this to work ?
> > > 
> > > I would be grateful for all advice offered regarding this.
> > > 
> > > John
> > > 
> > > HTMLTagLexer.g
> > > ==============
> > > 
> > > // Import the required Classes
> > > header
> > > {
> > >    import java.util.*;
> > >    import antlr.*;
> > > }
> > > 
> > > // Define the class
> > > class HTMLTagLexer extends Lexer;
> > > 
> > > // Set the options for the Lexer
> > > options
> > > {
> > >   k=3;                             // Set the lookahead to 3 characters
> > >   caseSensitive = false;           // Set case sensitivity to false
> > >   charVocabulary = '\1' .. '\377'; // Set the lexer character vocabulary
> > >   testLiterals = false;            // Don't test against the literals table
> > >   exportVocab = HTMLTagLexer;      // The Grammar to export
> > > }
> > > 
> > > // The routines that will enable us to switch between lexer states
> > > {
> > >    // The Current Selector
> > >    TokenStreamSelector selector;
> > > 
> > >    // The method that will enable us to switch between lexer states
> > >    public void setSelector(TokenStreamSelector tokenStreamSelector)
> > >    {
> > >      selector = tokenStreamSelector;
> > >    }
> > > }
> > > 
> > > // Define the Tokens required for the Grammar
> > > 
> > > // Various HTML Marker Tags
> > > INITSTARTTAG   : "<";
> > > FINISHSTARTTAG : ">";
> > > EQUALS         : "=";
> > > 
> > > // HTML Comments
> > > HTMLCOMMENT : "!--"! (options {greedy=false;} : .)* "-->"!
> > >               {selector.pop();}
> > >               ;
> > > 
> > > // Main HTML Tags Section.  This defines the Tag names,
> > > // attributes and attribute values
> > > 
> > > // TAGDATA is used to define the Tag Name and names of
> > > // attributes used within the tag
> > > TAGDATA : (~(' ' | '\r' | '\n' | '\t' | '<' | '>' | '/' | '!' | '='))+;
> > > 
> > > // TAGVALUE is used to define the values for attributes
> > > // used within the tags
> > > 
> > > 
> > > // Definition of an End Tag
> > > ENDTAG   : '/'! ( 'a'..'z' )+ ">"! {selector.pop();};
> > > 
> > > // Ignore all White Space
> > > WS : ( ' '
> > >      | '\t'
> > >      | '\r' '\n' { newline(); }
> > >      | '\n' { newline(); }
> > >      )
> > >      {$setType(Token.SKIP);} //ignore this token
> > > ;
> > > 
> > > HTMLTagParser.g
> > > ===============
> > > 
> > > // Define the class
> > > class HTMLTagParser extends Parser;
> > > 
> > > // Set the options for the Parser
> > > options
> > > {
> > >   importVocab = HTMLTagLexer;     // The Grammar to import
> > > }
> > > 
> > > 
> > > // The Parser Rules
> > > processHTML:
> > >    htmlComment:HTMLCOMMENT {System.out.println("COMMENT : "+htmlComment.getText().trim());}
> > >    | startHTMLTag
> > >    | endTag:ENDTAG {System.out.println("ENDTAG : "+endTag.getText());};
> > > 
> > > startHTMLTag : INITSTARTTAG tagName:TAGDATA
> > >                {System.out.println("STARTTAG : "+tagName.getText());}
> > >                FINISHSTARTTAG;
> > > 
> > > 
> > > 
> > > 
> > >  
> > > 
> > > Your use of Yahoo! Groups is subject to 
> http://docs.yahoo.com/info/terms/ 
> > > 
> > > 
> > > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! - Official partner of 2002 FIFA World Cup
> > http://fifaworldcup.yahoo.com
> 
> 



More information about the antlr-interest mailing list