[antlr-interest] Re: Grammar Problem
johnclarke72
johnclarke at hotmail.com
Tue Jun 4 13:53:41 PDT 2002
Thanks for your help.
John
--- In antlr-interest at y..., Bogdan Mitu <bogdan_mt at y...> wrote:
> --- johnclarke72 <johnclarke at h...> wrote:
> > I put this in the tag parser because I want to go on to write the
> > rules that will allow it to process HTML attributes (which may or
> > may not exist). It seems that putting the description of what a
> > whole tag looks like in the parser is the best approach.
> >
> > The main lexer does switch to the tag lexer when it sees <.
>
> The switch to the tag lexer will be done *after* the consumption of
> '<' by the main lexer. The tag lexer will never see it, so the token
> INITSTARTTAG will never be generated. Modify the startHTMLTag rule to
> start after '<':
>
> startHTMLTag : tagName:TAGDATA
>     {System.out.println("STARTTAG : "+tagName.getText());}
>     FINISHSTARTTAG;
>
> On the other hand, I'm not sure you really need a separate parser for
> tags (although you probably need the embedded lexer). The parser
> doesn't have to know about the lexer switches.
>
> Bogdan
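
As a rough illustration of the point above, here is a toy model in plain Java. It is not ANTLR-generated code, and the class and method names are made up for this sketch. It has a main mode and a tag mode. The main mode consumes '<' before it switches, so the tag mode only ever produces TAGDATA and FINISHSTARTTAG, which is why a parser rule that begins with INITSTARTTAG cannot match:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT ANTLR-generated code. All names here are hypothetical.
class TwoModeScanner {
    /** Returns the tokens that the "tag" mode produces for the given input. */
    static List<String> tagModeTokens(String input) {
        List<String> tokens = new ArrayList<>();
        boolean inTag = false;              // false = main mode, true = tag mode
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (!inTag) {
                // Main mode: it consumes '<' itself and only then switches.
                if (c == '<') {
                    inTag = true;
                }
                i++;
            } else if (c == '>') {
                tokens.add("FINISHSTARTTAG");
                inTag = false;              // back to the main mode
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;                        // skipped, like the WS rule
            } else {
                // Run of name characters, like TAGDATA.
                int start = i;
                while (i < input.length()
                        && input.charAt(i) != '>'
                        && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                tokens.add("TAGDATA(" + input.substring(start, i) + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tagModeTokens("<test>"));
    }
}
```

For the input 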
>
> > How can I get this to work correctly ?
> >
> > Thanks
> >
> > John
> >
> > --- In antlr-interest at y..., Bogdan Mitu <bogdan_mt at y...> wrote:
> > > Hi,
> > >
> > > Why does rule startHTMLTag start with INITSTARTTAG, while the
> > > others do not? It seems that you use an embedded lexer and parser
> > > for HTML tags. You probably have a rule in the main lexer that
> > > recognizes '<' and switches the lexer. The Tag Parser is connected
> > > to the second lexer, and will never receive the INITSTARTTAG token
> > > it is expecting in the rule startHTMLTag.
> > >
> > > Try:
> > > startHTMLTag : /* INITSTARTTAG removed */ tagName:TAGDATA
> > >     {System.out.println("STARTTAG : "+tagName.getText());}
> > >     FINISHSTARTTAG;
> > >
> > > Bogdan
> > >
> > >
> > > --- johnclarke72 <johnclarke at h...> wrote:
> > > > Hi,
> > > > I am currently having problems with an HTML grammar that I am
> > > > writing. The grammar has been added to the end of this e-mail.
> > > >
> > > > When I enter HTML comments (<!-- This is a Comment -->) and end
> > > > tags (</endTag>), it handles them correctly.
> > > >
> > > > However, if I enter <test> or anything similar, I get
> > > > a "line 1: unexpected token: test" error message.
> > > >
> > > > How can I get this to work ?
> > > >
> > > > I would be grateful for all advice offered regarding this.
> > > >
> > > > John
> > > >
> > > > HTMLTagLexer.g
> > > > ==============
> > > >
> > > > // Import the required Classes
> > > > header
> > > > {
> > > > import java.util.*;
> > > > import antlr.*;
> > > > }
> > > >
> > > > // Define the class
> > > > class HTMLTagLexer extends Lexer;
> > > >
> > > > // Set the options for the Lexer
> > > > options
> > > > {
> > > >     k=3;                             // Set the look ahead to 3 Characters
> > > >     caseSensitive = false;           // Set Case Sensitivity to false
> > > >     charVocabulary = '\1' .. '\377'; // Set the Lexer Character Vocabulary
> > > >     testLiterals = false;            // Don't test against the Literals table
> > > >     exportVocab = HTMLTagLexer;      // The Grammar to export
> > > > }
> > > >
> > > > // The routines that will enable us to switch between lexer states
> > > > {
> > > >     // The Current Selector
> > > >     TokenStreamSelector selector;
> > > >
> > > >     // The method that will enable us to switch between lexer states
> > > >     public void setSelector(TokenStreamSelector tokenStreamSelector)
> > > >     {
> > > >         selector = tokenStreamSelector;
> > > >     }
> > > > }
> > > >
> > > > // Define the Tokens required for the Grammar
> > > >
> > > > // Various HTML Marker Tags
> > > > INITSTARTTAG : "<";
> > > > FINISHSTARTTAG : ">";
> > > > EQUALS : "=";
> > > >
> > > > // HTML Comments
> > > > HTMLCOMMENT : "!--"! (options {greedy=false;} : .)* "-->"!
> > > > {selector.pop();}
> > > > ;
> > > >
> > > > // Main HTML Tags Section. This defines the Tag names,
> > > > // attributes and attribute values
> > > >
> > > > // TAGDATA is used to define the Tag Name and names of
> > > > // attributes used within the tag
> > > > TAGDATA : (~(' ' | '\r' | '\n' | '\t' | '<' | '>' | '/' | '!' | '='))+;
> > > >
> > > > // TAGVALUE is used to define the values for attributes
> > > > // used within the tags
> > > >
> > > >
> > > > // Definition of an End Tag
> > > > ENDTAG : '/'! ( 'a'..'z' )+ ">"! {selector.pop();};
> > > >
> > > > // Ignore all White Space
> > > > WS : ( ' '
> > > > | '\t'
> > > > | '\r' '\n' { newline(); }
> > > > | '\n' { newline(); }
> > > > )
> > > > {$setType(Token.SKIP);} //ignore this token
> > > > ;
> > > >
> > > > HTMLTagParser.g
> > > > ===============
> > > >
> > > > // Define the class
> > > > class HTMLTagParser extends Parser;
> > > >
> > > > // Set the options for the Parser
> > > > options
> > > > {
> > > > importVocab = HTMLTagLexer; // The Grammar to import
> > > > }
> > > >
> > > >
> > > > // The Parser Rules
> > > > processHTML:
> > > >     htmlComment:HTMLCOMMENT
> > > >         {System.out.println("COMMENT : "+htmlComment.getText().trim());}
> > > >     | startHTMLTag
> > > >     | endTag:ENDTAG
> > > >         {System.out.println("ENDTAG : "+endTag.getText());};
> > > >
> > > > startHTMLTag : INITSTARTTAG tagName:TAGDATA
> > > >     {System.out.println("STARTTAG : "+tagName.getText());}
> > > >     FINISHSTARTTAG;
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/