[antlr-interest] Re: A question regarding Token Stream Multiplexing (aka "Lexer states")
johnclarke72
johnclarke at hotmail.com
Mon May 6 10:50:50 PDT 2002
How can I modify it so that when the Text Lexer detects the <!-- at the
start of an HTML comment, it passes the whole comment (<!-- test -->)
through to the Tag Lexer?
The white space rule is put in as a precaution.
Thanks
John
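One way to get the whole comment into the tag lexer is for the text lexer to only look ahead at <!-- before switching, instead of consuming it. The tag lexer then matches the comment from its first character and switches back once it reaches -->. The plain-Java sketch below models that idea with a stack of lexer states over one shared input position. It is an illustration under those assumptions, not ANTLR-generated code: ModeStackLexer, Mode and tokenize are hypothetical names, and how to express the look-ahead inside the grammar itself is not shown here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java sketch of two lexer states sharing one input cursor.
// Hypothetical names throughout; this is NOT the ANTLR API.
class ModeStackLexer {
    enum Mode { TEXT, COMMENT }

    private final String input;
    private int pos = 0;                           // one cursor shared by both states
    private final Deque<Mode> modes = new ArrayDeque<>();

    ModeStackLexer(String input) {
        this.input = input;
        modes.push(Mode.TEXT);                     // start in the text state
    }

    // Same characters as the WS rule in the grammars: ' ', '\t', '\r', '\n'.
    private static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Returns the next token as "TYPE:text", or null at end of input.
    String nextToken() {
        while (pos < input.length()) {
            if (modes.peek() == Mode.COMMENT) {
                // Comment state: "<!--" then the shortest run ending in "-->".
                int end = input.indexOf("-->", pos + 4);
                if (end < 0) throw new IllegalStateException("unterminated comment");
                String text = input.substring(pos, end + 3);
                pos = end + 3;
                modes.pop();                       // back to whichever state pushed us
                return "HTMLCOMMENT:" + text;
            }
            char c = input.charAt(pos);
            if (input.startsWith("<!--", pos)) {
                // Look ahead only: switch states WITHOUT consuming "<!--",
                // so the comment state sees the whole comment.
                modes.push(Mode.COMMENT);
                continue;
            }
            if (isWs(c)) { pos++; continue; }      // WS: skipped, like Token.SKIP
            int start = pos;                       // WORD: ~(ws | '<')+
            while (pos < input.length() && !isWs(input.charAt(pos)) && input.charAt(pos) != '<') {
                pos++;
            }
            if (pos == start) throw new IllegalStateException("unexpected char: " + c);
            return "WORD:" + input.substring(start, pos);
        }
        return null;
    }

    static List<String> tokenize(String input) {
        ModeStackLexer lexer = new ModeStackLexer(input);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [WORD:hello, HTMLCOMMENT:<!-- test -->, WORD:world]
        System.out.println(tokenize("hello <!-- test --> world"));
    }
}
```

Because the comment state searches for the first -->, two comments in one input come out as two tokens rather than one.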
--- In antlr-interest at y..., mzukowski at y... wrote:
> In your main lexer you match "<!-" and then switch to HTMLTagLexer. So what
> is left to match is "- test -->". It then tries to match that first '-'
> against either the HTMLCOMMENT or WS rules, but neither works, so it
> complains that it doesn't know what to do. HTMLCOMMENT should probably be
> something like
>
> HTMLCOMMENT : (options { greedy=false; }: .)* "-->" {selector.pop();};
>
> And why do you have a WS rule in the HTMLTagLexer? Do you need it there?
>
> Monty
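Monty's rule ends with selector.pop(). Assuming a selector that behaves like ANTLR 2's TokenStreamSelector, where push() saves the current stream before switching and pop() switches back to the saved one, pop() only has somewhere to return to if the text lexer switched with push() rather than select(). The plain-Java sketch below models just that bookkeeping; MiniSelector is a hypothetical stand-in that tracks stream names, not the ANTLR class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a token-stream selector; NOT the ANTLR class.
// It only tracks stream names to show the select/push/pop bookkeeping.
class MiniSelector {
    private final Set<String> streams = new HashSet<>();
    private final Deque<String> saved = new ArrayDeque<>();
    private String current;

    void addInputStream(String name) { streams.add(name); }

    // Switch streams without remembering the previous one.
    void select(String name) {
        if (!streams.contains(name)) throw new IllegalArgumentException("unknown stream: " + name);
        current = name;
    }

    // Remember the current stream, then switch.
    void push(String name) {
        String previous = Objects.requireNonNull(current, "select() a stream before push()");
        select(name);                 // validates the name and switches
        saved.push(previous);         // remember where to come back to
    }

    // Switch back to whatever the matching push() saved.
    String pop() {
        if (saved.isEmpty()) throw new IllegalStateException("pop() without a matching push()");
        String previous = saved.pop();
        select(previous);
        return previous;
    }

    String current() { return current; }

    public static void main(String[] args) {
        MiniSelector s = new MiniSelector();
        s.addInputStream("HTMLTagLexer");
        s.addInputStream("TextLexer");
        s.select("TextLexer");           // starting lexer, as in HTMLParserApp
        s.push("HTMLTagLexer");          // comment start seen in the text lexer
        s.pop();                         // "-->" matched in the tag lexer
        System.out.println(s.current()); // prints TextLexer
    }
}
```

Calling select() followed by pop() in this model fails with "pop() without a matching push()", which is the mismatch to watch for when combining a select() in TextLexer with a pop() in HTMLTagLexer.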
>
> > -----Original Message-----
> > From: johnclarke72 [mailto:johnclarke at h...]
> > Sent: Monday, May 06, 2002 7:15 AM
> > To: antlr-interest at y...
> > Subject: [antlr-interest] Re: A question regarding Token Stream Multiplexing (aka "Lexer states")
> >
> >
> > When I compile and run the application and then enter <!-- test -->, I
> > expect to see "HTML Comment : <!-- test -->" on the screen, but all I
> > see is:
> >
> > line 1: unexpected token: <!-
> > exception: antlr.TokenStreamRecognitionException: unexpected char: -
> >
> > I cannot see what is causing the problem. It is probably something
> > very simple that I have missed out. I would be grateful for any
> > advice offered.
> >
> > Best Wishes
> >
> > John
> >
> > The Grammar for the Text Lexer
> > ==============================
> >
> > // Import the Required Classes
> > header
> > {
> > import java.util.*;
> > import antlr.*;
> > }
> >
> > // The Class
> > class TextLexer extends Lexer;
> >
> > // Set the Options for the Lexer
> > options
> > {
> > k=3;                             // Set the Look Ahead to 3 Characters
> > charVocabulary = '\1' .. '\377'; // Set the Lexer Character Vocabulary
> > testLiterals = false;            // Don't test against the Literals table
> > }
> >
> > // The routine that will allow us to switch between Selectors
> > {
> > // The current Selector
> > TokenStreamSelector selector;
> >
> > // The method that will enable us to switch between Selectors
> > public void setSelector(TokenStreamSelector tokenStreamSelector)
> > {
> > selector = tokenStreamSelector;
> > }
> >
> > }
> >
> > HTMLCOMMENT : "<!-" {selector.select("HTMLTagLexer");};
> >
> > // TEXT
> > WORD : ( ~ (' '|'\r'|'\n'|'\t'|'<') ) +;
> >
> > // Ignore all White Space
> > WS : ( ' '
> > | '\t'
> > | '\r' '\n' { newline(); }
> > | '\n' { newline(); }
> > )
> > {$setType(Token.SKIP);} //ignore this token
> > ;
> >
> > The Grammar for the Tag Lexer
> > =============================
> > // Import the Required Classes
> > header
> > {
> > import java.util.*;
> > import antlr.*;
> > }
> >
> > // The Class
> > class HTMLTagLexer extends Lexer;
> >
> > // Set the Options for the Lexer
> > options
> > {
> > k=3;                             // Set the Look Ahead to 3 Characters
> > charVocabulary = '\1' .. '\377'; // Set the Lexer Character Vocabulary
> > testLiterals = false;            // Don't test against the Literals table
> > importVocab = Tagged;            // The Vocabulary to import
> > exportVocab = HTMLTags;          // Export the Vocabulary to HTMLTags
> > }
> >
> > // The routine that will allow us to switch between Selectors
> > {
> > // The current Selector
> > TokenStreamSelector selector;
> >
> > // The method that will enable us to switch between Selectors
> > public void setSelector(TokenStreamSelector tokenStreamSelector)
> > {
> > selector = tokenStreamSelector;
> > }
> >
> > }
> >
> > // HTML Comment Definition
> > HTMLCOMMENT : "<!--" (options { greedy=false; }: .) * "-->";
> >
> > // Ignore all White Space
> > WS : ( ' '
> > | '\t'
> > | '\r' '\n' { newline(); }
> > | '\n' { newline(); }
> > )
> > {$setType(Token.SKIP);} //ignore this token
> > ;
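The (options { greedy=false; }: .)* "-->" loop in HTMLCOMMENT stops at the first --> it reaches. Java's reluctant regex quantifier .*? behaves the same way, so a regex comparison is a quick way to see why the option matters when one input holds two comments. This is an analogy using java.util.regex, not how ANTLR implements the loop; GreedyVsNonGreedy is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogy only: regex quantifiers standing in for the grammar's loop.
class GreedyVsNonGreedy {
    // Reluctant: stops at the first "-->" (like greedy=false in the grammar).
    // DOTALL lets '.' cross newlines, since comments may span lines.
    static final Pattern NON_GREEDY = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Greedy: runs on to the LAST "-->" in the input.
    static final Pattern GREEDY = Pattern.compile("<!--.*-->", Pattern.DOTALL);

    // Collect every match of the pattern, left to right.
    static List<String> comments(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<!-- a --> text <!-- b -->";
        System.out.println(comments(NON_GREEDY, html)); // [<!-- a -->, <!-- b -->]
        System.out.println(comments(GREEDY, html));     // [<!-- a --> text <!-- b -->]
    }
}
```

The greedy version swallows the text between the two comments, which is the behaviour greedy=false avoids.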
> >
> > The Grammar for the Parser
> > ==========================
> >
> > // Import the Required Classes
> > header
> > {
> > import java.util.*;
> > import antlr.*;
> > }
> >
> > // The Class
> > class HTMLParser extends Parser;
> >
> > // Set the Options for the Parser
> > options
> > {
> > importVocab = Tagged; // The Vocabulary to import
> > }
> >
> > // Define the starting point for processing the HTML
> > processData :
> > (
> > text:WORD {System.out.println("TEXT " + text.getText());}
> > | comment:HTMLComment {System.out.println("HTML Comment " + comment.getText());}
> > )+;
> >
> > The Java Application
> > ====================
> >
> > import java.io.*;
> > import antlr.*;
> >
> > // The HTMLParserApp Class
> > class HTMLParserApp
> > {
> >
> > // The Main function
> > public static void main(String[] args)
> > {
> > try
> > {
> > // Create the required Lexers
> > HTMLTagLexer htmlTagLexer = new HTMLTagLexer(new DataInputStream(System.in));
> > TextLexer textLexer = new TextLexer(htmlTagLexer.getInputState());
> >
> > // Create the TokenStreamSelector and add the required Lexers to it
> > TokenStreamSelector tokenStreamSelector = new TokenStreamSelector();
> > tokenStreamSelector.addInputStream(htmlTagLexer, "HTMLTagLexer");
> > tokenStreamSelector.addInputStream(textLexer, "TextLexer");
> >
> > // Select the starting Lexer
> > tokenStreamSelector.select("TextLexer");
> >
> > // Add the TokenStreamSelector to the Required Lexers
> > htmlTagLexer.setSelector(tokenStreamSelector);
> > textLexer.setSelector(tokenStreamSelector);
> >
> > // Create the HTML Parser
> > HTMLParser htmlParser = new HTMLParser(tokenStreamSelector);
> >
> > // Process the HTML
> > htmlParser.processData();
> >
> > } catch(Exception e)
> > {
> > System.err.println("exception: "+e);
> > }
> > }
> > }
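HTMLParserApp builds TextLexer from htmlTagLexer.getInputState(), so both lexers read from one input position: whatever one consumes, the other never sees. The plain-Java sketch below models that sharing with hypothetical InputState and CharLexer classes (not the ANTLR ones) and replays the failure described in this thread: after the text lexer consumes "<!-", the tag lexer is left looking at "- test -->", and a rule that expects "<!--" cannot start there.

```java
// Hypothetical stand-ins, NOT ANTLR classes: InputState models a shared
// input position, CharLexer models a lexer reading from it.
class SharedInputDemo {
    static final class InputState {
        final String text;
        int pos = 0;
        InputState(String text) { this.text = text; }
    }

    static final class CharLexer {
        private final InputState in;
        CharLexer(InputState in) { this.in = in; }

        // Consume exactly the given literal, or fail with an "unexpected char" error.
        String match(String literal) {
            if (!in.text.startsWith(literal, in.pos)) {
                String found = in.pos < in.text.length() ? String.valueOf(in.text.charAt(in.pos)) : "EOF";
                throw new IllegalStateException("unexpected char: " + found);
            }
            in.pos += literal.length();
            return literal;
        }

        // The input this lexer would see next.
        String rest() { return in.text.substring(in.pos); }
    }

    public static void main(String[] args) {
        InputState shared = new InputState("<!-- test -->");
        CharLexer textLexer = new CharLexer(shared);
        CharLexer tagLexer = new CharLexer(shared);   // same state, like getInputState()

        textLexer.match("<!-");                       // what TextLexer's HTMLCOMMENT rule consumes
        System.out.println(tagLexer.rest());          // prints - test -->
        try {
            tagLexer.match("<!--");                   // HTMLTagLexer's rule expects "<!--"
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());       // prints unexpected char: -
        }
    }
}
```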
> >
> > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest mailing list