[antlr-interest] Still having problems with the lexer code

Terence Parr parrt at jguru.com
Thu May 9 15:28:50 PDT 2002


Hi John... jumping in late, but it seems that if you are staying within 
the "lexer" world to do your parsing, you should just call the rule 
that parses that grammar.  I designed the selector stream stuff so that 
an outside agent, like the parser, could switch selectors.  Does the 
javadoc example help at all?

I'd suggest merging the lexers and having the merged grammar handle the 
stuff inside the comment.  Oh, also note that once <!- is matched by 
HTMLCOMMENT in the first lexer, it cannot appear in the other lexer 
(already consumed), so the second lexer starts at the next '-' and never 
sees the <!-- its own HTMLCOMMENT rule expects.  That is probably the 
source of your problem.
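
To illustrate the "already consumed" point, here is a minimal hand-written 
Java sketch; it is not ANTLR-generated code and does not use the ANTLR 
runtime.  The class SharedInputSketch, its Input helper, and the methods 
twoLexers and mergedLexer are hypothetical names that only imitate two 
lexers sharing one input position versus a single merged rule.

```java
/** Hand-written sketch (hypothetical, not ANTLR output) of a shared input cursor. */
public class SharedInputSketch {

    /** Stands in for the shared lexer input state: one cursor over one buffer. */
    static final class Input {
        final String text;
        int pos = 0;

        Input(String text) { this.text = text; }

        /** Consumes s if the remaining input starts with it. */
        boolean match(String s) {
            if (text.startsWith(s, pos)) {
                pos += s.length();
                return true;
            }
            return false;
        }

        String rest() { return text.substring(pos); }
    }

    /** Imitates the two-lexer setup: the first rule eats "<!-", the second expects "<!--". */
    public static String twoLexers(String text) {
        Input in = new Input(text);
        if (!in.match("<!-")) return "no comment start";
        // "Switch lexers": the cursor is shared, so "<!-" is already gone.
        if (!in.match("<!--")) {
            String rest = in.rest();
            return rest.isEmpty() ? "unexpected end of input"
                                  : "unexpected char: " + rest.charAt(0);
        }
        return "matched";
    }

    /** Imitates a merged lexer: one rule matches the whole comment. */
    public static String mergedLexer(String text) {
        Input in = new Input(text);
        int start = in.pos;
        if (!in.match("<!--")) return "no comment start";
        // Non-greedy body: stop at the first closing marker.
        int end = in.text.indexOf("-->", in.pos);
        if (end < 0) return "unterminated comment";
        in.pos = end + 3;
        return "HTML Comment : " + in.text.substring(start, in.pos);
    }

    public static void main(String[] args) {
        System.out.println(twoLexers("<!-- test -->"));
        System.out.println(mergedLexer("<!-- test -->"));
    }
}
```

Run on <!-- test -->, the two-step version reports "unexpected char: -", 
while the single-rule version returns the whole comment.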

Ter

On Thursday, May 9, 2002, at 02:45  PM, johnclarke72 wrote:

> A number of people have offered me advice regarding this problem but
> so far I have not been able to solve it.
>
> When I compile and run the application I then enter <!-- test --> and
> expect to see :
> HTML Comment : <!-- test --> on the screen. But all I see is :
>
> line 1: unexpected token: <!-
> exception: antlr.TokenStreamRecognitionException: unexpected char: -
>
> I cannot see what is causing the problem. It is probably something
> very simple that I have missed out. I would be grateful for any
> advice offered.
>
> Best Wishes
>
> John
>
> The Grammar for the Text Lexer
> ==============================
>
> // Import the Required Classes
> header
> {
> import java.util.*;
> import antlr.*;
> }
>
> // The Class
> class TextLexer extends Lexer;
>
> // Set the Options for the Lexer
> options
> {
> k=3; // Set the Look Ahead to 3 Characters
> charVocabulary = '\1' .. '\377'; // Set the Lexer Character Vocabulary
> testLiterals = false; // Don't test against the Literals table
> }
>
> // The routine that will allow us to switch between Selectors
> {
> // The current Selector
> TokenStreamSelector selector;
>
> // The method that will enable us to switch between Selectors
> public void setSelector(TokenStreamSelector tokenStreamSelector)
> {
> selector = tokenStreamSelector;
> }
>
> }
>
> HTMLCOMMENT : "<!-" {selector.select("HTMLTagLexer");};
>
> // TEXT
> WORD : ( ~ (' '|'\r'|'\n'|'\t'|'<') ) +;
>
> // Ignore all White Space
> WS : ( ' '
> | '\t'
> | '\r' '\n' { newline(); }
> | '\n' { newline(); }
> )
> {$setType(Token.SKIP);} //ignore this token
> ;
>
> The Grammar for the Tag Lexer
> =============================
> // Import the Required Classes
> header
> {
> import java.util.*;
> import antlr.*;
> }
>
> // The Class
> class HTMLTagLexer extends Lexer;
>
> // Set the Options for the Lexer
> options
> {
> k=3; // Set the Look Ahead to 3 Characters
> charVocabulary = '\1' .. '\377'; // Set the Lexer Character Vocabulary
> testLiterals = false; // Don't test against the Literals table
> importVocab = Tagged; // The Vocabulary to import
> exportVocab = HTMLTags; // Export the Vocabulary to HTMLTags
> }
>
> // The routine that will allow us to switch between Selectors
> {
> // The current Selector
> TokenStreamSelector selector;
>
> // The method that will enable us to switch between Selectors
> public void setSelector(TokenStreamSelector tokenStreamSelector)
> {
> selector = tokenStreamSelector;
> }
>
> }
>
> // HTML Comment Definition
> HTMLCOMMENT : "<!--" (options { greedy=false; }: .) * "-->";
>
> // Ignore all White Space
> WS : ( ' '
> | '\t'
> | '\r' '\n' { newline(); }
> | '\n' { newline(); }
> )
> {$setType(Token.SKIP);} //ignore this token
> ;
>
> The Grammar for the Parser
> ==========================
>
> // Import the Required Classes
> header
> {
> import java.util.*;
> import antlr.*;
> }
>
> // The Class
> class HTMLParser extends Parser;
>
> // Set the Options for the Parser
> options
> {
> importVocab = Tagged; // The Vocabulary to import
> }
>
> // Define the starting point for processing the HTML
> processData :
> (
> text:WORD {System.out.println("TEXT " + text.getText());}
> | comment:HTMLComment {System.out.println("HTML Comment " + comment.getText());}
> )+;
>
> The Java Application
> ====================
>
> import java.io.*;
> import antlr.*;
>
> // The HTMLParserApp Class
> class HTMLParserApp
> {
>
> // The Main function
> public static void main(String[] args)
> {
> try
> {
> // Create the required Lexers
> HTMLTagLexer htmlTagLexer = new HTMLTagLexer(new DataInputStream(System.in));
> TextLexer textLexer = new TextLexer(htmlTagLexer.getInputState());
>
> // Create the TokenStreamSelector and add the required Lexers to it
> TokenStreamSelector tokenStreamSelector = new TokenStreamSelector();
> tokenStreamSelector.addInputStream(htmlTagLexer,"HTMLTagLexer");
> tokenStreamSelector.addInputStream(textLexer,"TextLexer");
>
> // Select the starting Lexer
> tokenStreamSelector.select("TextLexer");
>
> // Add the TokenStreamSelector to the Required Lexers
> htmlTagLexer.setSelector(tokenStreamSelector);
> textLexer.setSelector(tokenStreamSelector);
>
> // Create the HTML Parser
> HTMLParser htmlParser = new HTMLParser(tokenStreamSelector);
>
> // Process the HTML
> htmlParser.processData();
>
> } catch(Exception e)
> {
> System.err.println("exception: "+e);
> }
> }
> }
>
>
>
>
>
> Your use of Yahoo! Groups is subject to 
> http://docs.yahoo.com/info/terms/
>
>
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org


 
