[antlr-interest] Lexer error
Brian Catlin
BrianC at sannas.org
Wed Apr 14 05:17:46 PDT 2010
Thank you very much! That solved it.
This raises the question: why didn't ANTLR report this? It seems like a bug
to me, but I haven't gotten to that part of the book yet (I have it as a
Kindle eBook, which has neither page numbers nor any sort of search
capability), so perhaps it is described there.
Again, many thanks!
-Brian
-----Original Message-----
From: Mark Wright [mailto:markwright at internode.on.net]
Sent: Wednesday, April 14, 2010 18:09
To: Brian Catlin
Cc: 'Cliff Hudson'; antlr-interest at antlr.org
Subject: Re: Re: [antlr-interest] Lexer error
On Wed, Apr 14, 2010 at 04:48:51PM +0800, Brian Catlin wrote:
> Placing the fragment attribute on FILE_NAME was just the last in a
> long series of desperate attempts to get it to work. I, too, am
> surprised that ANTLR didn't at least warn about it.
>
> Thanks for the advice about memoization and backtracking.
>
> I modified FILE_NAME to add the quotes, as you suggested, but that
> didn't help:
>
> FILE_NAME
>     : '"' ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n' | '"')+ '"';
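As an illustration of what that rule accepts, here is a hand-written C function (`match_quoted_file_name` is a hypothetical name, not something ANTLR generates) that checks for the same shape: an opening double quote, one or more characters other than | < > * ? CR LF or a double quote, and a closing double quote. It returns the number of characters consumed, or 0 if the input does not start with a quoted file name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters the quoted FILE_NAME rule excludes between the quotes
 * (plus the string terminator, which ends the input). */
static int is_excluded(char c)
{
    return c == '\0' || strchr("|<>*?\r\n\"", c) != NULL;
}

/* Hypothetical helper mirroring:
 *   FILE_NAME : '"' ~('|'|'<'|'>'|'*'|'?'|'\r'|'\n'|'"')+ '"' ;
 * Returns the number of characters consumed from text, or 0 if the
 * input does not start with a quoted file name. */
size_t match_quoted_file_name(const char *text)
{
    size_t i = 1;

    assert(text != NULL);
    if (text[0] != '"')
        return 0;                  /* must open with a double quote */
    while (!is_excluded(text[i]))
        i++;                       /* body: any allowed character */
    if (i == 1 || text[i] != '"')
        return 0;                  /* empty body or no closing quote */
    return i + 1;                  /* include the closing quote */
}
```

Because no other token in this grammar begins with a double quote, a lexer can commit to FILE_NAME after the first character. The rule still has to be a non-fragment token rule for the lexer to produce it, though.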
>
> Do you have any recommendations for examples that use semantic
> predicates in a way similar to what I'm trying to do?
Yes: see the section "Keywords as Variables" on p. 287 of The Definitive ANTLR Reference.
Regards, Mark
> Thanks!
>
> -Brian
>
> From: Cliff Hudson [mailto:cliff.s.hudson at gmail.com]
> Sent: Wednesday, April 14, 2010 16:19
> To: BrianC at sannas.org
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer error
>
> FILE_NAME is a fragment, so it will never match as a token unless
> another token rule refers to it. Rule a_file can therefore never match
> (and it seems like you should get an error about that).
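To make the fragment point concrete, here is a plain-C analogy (all function names are hypothetical; this is not what the ANTLR C target generates): a fragment behaves like a helper function that only other token rules call, while the lexer's dispatcher tries only the non-fragment rules.

```c
#include <assert.h>
#include <stddef.h>

/* fragment LETTER : 'A'..'Z';
 * A helper: only other token rules call it. */
static int match_letter(char c)
{
    return c >= 'A' && c <= 'Z';
}

/* DRIVE_NAME : LETTER ':';
 * A real token rule; it is the only route by which LETTER is used. */
static size_t match_drive_name(const char *text)
{
    return (match_letter(text[0]) && text[1] == ':') ? 2 : 0;
}

/* Hypothetical token dispatcher: it tries only the non-fragment rules.
 * A rule declared as a fragment, the way FILE_NAME was, has no entry
 * here, so it can never be returned as a token in its own right.
 * Returns the length of the token at text, or 0 if no rule matches. */
size_t next_token_length(const char *text)
{
    assert(text != NULL);
    return match_drive_name(text);
}
```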
>
> You will also have a more general problem: FILE_NAME can match any of
> your keywords, and likewise your keywords can match any file name with
> the same text, so certain file names will not produce the tokens your
> grammar expects. Tokens without wildcards match in the order they are
> declared, but tokens with wildcards can consume input ahead of
> earlier-declared tokens without wildcards that could also match it.
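A simplified C model of that ambiguity shows why a file whose name matches a keyword never reaches FILE_NAME when the keyword rule is preferred. The names are hypothetical, only a subset of the grammar's keywords is included, and the resolution rule is deliberately simplified; it does not reproduce how ANTLR actually predicts tokens.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds for this illustration only. */
typedef enum { TOK_DUMP, TOK_SHOW, TOK_FILE, TOK_MBR, TOK_VBR, TOK_FILE_NAME } TokenKind;

/* A subset of the keyword rules, in the order the grammar declares them. */
static const struct {
    const char *text;
    TokenKind kind;
} keywords[] = {
    { "DUMP", TOK_DUMP }, { "SHOW", TOK_SHOW }, { "FILE", TOK_FILE },
    { "MBR",  TOK_MBR  }, { "VBR",  TOK_VBR  },
};

/* Simplified resolution of a word that several rules could match: the
 * first declared keyword whose text equals the whole word wins, and only
 * words that match no keyword fall through to the catch-all FILE_NAME. */
TokenKind classify_word(const char *word)
{
    size_t i;

    assert(word != NULL);
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(word, keywords[i].text) == 0)
            return keywords[i].kind;
    }
    return TOK_FILE_NAME;
}
```

In this model a file literally named MBR comes back as TOK_MBR, so a rule like a_file, which expects FILE_NAME after FILE, would fail on it.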
>
> There are a couple of ways around this:
>
> 1. Teach your lexer more about the input using semantic predicates;
> these let you switch token rules on and off depending on conditions
> you set.
>
> 2. Make your tokens lexically unambiguous. For instance, FILE_NAME
> could be surrounded by quotation marks, which none of your other
> tokens use. This option is probably preferable, since file names can
> also contain whitespace, and depending on how your grammar turns out,
> it would let you continue matching tokens after the file name.
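Option 1 can be sketched outside ANTLR as a lexer with one state flag. In the plain-C sketch below (the `MiniLexer` type, `TokKind` enum and `next_token` function are hypothetical, not ANTLR API), the flag is set after the FILE keyword or an '@', and while it is set the rest of the line becomes a single file-name token, spaces included. This exploits the layout described in the original message, where the file name is always the last thing on a line. In an ANTLR 3 grammar the corresponding tool would be a semantic predicate guarding the FILE_NAME rule; the sketch only models the idea. It compares keywords case-sensitively, so it assumes upper-case input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; a TK_WORD is a keyword or a number. */
typedef enum { TK_EOF, TK_WORD, TK_AT, TK_FILE_NAME, TK_OTHER } TokKind;

/* Hypothetical lexer state for this sketch; not ANTLR-generated code. */
typedef struct {
    const char *pos;        /* next unread character */
    int want_file_name;     /* the "condition you set": true after FILE or '@' */
} MiniLexer;

/* Returns the kind of the next token and copies its text into out
 * (capacity cap > 1, truncated if necessary). */
TokKind next_token(MiniLexer *lx, char *out, size_t cap)
{
    const char *start;
    size_t len;
    TokKind kind;

    assert(lx != NULL && out != NULL && cap > 1);

    /* Skip whitespace; a file name must start on the same line. */
    for (;;) {
        if (*lx->pos == ' ' || *lx->pos == '\t') {
            lx->pos++;
        } else if (*lx->pos == '\r' || *lx->pos == '\n') {
            lx->pos++;
            lx->want_file_name = 0;
        } else {
            break;
        }
    }

    start = lx->pos;
    if (*lx->pos == '\0') {
        kind = TK_EOF;
    } else if (lx->want_file_name) {
        /* FILE_NAME rule switched on: take everything up to end of line. */
        while (*lx->pos != '\0' && *lx->pos != '\r' && *lx->pos != '\n')
            lx->pos++;
        lx->want_file_name = 0;
        kind = TK_FILE_NAME;
    } else if (*lx->pos == '@') {
        lx->pos++;
        lx->want_file_name = 1;          /* '@' is followed by a file name */
        kind = TK_AT;
    } else if (isalnum((unsigned char)*lx->pos)) {
        while (isalnum((unsigned char)*lx->pos))
            lx->pos++;
        /* FILE is followed by a file name: switch the rule on. */
        lx->want_file_name =
            (lx->pos - start == 4 && strncmp(start, "FILE", 4) == 0);
        kind = TK_WORD;
    } else {
        lx->pos++;                       /* any other single character */
        kind = TK_OTHER;
    }

    len = (size_t)(lx->pos - start);
    if (len >= cap)
        len = cap - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return kind;
}
```

With this approach `DUMP FILE MBR` yields MBR as a file name, while `DUMP MBR` still yields it as an ordinary word, because the flag is only set by FILE or '@'.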
>
> One note - ANTLR does not perform case-insensitive tokenization.
> You've probably already come across this, but I just wanted to make
> sure you knew before you hit that too.
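The driver code further down calls `input->setUcaseLA(input, ANTLR3_TRUE)`, which in the ANTLR 3 C runtime makes the lexer's lookahead see upper-case characters, so upper-case literals such as 'DUMP' also match "dump". The sketch below models that idea in plain C (`ucase_la` and `matches_keyword` are hypothetical names, not runtime functions): only the character used for the comparison is upper-cased, and the input text itself is not modified.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the upper-cased character at offset i, the way a lookahead
 * function behaves once upper-casing is switched on (illustration only). */
static int ucase_la(const char *input, size_t i)
{
    return toupper((unsigned char)input[i]);
}

/* Returns nonzero if input starts with the upper-case literal kw,
 * ignoring the case of the input characters. */
int matches_keyword(const char *input, const char *kw)
{
    size_t i;

    assert(input != NULL && kw != NULL);
    for (i = 0; kw[i] != '\0'; i++) {
        if (input[i] == '\0' || ucase_la(input, i) != (unsigned char)kw[i])
            return 0;
    }
    return 1;
}
```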
>
> Finally, be sure to turn off backtracking and memoization periodically
> to see whether your grammar works without them. They incur performance
> and memory penalties, and most grammars can be written without them.
>
> On Wed, Apr 14, 2010 at 12:57 AM, Brian Catlin <BrianC at sannas.org> wrote:
>
> The following grammar compiles without any warnings or errors, and
> ANTLRworks doesn't complain either, but when I call the parser, it
> reports a lexer error for each character in the string being parsed. I
> know it has something to do with the FILE_NAME rule, but I don't know
> how to fix it. I suspect that the lexer cannot create a token because
> the FILE_NAME rule could also match any other token (a file name on
> Windows can contain just about any character). I've structured my
> grammar so that FILE_NAME is always the last token on a line, so I
> figured ANTLR would be able to work it out from that context, but
> that doesn't appear to be the case. So, how can I describe it to ANTLR?
>
> Any help would be greatly appreciated!
>
> -Brian
>
> DT> dump mbr
> -memory-(1) : lexer error 3 :
> at offset 0, near 'D' :
> dump mbr
> -memory-(1) : lexer error 3 :
> at offset 1, near 'U' :
> ump mbr
> -memory-(1) : lexer error 3 :
> at offset 2, near 'M' :
> mp mbr
> -memory-(1) : lexer error 3 :
> at offset 3, near 'P' :
> p mbr
> -memory-(1) : lexer error 3 :
> at offset 5, near 'M' :
> mbr
> -memory-(1) : lexer error 3 :
> at offset 6, near 'B' :
> br
> -memory-(1) : lexer error 3 :
> at offset 7, near 'R' :
> r
>
> //
> // This grammar defines the commands available to the DiskTool (DT) program
> //
>
> grammar Commands;
>
> options
> {
>     output = AST;
>     ASTLabelType = pANTLR3_BASE_TREE;
>     language = C;
>     backtrack = true;
>     memoize = true;
> }
>
> @lexer::header
> {
> #define ANTLR3_INLINE_INPUT_ASCII
> }
>
> //+
> // Productions
> //-
>
> commands
>     : ( script_command
>       | dump_command
>       | show_command
>       )*
>     ;
>
> script_command
>     : '@'
>       FILE_NAME
>     ;
>
> dump_command
>     : DUMP
>       ( dump_struct
>       | dump_block
>       | a_file
>       )
>     ;
>
> show_command
>     : SHOW
>       ( structure_nouns
>       | storage_nouns
>       | a_file
>       )
>     ;
>
> mbr_vbr
>     : MBR
>     | VBR
>     ;
>
> block_nouns
>     : LBN
>     | LCN
>     | VBN
>     | VCN
>     ;
>
> structure_nouns
>     : MBR
>     | VBR
>     ;
>
> dump_block
>     : block_nouns
>       number
>       ( (',' number)
>       | (':' number)
>       )?
>       DRIVE_NAME?
>     ;
>
> dump_struct
>     : mbr_vbr
>       ('/' qualifier)?
>       DRIVE_NAME?
>     ;
>
> storage_nouns
>     : DISK
>     | VOLUME
>     ;
>
> a_file
>     : FILE
>       FILE_NAME
>     ;
>
> number
>     : DEC_NUMBER
>     | HEX_NUMBER
>     ;
>
> qualifier
>     : ALL
>     | CODE
>     | TABLE
>     ;
>
> //+
> // Tokens
> //-
>
> // Verbs
>
> DUMP    : 'DUMP';
> SHOW    : 'SHOW';
>
> // Nouns
>
> DISK    : 'DISK';
> FILE    : 'FILE';
> LBN     : 'LBN';
> LCN     : 'LCN';
> MBR     : 'MBR';
> PBN     : 'PBN';
> VBN     : 'VBN';
> VBR     : 'VBR';
> VCN     : 'VCN';
> VOLUME  : 'VOLUME';
>
> // Qualifiers
>
> ALL     : 'ALL';
> CODE    : 'CODE';
> TABLE   : 'TABLE';
>
> // Miscellaneous tokens
>
> DRIVE_NAME
>     : LETTER ':';
>
> fragment
> LETTER  : 'A'..'Z';
>
> fragment
> DIGIT   : '0'..'9';
>
> fragment
> HEX_DIGIT : (DIGIT | 'A'..'F');
>
> HEX_NUMBER : '0X' HEX_DIGIT+;
>
> DEC_NUMBER : DIGIT+;
>
> fragment
> FILE_NAME
>     : ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | EOF);
>
> LINE_COMMENT
>     : '!' ~('\n'|'\r')* (('\r'? '\n') | EOF) {$channel=HIDDEN;};
>
> WS  : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
>
> #include <windows.h>
> #include <stdio.h>
> #include <string.h>                 // strlen
>
> #include "CommandsLexer.h"          // Generated by ANTLR from Commands.g
> #include "CommandsParser.h"         // Generated by ANTLR from Commands.g
>
> int main (int Argc, char* Argv[])
> {
>     DWORD                           status;
>     char                            command [1024];
>     size_t                          command_len;
>     pANTLR3_INPUT_STREAM            input;
>     pANTLR3_COMMON_TOKEN_STREAM     tstream;
>     pCommandsLexer                  lexer;
>     pCommandsParser                 parser;
>     CommandsParser_commands_return  commands_ast;
>     pANTLR3_COMMON_TREE_NODE_STREAM nodes;
>     //pCommandsDumpDecl             tree_parser;
>
>     //+
>     // Display our prompt and read a command string from the console
>     //-
>
>     while (TRUE)
>     {
>         printf ("DT> ");
>
>         //+
>         // Read the entire line; gets_s returns NULL at end of input or on
>         // error, so stop instead of prompting forever
>         //-
>
>         if (gets_s (command, sizeof (command)) == NULL)
>         {
>             break;
>         }
>
>         command_len = strlen (command);
>
>         //+
>         // Only try to parse the input if there is something there
>         //-
>
>         if (command_len > 0)
>         {
>             //+
>             // Create the input stream
>             //-
>
>             if ((input = antlr3NewAsciiStringInPlaceStream ((pANTLR3_UINT8) command, (ANTLR3_UINT32) command_len, NULL)) != NULL)
>             {
>                 //+
>                 // Tell ANTLR to use upper-case when matching tokens
>                 //-
>
>                 input->setUcaseLA (input, ANTLR3_TRUE);
>
>                 //+
>                 // Create a new instance of the lexer using our input stream
>                 //-
>
>                 if ((lexer = CommandsLexerNew (input)) != NULL)
>                 {
>                     //+
>                     // Create the token stream
>                     //-
>
>                     if ((tstream = antlr3CommonTokenStreamSourceNew (ANTLR3_SIZE_HINT, TOKENSOURCE (lexer))) != NULL)
>                     {
>                         //+
>                         // Create a new instance of the parser using our token stream
>                         //-
>
>                         if ((parser = CommandsParserNew (tstream)) != NULL)
>                         {
>                             //+
>                             // Call the parser with the start symbol
>                             //-
>
>                             commands_ast = parser->commands (parser);
>
>                             //+
>                             // Check for errors parsing the input
>                             //-
>
>                             if (parser->pParser->rec->state->errorCount == 0)
>                             {
>                                 //+
>                                 // The input was parsed successfully. Wrap the Abstract
>                                 // Syntax Tree in a node stream for a tree parser to walk
>                                 //-
>
>                                 nodes = antlr3CommonTreeNodeStreamNewTree (commands_ast.tree, ANTLR3_SIZE_HINT);
>                                 printf ("Commands tree: %s\n", commands_ast.tree->toStringTree (commands_ast.tree)->chars);
>                                 // tree_parser = CommandsDumpDeclNew (nodes);
>                                 // tree_parser->decl (tree_parser);
>                                 // tree_parser->free (tree_parser);
>                                 nodes->free (nodes);
>                             }
>                             else
>                             {
>                                 printf ("Errors found during parsing: %u\n", parser->pParser->rec->state->errorCount);
>                             }
>
>                             //+
>                             // We're now done with these instances, so free them
>                             //-
>
>                             parser->free (parser);
>                             tstream->free (tstream);
>                             lexer->free (lexer);
>                             input->close (input);
>                         }
>                         else
>                         {
>                             status = GetLastError ();
>                             printf ("Error creating parser, status = %08lx\n", status);
>                             break;
>                         }
>                     }
>                     else
>                     {
>                         status = GetLastError ();
>                         printf ("Unable to create token stream, status = %08lx\n", status);
>                         break;
>                     }
>                 }
>                 else
>                 {
>                     status = GetLastError ();
>                     printf ("Unable to create lexer, status = %08lx\n", status);
>                     break;
>                 }
>             }
>             else
>             {
>                 status = GetLastError ();
>                 printf ("Error creating the input stream, status = %08lx\n", status);
>                 break;
>             }
>         }
>     } // End while
>
>     return 0;
> }
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address