[antlr-interest] Lexer error

Wed Apr 14 05:17:46 PDT 2010

Thank you very much!  That solved it.  

This begs the question: Why didn't ANTLR report this?  Seems like a bug to
me, but then I haven't gotten to that part of the book yet (I have it as a
Kindle eBook, which has neither page numbers or any sort of search
capability), and perhaps it is described there.

Again, many thanks!

 -Brian

-----Original Message-----
From: Mark Wright [mailto:markwright at internode.on.net] 
Sent: Wednesday, April 14, 2010 18:09
To: Brian Catlin
Cc: 'Cliff Hudson'; antlr-interest at antlr.org
Subject: Re: Re: [antlr-interest] Lexer error

On Wed, Apr 14, 2010 at 04:48:51PM +0800, Brian Catlin wrote:
> Placing the Fragment attribute on FILE_NAME was just the last in a 
> long series of desperate attempts to try and get it to work.  I too, 
> am surprised that ANTLR didn't at least warn about it.
> 
>  
> 
> Thanks for the advice about memoization and backtracking.
> 
>  
> 
> I modified FILE_NAME to add the quotes, as you suggested, but that 
> didn't
> help:
> 
>  
> 
> FILE_NAME
> 
>       :  '"' ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n' | '"')+ '"';
> 
>  
> 
> Do you have any recommendations on examples that use semantic 
> predicates in a way that is similar to what I'm trying to do?

Yes, p. 287 section Keyords as Variables of The Definitive ANTLR Reference.

Regards, Mark

> Thanks!
> 
> -Brian
> 
>  
> 
> From: Cliff Hudson [mailto:cliff.s.hudson at gmail.com]
> Sent: Wednesday, April 14, 2010 16:19
> To: BrianC at sannas.org
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer error
> 
>  
> 
> FILE_NAME is a fragment, so it will never match as a token without 
> another token referring to it..  Rule a_file thus can never match (and 
> in fact it seems like you should get an error about that.)
> 
>  
> 
> You will have a more general problem that FILE_NAME can also match any 
> of your keywords, and likewise your keywords can match any filename 
> that has the same text, which means certain filenames will not produce 
> the expected tokens in your grammar.  Tokens without wildcards match 
> in the order they are declared, but tokens with wildcards can consume 
> input before preceding tokens that don't have wildcards which could also
match the same input.
> 
>  
> 
> There are a couple of ways around this:
> 
> 1. Teach your lexer more about the input using semantic predicates - 
> these allow you to switch token rules on an off depending on conditions
you set.
> 
> 2. Ensure your tokens are lexically unambiguous - for instance 
> FILE_NAME could be surrounded by quotation marks which none of your other
tokens use.
> This option is probably more desirable, since file names can also 
> contain whitespace, and depending on how your grammar turns out, this 
> would allow you to continue to match tokens after the file name.
> 
> One note - ANTLR does not perform case-insensitive tokenization.  
> You've probably already come across this, but I just wanted to make 
> sure you knew before you hit that too.
> 
>  
> 
> Finally, be sure to turn off backtracking and memoization periodically 
> to see if your grammar will function without them.  These do incur 
> performance/memory penalties, and most grammars can be written without 
> invoking these features.
> 
>  
> 
> On Wed, Apr 14, 2010 at 12:57 AM, Brian Catlin <BrianC at sannas.org> wrote:
> 
> The following grammar compiles without any sort of warnings or errors, 
> and ANTLRworks doesn't complain either, but when I call the parser, it 
> returns a warning for each character in the string to be parsed.  I 
> know it has something to do with the FILE_NAME rule, but I don't know 
> how to fix it.  I suspect that the lexer cannot create a token because 
> the FILE_NAME rule could also match any other token (a file name on 
> Windows can contain just about any character).  I've structured my 
> grammar so that the FILE_NAME is always the last token on a line, so I 
> figured ANTLR would be able to figure it out from that context, but 
> that doesn't appear to be the case.  So, how can I describe it to ANTLR?
> 
> 
> 
> Any help would be greatly appreciated!
> 
> 
> 
> -Brian
> 
> 
> 
> 
> 
> DT> dump mbr
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 0, near 'D' :
> 
>        dump mbr
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 1, near 'U' :
> 
>        ump mbr
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 2, near 'M' :
> 
>        mp mbr
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 3, near 'P' :
> 
>        p mbr
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 5, near 'M' :
> 
>        mbr
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 6, near 'B' :
> 
>        br
> 
> -memory-(1) : lexer error 3 :
> 
>         at offset 7, near 'R' :
> 
>        r
> 
> 
> 
> //
> 
> // This grammar defines the commands available to the DiskTool (DT) 
> program
> 
> //
> 
> 
> 
> grammar Commands;
> 
> 
> 
> options
> 
>      {
> 
>      output = AST;
> 
>      ASTLabelType = pANTLR3_BASE_TREE;
> 
>      language = C;
> 
>      backtrack = true;
> 
>      memoize = true;
> 
>      }
> 
> 
> 
> @lexer::header
> 
> {
> 
> #define     ANTLR3_INLINE_INPUT_ASCII
> 
> }
> 
> 
> 
> //+
> 
> // Productions
> 
> //-
> 
> 
> 
> commands
> 
>      :
> 
>      (script_command
> 
>      | dump_command
> 
>      | show_command
> 
>      )*;
> 
> 
> 
> script_command
> 
>      :  '@'
> 
>      FILE_NAME
> 
>      ;
> 
> 
> 
> dump_command
> 
>      : DUMP
> 
>      ( dump_struct
> 
>      | dump_block
> 
>      | a_file
> 
>      );
> 
> 
> 
> show_command
> 
>      : SHOW
> 
>      ( structure_nouns
> 
>      | storage_nouns
> 
>      | a_file
> 
>      );
> 
> 
> 
> mbr_vbr
> 
>      : MBR
> 
>      | VBR
> 
>      ;
> 
> 
> 
> block_nouns
> 
>      : LBN
> 
>      | LCN
> 
>      | VBN
> 
>      | VCN
> 
>      ;
> 
> 
> 
> structure_nouns
> 
>      : MBR
> 
>      | VBR
> 
>      ;
> 
> 
> 
> dump_block
> 
> 
> 
>      : block_nouns
> 
>      number
> 
>      (
> 
>      (',' number
> 
>      )
> 
>      |
> 
>      (':' number
> 
>      ))?
> 
>      DRIVE_NAME?
> 
>      ;
> 
> 
> 
> dump_struct
> 
>      : mbr_vbr
> 
>      ('/' qualifier)?
> 
>      DRIVE_NAME?
> 
>      ;
> 
> 
> 
> storage_nouns
> 
>      : DISK
> 
>      | VOLUME
> 
>      ;
> 
> 
> 
> a_file
> 
>      : FILE
> 
>      FILE_NAME
> 
>      ;
> 
> 
> 
> number
> 
>      : DEC_NUMBER
> 
>      | HEX_NUMBER
> 
>      ;
> 
> 
> 
> qualifier
> 
>      : ALL
> 
>      | CODE
> 
>      | TABLE
> 
>      ;
> 
> 
> 
> //+
> 
> // Tokens
> 
> //-
> 
> 
> 
> // Verbs
> 
> 
> 
> DUMP        : 'DUMP';
> 
> SHOW        : 'SHOW';
> 
> 
> 
> // Nouns
> 
> 
> 
> DISK        : 'DISK';
> 
> FILE        : 'FILE';
> 
> LBN         : 'LBN';
> 
> LCN         : 'LCN';
> 
> MBR         : 'MBR';
> 
> PBN         : 'PBN';
> 
> VBN         : 'VBN';
> 
> VBR         : 'VBR';
> 
> VCN         : 'VCN';
> 
> VOLUME      : 'VOLUME';
> 
> 
> 
> // Qualifiers
> 
> 
> 
> ALL         : 'ALL';
> 
> CODE        : 'CODE';
> 
> TABLE       : 'TABLE';
> 
> 
> 
> // Miscellaneous tokens
> 
> 
> 
> DRIVE_NAME
> 
>      : LETTER ':';
> 
> 
> 
> fragment
> 
> LETTER      : 'A'..'Z';
> 
> 
> 
> fragment
> 
> DIGIT : '0'..'9';
> 
> 
> 
> fragment
> 
> HEX_DIGIT   : (DIGIT | 'A'..'F');
> 
> 
> 
> HEX_NUMBER  : '0X' HEX_DIGIT+;
> 
> 
> 
> DEC_NUMBER  : DIGIT+;
> 
> 
> 
> fragment
> 
> FILE_NAME
> 
>      :  ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | 
> EOF);
> 
> 
> 
> LINE_COMMENT
> 
>      : '!' ~('\n'|'\r')* (('\r'? '\n') | EOF) {$channel=HIDDEN;};
> 
> 
> 
> WS    : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
> 
> 
> 
> 
> 
> 
> 
> #include <windows.h>
> 
> #include <stdio.h>
> 
> 
> 
> #include "CommandsLexer.h"                                              //
> Generated by ANTLR from Commands.g
> 
> #include "CommandsParser.h"                                             //
> Generated by ANTLR from Commands.g
> 
> 
> 
> 
> 
> 
> 
> void main (int Argc, char* Argv[])
> 
> {
> 
> DWORD                                     status;
> 
> char*                                     ptr;
> 
> char                                      command [1024];
> 
> DWORD                                     command_len;
> 
> pANTLR3_INPUT_STREAM                input;
> 
> pANTLR3_COMMON_TOKEN_STREAM         tstream;
> 
> pCommandsLexer                            lexer;
> 
> pCommandsParser                           parser;
> 
> CommandsParser_commands_return      commands_ast;
> 
> pANTLR3_COMMON_TREE_NODE_STREAM     nodes;
> 
> //pCommandsDumpDecl                       tree_parser;
> 
> 
> 
> 
> 
>      //+
> 
>      // Display our prompt and read a command string from the console
> 
>      //-
> 
> 
> 
>      while (TRUE)
> 
>            {
> 
>            printf ("DT> ");
> 
> 
> 
>            //+
> 
>            // Read the entire line
> 
>            //-
> 
> 
> 
>            if ((ptr = gets_s ((char *)command, sizeof (command))) != 
> NULL)
> 
>                  {
> 
>                  command_len = strlen ((char*)command);
> 
> 
> 
>                  //+
> 
>                  // Only try to parse the input if there is something 
> there
> 
>                  //-
> 
> 
> 
>                  if (command_len > 0)
> 
>                        {
> 
> 
> 
>                        //+
> 
>                        // Create the input stream
> 
>                        //-
> 
> 
> 
>                        if ((input = antlr3NewAsciiStringInPlaceStream 
> ((pANTLR3_UINT8)&command, (ANTLR3_UINT64) command_len, NULL)) != 0)
> 
>                              {
> 
> 
> 
>                              //+
> 
>                              // Tell ANTLR to use upper-case when 
> matching tokens
> 
>                              //-
> 
> 
> 
>                              input->setUcaseLA (input, ANTLR3_TRUE);
> 
> 
> 
>                              //+
> 
>                              // Create a new instance of the lexer 
> using our input stream
> 
>                              //-
> 
> 
> 
>                              if ((lexer = CommandsLexerNew (input)) != 
> 0)
> 
>                                    {
> 
> 
> 
>                                    //+
> 
>                                    // Create the token stream
> 
>                                    //-
> 
> 
> 
>                                    if ((tstream = 
> antlr3CommonTokenStreamSourceNew (ANTLR3_SIZE_HINT, 
> TOKENSOURCE(lexer))) !=
> 0)
> 
>                                          {
> 
> 
> 
>                                          //+
> 
>                                          // Create a new instance of 
> the parser using our lexer
> 
>                                          //-
> 
> 
> 
>                                          if ((parser = 
> CommandsParserNew
> (tstream)) != 0)
> 
>                                                {
> 
> 
> 
>                                                //+
> 
>                                                // Call the parser with 
> the start symbol
> 
>                                                //-
> 
> 
> 
>                                                commands_ast =
> parser->commands (parser);
> 
> 
> 
>                                                //+
> 
>                                                // Check for errors 
> parsing the input
> 
>                                                //-
> 
> 
> 
>                                                if 
> (parser->pParser->rec->state->errorCount == 0)
> 
>                                                      {
> 
> 
> 
>                                                      //+
> 
>                                                      // The input was 
> parsed successfully.  Use the Abstract Syntax Tree
> 
>                                                      // which contains 
> a linked list of nodes containing the tokens that
> 
>                                                      // were parsed
> 
>                                                      //-
> 
> 
> 
>                                                      nodes = 
> antlr3CommonTreeNodeStreamNewTree (commands_ast.tree, 
> ANTLR3_SIZE_HINT);
> 
>                                                      printf ("Commands
> tree: %s\n", commands_ast.tree->toStringTree 
> (commands_ast.tree)->chars);
> 
> //                                                    tree_parser =
> CommandsDumpDeclNew (nodes);
> 
> 
> 
> //                                                    tree_parser->decl
> (tree_parser);
> 
> //                                                    nodes->free (nodes);
> 
> //                                                    tree_parser->free
> (tree_parser);
> 
>                                                      }
> 
>                                                else
> 
>                                                      {
> 
>                                                      printf ("Errors 
> found during parsing: %d\n", parser->pParser->rec->state->errorCount);
> 
>                                                      }
> 
> 
> 
>                                                //+
> 
>                                                // We're now done with 
> these instances, so free them
> 
>                                                //-
> 
> 
> 
>                                                parser->free (parser);
> 
>                                                tstream->free 
> (tstream);
> 
>                                                lexer->free (lexer);
> 
>                                                input->close (input);
> 
>                                                }
> 
>                                          else
> 
>                                                {
> 
>                                                status = GetLastError 
> ();
> 
>                                                printf ("Error creating 
> parser, status = %08x\n", status);
> 
>                                                break;
> 
>                                                }
> 
> 
> 
>                                          }
> 
>                                    else
> 
>                                          {
> 
>                                          status = GetLastError ();
> 
>                                          printf ("Unable to create 
> token stream, status = %08x\n", status);
> 
>                                          break;
> 
>                                          }
> 
> 
> 
>                                    }
> 
>                              else
> 
>                                    {
> 
>                                    status = GetLastError ();
> 
>                                    printf ("Unable to create lexer, 
> status = %08x\n", status);
> 
>                                    break;
> 
>                                    }
> 
> 
> 
>                              }
> 
>                        else
> 
>                              {
> 
>                              status = GetLastError ();
> 
>                              printf ("Error creating the input stream, 
> status = %08x\n", status);
> 
>                              break;
> 
>                              }
> 
> 
> 
>                        }
> 
> 
> 
>                  }
> 
> 
> 
> 
> 
>            }     // End while
> 
> 
> 
> }
> 
> 
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
>  
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>