[antlr-interest] How to abort lexer when invalid token encountered?

Karl Goldstein karlgold at yahoo.com
Thu Mar 6 15:12:52 PST 2008


I'm writing a parser for a simple SQL-like query language.  If the lexer encounters an invalid token (say, because of an unbalanced quote), I want to abort parsing immediately (no recovery) and return the character position of the invalid token to the client.
Looking at the generated code for my parser and lexer, it appears that the default behavior when parsing a query is:
1) parser calls LT(1) on the token stream.
2) token stream fills the buffer with tokens from the lexer by calling lexer.nextToken.
3) Lexer.nextToken catches all RecognitionExceptions and recovers from them.
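
The three steps above can be modeled in plain Java to show why the caller never sees the exception. This is only a toy sketch and does not use the ANTLR runtime: `ToyLexer`, `ToyRecognitionException`, and the rule that only letters are valid tokens are all invented for illustration, and the real `Lexer.nextToken` may differ in detail.

```java
// Toy model only: these classes are stand-ins, not ANTLR runtime classes.
class LexerRecoveryDemo {
    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer("ab'cd");
        StringBuilder seen = new StringBuilder();
        // Plays the role of the token stream pulling tokens from the lexer.
        for (char t = lexer.nextToken(); t != ToyLexer.EOF; t = lexer.nextToken()) {
            seen.append(t);
        }
        System.out.println("tokens seen by caller: " + seen);             // abcd
        System.out.println("errors reported: " + lexer.errorsReported);   // 1
    }
}

// Stand-in for a checked recognition exception that records a position.
class ToyRecognitionException extends Exception {
    final int charPosition;

    ToyRecognitionException(int charPosition) {
        super("invalid character at position " + charPosition);
        this.charPosition = charPosition;
    }
}

class ToyLexer {
    static final char EOF = '\0';
    private final String input;
    private int pos = 0;
    int errorsReported = 0;

    ToyLexer(String input) {
        this.input = input;
    }

    // Models step 3: catch the exception, report it, recover, and try again.
    char nextToken() {
        while (pos < input.length()) {
            try {
                return matchToken();
            } catch (ToyRecognitionException e) {
                reportError(e);
                pos++; // recover by skipping the offending character
            }
        }
        return EOF;
    }

    // In this toy language every letter is a one-character token.
    private char matchToken() throws ToyRecognitionException {
        char c = input.charAt(pos);
        if (!Character.isLetter(c)) {
            throw new ToyRecognitionException(pos);
        }
        pos++;
        return c;
    }

    // Default behavior in this model: count the error and carry on.
    void reportError(ToyRecognitionException e) {
        errorsReported++;
    }
}
```

In this model the stray quote is silently dropped and the caller receives `abcd`, which is the behavior the question is trying to avoid.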
What is the best way to override this behavior and simply let the RecognitionException bubble up to the caller of the parser?  I got something to work by overriding reportError and throwing a runtime (unchecked) version of RecognitionException, but I have to believe there's a cleaner way to do this.  My grammar is below.
Any other comments on my grammar are more than welcome; this is my first time using ANTLR.
Thanks,
Karl
  -----
grammar Cql;

options {
  language=Java;
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
  ROOT;
}

@header {
package com.-----.query;
}

@lexer::header {
package com.-----.query;
}

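@members {
  // Throw on a mismatched token instead of attempting recovery.
  protected void mismatch(IntStream input, int ttype, BitSet follow)
    throws RecognitionException {
    throw new MismatchedTokenException(ttype, input);
  }

  // Likewise, rethrow rather than recover from a mismatched set.
  public void recoverFromMismatchedSet(IntStream input,
    RecognitionException e, BitSet follow)
    throws RecognitionException {
    throw e;
  }
}

// Rethrow from every parser rule so the exception reaches the caller.
@rulecatch {
  catch (RecognitionException e) {
    throw e;
  }
}

@lexer::members {
  // Turn a reported lexer error into an unchecked exception to abort lexing.
  public void reportError(RecognitionException e) {
    throw new RecognitionRuntimeException(e);
  }
}
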
query:
  'select' fieldlist 'from' TABLE ('where' criteria)? ->
  ^(ROOT ^('select' fieldlist) ^('from' TABLE) ^('where' criteria)?);

fieldlist: FIELD (',' FIELD)* -> FIELD+;

// placeholder for set of supported tables
TABLE: 'table' ;

BOOLEAN: ('and' | 'or');

criteria: criterion (BOOLEAN^ criterion)*;

OPERATOR: ('=' | '>' | '<');

criterion:
  FIELD OPERATOR operand -> ^(OPERATOR FIELD operand)
  |  '(' criteria ')' -> ^(ROOT criteria);

operand : (FIELD | STRING_LITERAL | NUMBER | DATE);

FIELD: ('a'..'z' | 'A'..'Z')+;

fragment
DIGIT: '0'..'9';

DATE: DIGIT DIGIT? ('/' | '-') DIGIT DIGIT (DIGIT DIGIT)? ;

NUMBER: DIGIT+ | DIGIT+ ('.' DIGIT*) | '.' DIGIT+;

fragment
QUOTED_CHARACTER:
  ( ~( '\'' | '\\' ) ) | '\\' ( ( '\'' | '\\' ) );

STRING_LITERAL:
  '\''! ( QUOTED_CHARACTER )* '\''!;

WS: (' ')+ { skip(); } ;
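
The definition of `RecognitionRuntimeException` is not shown in the post. The sketch below assumes it is a user-written unchecked wrapper, and illustrates how throwing it from `reportError` could abort lexing and hand the error position back to the client. All class names here are hypothetical stand-ins written in plain Java; nothing in it comes from the ANTLR runtime, and the letter-only token rule is invented for the example.

```java
// Hypothetical stand-ins only: none of these classes come from ANTLR.
class AbortOnErrorDemo {
    public static void main(String[] args) {
        System.out.println(SketchClient.firstErrorPosition("abc"));    // prints -1
        System.out.println(SketchClient.firstErrorPosition("ab'cd"));  // prints 2
    }
}

// Stand-in for the checked exception raised while matching a token.
class SketchRecognitionException extends Exception {
    final int charPosition;

    SketchRecognitionException(int charPosition) {
        super("invalid character at position " + charPosition);
        this.charPosition = charPosition;
    }
}

// Assumed shape of the unchecked wrapper: it keeps the original as its cause.
class SketchRecognitionRuntimeException extends RuntimeException {
    SketchRecognitionRuntimeException(SketchRecognitionException cause) {
        super(cause);
    }

    int charPosition() {
        return ((SketchRecognitionException) getCause()).charPosition;
    }
}

class StrictLexer {
    static final char EOF = '\0';
    private final String input;
    private int pos = 0;

    StrictLexer(String input) {
        this.input = input;
    }

    // Same catch-report-recover loop as a recovering lexer would have.
    char nextToken() {
        while (pos < input.length()) {
            try {
                return matchToken();
            } catch (SketchRecognitionException e) {
                reportError(e); // throws, so the recovery below never runs
                pos++;
            }
        }
        return EOF;
    }

    private char matchToken() throws SketchRecognitionException {
        char c = input.charAt(pos);
        if (!Character.isLetter(c)) {
            throw new SketchRecognitionException(pos);
        }
        pos++;
        return c;
    }

    // The override: escalate instead of reporting and continuing.
    void reportError(SketchRecognitionException e) {
        throw new SketchRecognitionRuntimeException(e);
    }
}

class SketchClient {
    // Returns the position of the first invalid character, or -1 if none.
    static int firstErrorPosition(String query) {
        StrictLexer lexer = new StrictLexer(query);
        try {
            while (lexer.nextToken() != StrictLexer.EOF) {
                // a real client would hand tokens to the parser here
            }
            return -1;
        } catch (SketchRecognitionRuntimeException e) {
            return e.charPosition();
        }
    }
}
```

Because the wrapper is unchecked, it passes through method signatures that do not declare it, which is why this approach works without changing the generated code; the cost is that every client has to remember to catch it.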


       