[antlr-interest] Stopping parser and lexer at first error

Fri Apr 2 09:40:35 PDT 2010

'Wrong' might be too strong.

However, you should not raise errors for that sort of thing in the lexer 'in general'.

Obviously the lexer should raise an error when it sees something incorrect, such as a character that cannot be allowed. However, it should raise a controlled error that you program and record, and then not pass the character on to the parser. You can then parse and see whether there are more errors to report and, if possible, check semantics. Basically, try to give out as much error information in one pass as possible (though you also need to know which kinds of errors will trigger many other spurious errors, and stop processing when you see them).

Your example below does not make sense to me because if:

MyClass is a correct keyword
But MyCLASS is not

Then the lexer rule will not recognize MyCLASS as a keyword. If there is an ID rule then the lexer will see it as that. If MyCLASS can mean nothing at all, then there should be a rule in the lexer that matches it, but issues a controlled error and deletes it. Your compiler will not go any farther than it can/should because you will record the error, but you won't just stop at the first thing that goes wrong lexically. There is nothing worse than a compiler that says "invalid character line 1, offset 34" and stops, so you fix that character, run it again and it says "invalid character line 1, offset 36".
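
As a rough illustration of "match it, but issue a controlled error", here is a minimal sketch in plain Java. Everything in it is an assumption made for the example: the IdChecker class, the Kind enum, the error list, and the casing rules themselves (UpperCamelCase class names, ALLCAPS constants, lowerCamelCase variables) are hypothetical and not part of ANTLR or of any grammar in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ANTLR): classifies an identifier by its
// casing and records a controlled error instead of throwing.
class IdChecker {
    enum Kind { CLASS_NAME, CONSTANT, VARIABLE, INVALID }

    // Errors are collected here so lexing can continue after a bad identifier.
    final List<String> errors = new ArrayList<>();

    Kind checkId(String text) {
        // Assumed convention: constants are all upper case (MYCONSTANT).
        if (text.matches("[A-Z][A-Z0-9]*")) return Kind.CONSTANT;
        // Assumed convention: class names are UpperCamelCase (MyClass), so
        // every capital must be followed by a lower-case letter or digit.
        if (text.matches("([A-Z][a-z0-9]+)+")) return Kind.CLASS_NAME;
        // Assumed convention: variables are lowerCamelCase (myVariable).
        if (text.matches("[a-z][a-z0-9]*([A-Z][a-z0-9]+)*")) return Kind.VARIABLE;
        // Anything else (MyCLass, MYCONstant, myVAriable) is recorded, not thrown.
        errors.add("badly cased identifier: " + text);
        return Kind.INVALID;
    }
}
```

An action such as checkId($text) in an ID rule could delegate to a helper along these lines; the token is still emitted, so the parser keeps going and the recorded error is reported at the end.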

So my advice is: if you write your lexer rules to anticipate anything that can go wrong, the lexer will never throw ANTLR exceptions that tell you nothing about the error. You should still catch such exceptions and raise 'Internal compiler error' (because it means your lexer is incorrect), of course, but do not rely on them as the way to catch bad input.

So:

STRING : '"' ~('"'|'\n'|'\r')*
          (
               '"'  // Hunky dory
             |  { raise(UNTERMINATED_STRING); } // Missing delimiter
          )
        ;

....

ID : ('a'..'z'|morestuff)+ { checkId($text); } ;

ANY : . { raise(ILLEGAL_CHARACTER); } ;

All of the above will result in a set of tokens that you can still parse, and hopefully check semantically, but you won't generate code and so on because you recorded the fact that there were errors.
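
To make the "record the error and keep going" behaviour concrete without depending on ANTLR, here is a small hand-written Java sketch. It is only an approximation of what the rules above describe: the TolerantLexer class, its token strings, and its error messages are invented for this example, and the string rule mirrors the STRING rule only loosely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated lexer: it never throws on bad input,
// it records a controlled error and carries on with the next character.
class TolerantLexer {
    final List<String> tokens = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    void lex(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '"') {
                i++;
                // Consume everything up to a closing quote or end of line.
                while (i < src.length() && src.charAt(i) != '"'
                        && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
                if (i < src.length() && src.charAt(i) == '"') {
                    i++; // closing delimiter found
                } else {
                    errors.add("offset " + start + ": unterminated string");
                }
                tokens.add("STRING"); // token is emitted either way
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add("ID:" + src.substring(start, i));
            } else {
                // Catch-all: record the illegal character and do not pass it on.
                errors.add("offset " + start + ": illegal character '" + c + "'");
                i++;
            }
        }
    }

    // Later phases (code generation and so on) would be skipped when this is true.
    boolean hasErrors() { return !errors.isEmpty(); }
}
```

Running lex on an input containing two illegal characters and an unterminated string reports all three problems in one pass, and the token list is still usable for parsing.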

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Corrado Campisano
> Sent: Friday, April 02, 2010 8:48 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Stopping parser and lexer at first error
> 
> Hi,
> 
> I think this could apply to lexer-level errors due to unexpected chars, but
> not to unexpected char-sequences.
> 
> I mean (it's not the case of my grammar, but could happen), what if I want
> to distinguish tokens like those:
>  - MyClass
>  - MYCONSTANT
>  - myVariable
> and consider the following ones as errors:
>  - MyCLass
>  - MYCONstant
>  - myVAriable
> 
> ??
> 
> Is the "you should" from some best-practice?
> 
> I believe the lexer should raise exceptions due to errors in the 'lexical
> analysis' and the parser for the 'syntactic analysis', am I wrong?
> 
> 
> [image: http://wiki.codeblocks.org/images/a/a9/Parser_Flow.gif]
> 
> 
> Regards,
> Corrado
> 
> 
> 2010/4/2 Jim Idle <jimi at temporal-wave.com>
> 
> > You should program your lexer such that it does not throw any errors.
> > Program for the common mistakes (such as un-terminated "string) and have a
> > catch all rule for unknown characters.
> >
> > Jim
> >
> >
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Corrado Campisano
> > > Sent: Friday, April 02, 2010 7:59 AM
> > > To: antlr-interest at antlr.org
> > > Subject: Re: [antlr-interest] Stopping parser and lexer at first error
> > >
> > > Hi all,
> > >
> > > I set up an ANTLR-maven archetype with a grammar providing the override
> > > for the 'always resume' behaviour.
> > >
> > > You can find details on my website
> > > <http://www.servicemix.eu/index.php?option=com_content&view=article&id=14>;
> > > maybe it's worth checking it and adding a notice on this archetype to
> > > this ANTLR wiki page
> > > <http://www.antlr.org/wiki/display/ANTLR3/Building+ANTLR+Projects+with+Maven>
> > > and/or to the ANTLR v3 Maven plugin page
> > > <http://www.antlr.org/antlr3-maven-plugin/index.html>.
> > >
> > >
> > > Regards,
> > > Corrado.
> > >
> > >
> > > 2010/3/10 Corrado Campisano <corrado.campisano at gmail.com>
> > >
> > > > Hi all,
> > > >
> > > > I needed to catch any syntax error (letting the lexer insert/delete chars
> > > > or the parser keep parsing with the sys.err message only could be very
> > > > dangerous to my application), so I took a look at the reference (which
> > > > reports information that is no longer valid) and on the internet, and I
> > > > found several hints and articles:
> > > >
> > > > Why the generated parser code tolerates illegal expression?
> > > > <http://www.antlr.org/wiki/pages/viewpage.action?pageId=4554943>
> > > > How can I make the lexer exit upon first lexical error?
> > > > <http://www.antlr.org/wiki/pages/viewpage.action?pageId=5341217>
> > > > http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
> > > > [antlr-interest] I want to throw an exception and stop parse, please!
> > > > <http://www.antlr.org/pipermail/antlr-interest/2009-May/034605.html>
> > > >
> > > > It looks to me like I found a way to do this; maybe it's worth publishing
> > > > that on the wiki, once validated.
> > > >
> > > >
> > > > I just added the following overrides to my grammar (attached):
> > > >
> > > > @parser::members
> > > > {
> > > >     public class ParserException extends RuntimeException {
> > > >         Object objCurrentInputSymbol = null;
> > > >
> > > >         public ParserException(Object oCurrentInputSymbol) {
> > > >             this.objCurrentInputSymbol = oCurrentInputSymbol;
> > > >         }
> > > >     }
> > > >
> > > >     protected Object recoverFromMismatchedToken(IntStream input, int ttype,
> > > >             BitSet follow) throws RecognitionException {
> > > >         System.out.println("PARSER : this.getCurrentInputSymbol(input).toString() : "
> > > >                 + this.getCurrentInputSymbol(input).toString());
> > > >         System.out.println("PARSER : this.failed() : " + this.failed());
> > > >         System.out.println("PARSER : this.getNumberOfSyntaxErrors() : "
> > > >                 + this.getNumberOfSyntaxErrors());
> > > >         throw new ParserException(this.getCurrentInputSymbol(input));
> > > >     }
> > > > }
> > > >
> > > > @lexer::members
> > > > {
> > > >     public class LexerException extends RuntimeException {
> > > >         RecognitionException recognitionException = null;
> > > >         String strErrorHeader = null;
> > > >         String strErrorMessage = null;
> > > >
> > > >         public LexerException(RecognitionException recExc, String sHead,
> > > >                 String sMsg) {
> > > >             this.recognitionException = recExc;
> > > >             this.strErrorHeader = sHead;
> > > >             this.strErrorMessage = sMsg;
> > > >
> > > >             System.out.println("LEXER : ErrorHeader : " + sHead);
> > > >             System.out.println("LEXER : ErrorMessage : " + sMsg);
> > > >             System.out.println("LEXER : RecognitionException : "
> > > >                     + this.recognitionException.toString());
> > > >         }
> > > >     }
> > > >
> > > >     public void reportError(RecognitionException recExc) {
> > > >         throw new LexerException(recExc, this.getErrorHeader(recExc),
> > > >                 getErrorMessage(recExc, this.getTokenNames()));
> > > >     }
> > > > }
> > > >
> > > >
> > > > Then I tested it with a simple class:
> > > >     public static void main(String[] args) {
> > > >         testLexerError();
> > > >         testParserError();
> > > >     }
> > > >     private static void testLexerError() {
> > > >         String strDlToParse = "{CORRADO PIPPO ;feee}";
> > > >         System.out.println("TESTING LEXER with : " + strDlToParse);
> > > >         testError(strDlToParse);
> > > >     }
> > > >     private static void testParserError() {
> > > >         String strDlToParse = "{CORRADO PIPPO feee} dhert";
> > > >         System.out.println("TESTING PARSER with : " + strDlToParse);
> > > >         testError(strDlToParse);
> > > >     }
> > > >     private static void testError(String strDlToParse) {
> > > >         CommonTree tree=null;
> > > >         String strError = null;
> > > >
> > > >         ANTLRStringStream input = new org.antlr.runtime.ANTLRStringStream(strDlToParse);
> > > >         Dl2OwlJavaBLexer lexer = new Dl2OwlJavaBLexer(input);
> > > >         TokenStream tokens = new org.antlr.runtime.CommonTokenStream(lexer);
> > > >         Dl2OwlJavaBParser parser = new Dl2OwlJavaBParser(tokens);
> > > >
> > > >         try {
> > > >             // this may raise an exception
> > > >             // TODO : check why NO EXCEPTION is raised with error
> > > >             // "line 1:9 no viable alternative at character ';'"
> > > >             // on inputs like "{CORRADO ;}"
> > > >             eu.servicemix.dl2owl.Dl2OwlJavaBParser.axiom_return ret = parser.axiom();
> > > >
> > > >             // TODO : check if this will be executed if no exception is raised
> > > >             tree = (CommonTree) ret.getTree();
> > > >
> > > >             printTreeHelper(tree);
> > > >
> > > >         } catch (RecognitionException e) {
> > > >
> > > >             System.out.println(e.toString());
> > > >             e.printStackTrace();
> > > >
> > > >         } catch (RuntimeException e) {
> > > >
> > > >             System.out.println(e.toString());
> > > >             e.printStackTrace();
> > > >         }
> > > >     }
> > > >
> > > >
> > > > The output looks ok, I wonder whether the whole 'trick' is too...
> > > >
> > > > TESTING LEXER with : {CORRADO PIPPO *;*feee}
> > > > LEXER : ErrorHeader : line 1:15
> > > > LEXER : ErrorMessage : no viable alternative at character ';'
> > > > LEXER : RecognitionException : NoViableAltException(';'@[1:1: Tokens : (
> > > > T__37 | T__38 | T__39 | T__40 | HAS_VALUE | ALL_VALUES | SOME_VALUES | DOT |
> > > > HAS_CARD | MIN_CARD | MAX_CARD | NOT | AND | OR | URI_REF | INT_VALUE | WS |
> > > > CTRL_CHAR );])
> > > > eu.servicemix.dl2owl.Dl2OwlJavaBLexer$LexerException
> > > > eu.servicemix.dl2owl.Dl2OwlJavaBLexer$LexerException
> > > >     at eu.servicemix.dl2owl.Dl2OwlJavaBLexer.reportError(Dl2OwlJavaBLexer.java:69)
> > > >     at org.antlr.runtime.Lexer.nextToken(Lexer.java:94)
> > > >     at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
> > > >     at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
> > > >     at eu.servicemix.dl2owl.Dl2OwlJavaBParser.axiom(Dl2OwlJavaBParser.java:110)
> > > >     at eu.servicemix.dl2owl.CommonTreeHelper.testError(CommonTreeHelper.java:140)
> > > >     at eu.servicemix.dl2owl.CommonTreeHelper.testLexerError(CommonTreeHelper.java:121)
> > > >     at eu.servicemix.dl2owl.CommonTreeHelper.main(CommonTreeHelper.java:113)
> > > >
> > > > TESTING PARSER with : {CORRADO PIPPO feee} *dhert*
> > > > PARSER : this.getCurrentInputSymbol(input).toString() : [@8,21:25='dhert',<7>,1:21]
> > > > PARSER : this.failed() : false
> > > > PARSER : this.getNumberOfSyntaxErrors() : 0
> > > > eu.servicemix.dl2owl.Dl2OwlJavaBParser$ParserException
> > > > eu.servicemix.dl2owl.Dl2OwlJavaBParser$ParserException
> > > >     at eu.servicemix.dl2owl.Dl2OwlJavaBParser.recoverFromMismatchedToken(Dl2OwlJavaBParser.java:97)
> > > >     at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
> > > >     at eu.servicemix.dl2owl.Dl2OwlJavaBParser.axiom(Dl2OwlJavaBParser.java:232)
> > > >     at eu.servicemix.dl2owl.CommonTreeHelper.testError(CommonTreeHelper.java:140)
> > > >     at eu.servicemix.dl2owl.CommonTreeHelper.testParserError(CommonTreeHelper.java:126)
> > > >     at eu.servicemix.dl2owl.CommonTreeHelper.main(CommonTreeHelper.java:114)
> > > >
> > > >
> > > > Any comment really appreciated!!
> > > >
> > > > Corrado
> > > >
> > > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address