[antlr-interest] Re: how to force unexpected token error

hawkwall hawkwall at yahoo.com
Thu Nov 13 08:29:12 PST 2003


Hi Ric,

Thanks a lot for your answer.  Your insight into ONETEN is right, it
could appear again as a value.  I will check it in the parser and
raise an exception.  I also hadn't thought of overriding the
testLiterals method.  I tried removing the IDENTIFIER rule and adding
rules for 
THREAT_CLASSES : "THREAT.CLASSES.";
SURFACE_TO_AIR : "NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES";
but I get an unrecognized character error at line 1, column 1.
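
For reference, here is a rough plain-Java sketch of the two checks
mentioned above: a strict literal test that throws when the text is not
a known keyword, and a check that the number after THREAT.CLASSES. is
110.  This is not the ANTLR API.  The class name, method names, and
token type numbers are made up for illustration, and in a real grammar
the exceptions would presumably be ANTLR's own (for example the
SemanticException Ric suggests).

```java
import java.util.HashMap;
import java.util.Map;

public class LiteralCheckSketch {
    // Hypothetical token type numbers; ANTLR would generate its own.
    static final int THREAT_CLASSES = 4;
    static final int SURFACE_TO_AIR = 5;

    // Keyword table, standing in for the lexer's literals table.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("THREAT.CLASSES.", THREAT_CLASSES);
        LITERALS.put("NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES", SURFACE_TO_AIR);
    }

    // Strict literal test: return the keyword's token type,
    // or throw if the text is not a known keyword.
    static int testLiteralStrict(String text) {
        Integer type = LITERALS.get(text);
        if (type == null) {
            throw new IllegalArgumentException("unexpected keyword: " + text);
        }
        return type;
    }

    // Parser-side check: the NUMBER after THREAT.CLASSES. must be exactly 110.
    static void checkThreatClassNumber(String numberText) {
        if (!"110".equals(numberText)) {
            throw new IllegalArgumentException("expecting 110, found: " + numberText);
        }
    }

    public static void main(String[] args) {
        // Correctly spelled keyword and value pass silently.
        testLiteralStrict("THREAT.CLASSES.");
        checkThreatClassNumber("110");
        // The misspelled keyword from the second input line is rejected.
        try {
            testLiteralStrict("NUMBER.OF.USRFACE.TO.AIR.THREAT.CLASSES");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something like this in place, the misspelled second line would
fail at the literal test instead of slipping through as a plain
IDENTIFIER.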

I appreciate the help and now understand antlr a little better.  Why
doesn't antlr come with the api documentation?  I probably should have
run javadoc a long time ago.

Thanks again

Mike

--- In antlr-interest at yahoogroups.com, Ric Klaren <klaren at c...> wrote:
> Hi,
> 
> On Sun, Nov 09, 2003 at 03:01:33AM -0000, hawkwall wrote:
> > Input:
> > THREAT.CLASSES.110
> > NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES:  3
> > end of Input:
> >
> > Parser:
> > startSACLASS : (rules)+ ;
> >
> > rules : threatclass
> >         | sathreatclass
> >         ;
> >
> > threatclass: THREAT_CLASSES ONETEN;
> 
> Better use NUMBER instead of ONETEN and then check here if it's "110". If
> not so, throw a SemanticException or whatever exception you deem more
> appropriate (RecognitionException?).
> 
> > sathreatclass: SURFACE_TO_AIR COLON  NUMBER
> > 	{System.out.println("Got Here");}
> > 	;
> > end of Parser:
> >
> > Lexer:
> > options {
> > 	k=5; // character lookahead
> 
> Not really necessary to have this much lookahead. (Ok antlr optimizes
> excess checks away but with bigger stuff it makes running antlr slower)
> 
> > 	testLiterals=false;
> > }
> >
> > tokens
> > {
> > 	THREAT_CLASSES="THREAT.CLASSES.";
> > 	SURFACE_TO_AIR="NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES";
> > }
> >
> > IDENTIFIER options { testLiterals=true;} : (LETTER | '.')+;
> > NUMBER : (DIGIT)+;
> > DOT : '.' ;
> > COLON : ':';
> > ONETEN : ("110") => "110" ; // predicate is an attempt to remove
> >                             // nondeterminism with NUMBER, but didn't work
> 
> I'd remove the ONETEN rule and deal with it in the parser instead... At
> least it's kinda ugly like this ;) Also the rule might interfere with other
> valid uses of 110 as a number. E.g. like this you have to deal in the parser,
> in all spots where you have a NUMBER token, with an extra alternative ONETEN.
> 
> I.e. the choice is between one NUMBER rule (and no ONETEN) plus a check on
> 110 in a few spots in the parser, or a NUMBER and a ONETEN rule plus
> (NUMBER|ONETEN) in the parser for all NUMBER occurrences. If NUMBER is common
> in the rest of the grammar, the choice is obvious.
> 
> > private DIGIT : ('0'..'9') ;
> > private LETTER : ('A'..'Z');
> >
> ...
> > end of Lexer:
> >
> > I need the parser to catch it if the input is misspelled.
> > The parser complains if I change the first line to
> > THREAT.CASSES.110 or THREAT.CLASSES.112
> >
> > It doesn't fail when I correct the first line and change the second
> > line to something like
> > NUMBER.OF.USRFACE.TO.AIR.THREAT.CLASSES: 3
> >
> > I turned on the trace, and with the incorrect input on the second
> > line, it matches IDENTIFIER and
> > then finished normally.  The action is never executed.  What is the
> > difference?
> 
> Because there's no EOF check, it just came to the conclusion that the input
> up to now was valid and it could exit (at least that's my guess). IDENTIFIER
> is a valid token in your lexer but your parser does not process it. As a
> result, IDENTIFIER matches any misspelled keyword, and the parser does not
> require any more tokens, so it just stops once it has received some valid
> input. Having EOF at the end of the start rule is very good practice in
> general (although in some rarer cases you don't want it).
> 
> > Why is unexpected token given
> > in the first case but not the other.  I tried setting
> > defaultErrorHandler=false, but it didn't fix my problem.
> 
> defaultErrorHandler only controls whether an exception falls through to the
> caller of the parser or gets caught in the rule throwing it. Just
> look at the differences in generated code.
> 
> > I tried
> > putting EOF at the end of my start rule, but to no avail.  I tried to
> > factor
> > out the THREAT.CLASSES from the end of SURFACE_TO_AIR, also removing
> > the final dot from the THREAT_CLASSES token.
> > I changed the threatclass rule to :
> > THREAT_CLASSES DOT ONETEN
> > and then got a
> > line 1:1: unexpected token: THREAT.CLASSES.
> > error.
> >
> > I see why it is happening in the parser.  Here is the relevant java:
> > public final void startSACLASS() throws RecognitionException,
> > TokenStreamException {
> >
> > 		try {      // for error handling
> ....
> > The problem is the if ( _cnt1421>=1).  If I remove that and make the
> > code look like this:
> 
> The >= 1 is from the ()+ in the rule.
> 
> > 				if ((LA(1)==THREAT_CLASSES||LA(1)==SURFACE_TO_AIR)) {
> > 					rules();
> > 				}
> > 				else {
> > 					throw new NoViableAltException(LT(1), getFilename());
> > 				}
> 
> What happens if you change the start rule to:
> 
> startSACLASS : (rules)* EOF ;
> 
> Or maybe even better remove these from the tokens section:
> 
> > 	THREAT_CLASSES="THREAT.CLASSES.";
> > 	SURFACE_TO_AIR="NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES";
> 
> And delete the IDENTIFIER rule. This is the one that makes all tokens with
> letters and dots in them valid. Unless you catch invalid identifiers in the
> parser (depends on the rest of your grammar; looking at the _cnt1421 I
> suspect your grammar is bigger in reality than this snippet)
> 
> and add rules:
> 
> THREAT_CLASSES : "THREAT.CLASSES.";
> SURFACE_TO_AIR : "NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES";
> 
> Then anything not matching these and the other lexer rules will bomb out.
> Another option is to keep the IDENTIFIER rule and add some extra checks for
> invalid IDENTIFIERs in the parser, or maybe override the literals
> testing method of the lexer and have it bomb out (throw an exception) if no
> literal is matched. Again this depends on your complete grammar; either
> solution has its advantages and its drawbacks.
> 
> Just for kicks, make a little executable around the lexer that calls nextToken
> on it and dumps each token to stdout. Then look at the tokens returned by the
> lexer; it should give you more of a feel for what your parser sees and which
> errors are generated by the parser and which ones by the lexer.
> 
> Hope this helps,
> 
> Ric
> --
> -----+++++*****************************************************+++++++++-------
>     ---- Ric Klaren ----- j.klaren at u... ----- +31 53 4893722  ----
> -----+++++*****************************************************+++++++++-------
>   Quidquid latine dictum sit, altum viditur.
>                  (Whatever is said in Latin sounds profound.)


 
