[antlr-interest] Debugging: how? (Why do I get MismatchedTokenException or UnwantedTokenException?) Unhelpful error messages.

Jim Idle jimi at temporal-wave.com
Thu Oct 30 09:16:53 PDT 2008


On Thu, 2008-10-30 at 15:28 +0100, Hendrik Maryns wrote:

> John B. Brodie schreef:
> > Greetings!
> > 
> > Hendrik Maryns asked:
> > 
> >> I showed you my grammar yesterday.  Now trying it out on some simple
> >> inputs blows me away right away: it doesn’t even parse anything.
> 
> > Your problem seems to be with your Lexer rule for LABEL which is :
> > 
> > LABEL : ~(')')+ ;
> > 
> > this means that any sequence of characters that is not a ')' must be a
> > LABEL.
> 
> I am starting to understand the difference between lexer and parser now.
>  I was thinking of it as some sort of regular expression parser, but
> since the lexer does not know anything about the parser, it doesn’t care
> about it.
> 
> > another problem is that ')' is not matched by any Lexer rule. did you
> > want OPEN and CLOSE to be parens?
> 
> Yes, sorry, a relic of debugging.
> 
> >> Grateful for any suggestions,
> > 
> >>.....remainder of message snipped....
> > 
> > Hope this helps
> 
> It did, in that I know what is wrong, but I still have no solution to my
> problem: how can I make the variable in my label rule be anything?  That
> is, I would think anything except whitespace and braces and control
> characters would be fine.  In particular, it definitely has to accept
> any word in any script, along with some punctuation characters such as .
> - _ $ and probably more.


There are a couple of solutions, but you don't say what the lexical
significance of your labels is, or whether this is a language you are
inventing (in which case don't do that) or one you are following a spec
for.

In general, such labels tend to be valid only in certain places, such as
the start of a line/statement, or directly after goto, and so on. If this
is the case, then use a semantic predicate to check that you are at the
first character position in a line, then consume everything up to
whitespace and return LABEL. For goto and gosub, consume the label spec
within the rules that define those keywords, make the text of the token
be the label, and extract the label from the token in the parser. You
just have to think creatively about the trigger points that indicate
that a label is, or could be, next.
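
To make the trigger-point idea concrete, here is a rough sketch in plain
Java rather than ANTLR, so it runs without the ANTLR runtime. It is only a
stand-in for the lexer logic described above. The toy language (labels
start at column 0, statements are indented, and goto/gosub take a label),
the class name LabelTrigger and the method findLabels are all assumptions
made for illustration; nothing here comes from the original grammar.

```java
import java.util.ArrayList;
import java.util.List;

public class LabelTrigger {
    /**
     * Collects every label found at one of two assumed trigger points:
     * column 0 of a line, or the word right after goto/gosub.
     */
    public static List<String> findLabels(String source) {
        List<String> labels = new ArrayList<>();
        for (String line : source.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // Trigger 1: a non-whitespace character at column 0 starts a
            // label; consume everything up to the next whitespace.
            if (!Character.isWhitespace(line.charAt(0))) {
                labels.add(line.split("\\s+")[0]);
            }
            // Trigger 2: the word following goto or gosub is a label.
            String[] words = line.trim().split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                if (words[i].equals("goto") || words[i].equals("gosub")) {
                    labels.add(words[i + 1]);
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String source = "start.1\n    gosub draw_$all\n    goto start.1";
        System.out.println(findLabels(source));
        // prints [start.1, draw_$all, start.1]
    }
}
```

Because the label text is only consumed at a trigger point, it can contain
almost any character without swallowing the rest of the input, which is
the problem the original LABEL rule had.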

What language is this? This knowledge may help people help you.

If there are no lexical points that trigger a label interpretation, then
the next best thing is to construct a parser rule that accumulates label
components:

label : WORD ( { checkNoSpace() }?=> labelstuff )* ;

labelstuff
       : WORD | DOT | UNDERSCORE | BANG | keywords ... ;

Then build the text of the label from the text of the individual tokens
and rewrite as a LABEL for the AST.
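
As a rough sketch of what checkNoSpace() and the text assembly might do,
here is the logic in plain Java, kept free of the ANTLR runtime so it can
be run on its own. The Tok class, its inclusive start/stop offsets, and
the names noSpaceBetween and assembleLabel are hypothetical. In a real
ANTLR 3 parser the same comparison would presumably use the start and
stop character indexes that CommonToken records for the previous and
next tokens.

```java
import java.util.Arrays;
import java.util.List;

public class LabelAssembly {
    /** Hypothetical stand-in for a lexer token: text plus inclusive offsets. */
    public static final class Tok {
        final String text;
        final int start;
        final int stop;

        public Tok(String text, int start, int stop) {
            this.text = text;
            this.start = start;
            this.stop = stop;
        }
    }

    /** True when next begins at the character right after prev ends. */
    public static boolean noSpaceBetween(Tok prev, Tok next) {
        return next.start == prev.stop + 1;
    }

    /** Joins tokens starting at index from, stopping at the first gap. */
    public static String assembleLabel(List<Tok> tokens, int from) {
        StringBuilder label = new StringBuilder(tokens.get(from).text);
        for (int i = from + 1; i < tokens.size(); i++) {
            if (!noSpaceBetween(tokens.get(i - 1), tokens.get(i))) {
                break;
            }
            label.append(tokens.get(i).text);
        }
        return label.toString();
    }

    public static void main(String[] args) {
        // Input "foo.bar baz": WORD DOT WORD are adjacent, then a space.
        List<Tok> tokens = Arrays.asList(
            new Tok("foo", 0, 2), new Tok(".", 3, 3),
            new Tok("bar", 4, 6), new Tok("baz", 8, 10));
        System.out.println(assembleLabel(tokens, 0)); // prints foo.bar
    }
}
```

The adjacency test only works if whitespace tokens are kept out of the
parser's token stream (hidden or skipped), so that a gap between offsets
is the only remaining evidence of a space.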

I can't be any more specific without knowing what you are trying to parse.
You usually have to look for solutions specific to your DSL when you
get into this stuff, as it generally means the language design was weak
in the first place.

Jim


> 
> H.