[antlr-interest] recognizing a function

John B. Brodie jbb at acm.org
Thu Jul 24 15:28:26 PDT 2008


Greetings!

Guy Kroizman wrote (in part):
>I have written a grammar that I hoped would find a function definition in a
>Fortran file.
>Running it produces nothing. s-:
>
>I played with it a lot and debugged it with jdb and ANTLRWorks but to no avail.
>I wonder if anybody would be so kind as to point me to the problem with the
>grammar.
>
>grammar fun;
>
> root     :
>     (functionStatement)*
>     ;

It is that pesky * on your start rule.

You have said that a valid program (i.e. any parsable derivation starting
from your root rule) may contain ZERO or more functionStatements.

So when you run your parser against the input you supplied in your previous
message, the parser sees the keyword (er, I mean the NAME) PROGRAM as the
first token it encounters. PROGRAM is not a valid starting token for the
functionStatement rule, so the parser silently returns without parsing
anything: it found ZERO functionStatements, and you have said that is
okay.
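To make the failure mode concrete, here is a hand-written toy recursive-descent parser in Java. It is not ANTLR-generated code, and the token kinds (FUNCTION, NAME, EOF) and the one-line functionStatement are hypothetical simplifications. rootStarOnly mirrors the root rule above; rootWithEof additionally demands EOF.

```java
import java.util.List;

// Toy recursive-descent parser over a pre-lexed list of token kinds.
// FUNCTION, NAME and EOF are illustrative stand-ins, not ANTLR's token types.
class StarDemo {
    private final List<String> tokens;
    private int pos = 0;

    StarDemo(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead; past the end of the list we report EOF.
    private String la() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void expect(String kind) {
        if (!la().equals(kind)) {
            throw new IllegalStateException("mismatched input " + la() + " expecting " + kind);
        }
        pos++;
    }

    // functionStatement : FUNCTION NAME ;   (hypothetical simplification)
    private void functionStatement() {
        expect("FUNCTION");
        expect("NAME");
    }

    // root : (functionStatement)* ;
    // The loop runs only while the lookahead can start a functionStatement,
    // so input beginning with NAME just exits the loop and "succeeds".
    int rootStarOnly() {
        while (la().equals("FUNCTION")) functionStatement();
        return pos; // number of tokens consumed
    }

    // root : (functionStatement)* EOF ;
    int rootWithEof() {
        while (la().equals("FUNCTION")) functionStatement();
        expect("EOF");
        return pos;
    }

    public static void main(String[] args) {
        // Roughly "PROGRAM main": the first token is a NAME, not FUNCTION.
        List<String> input = List.of("NAME", "NAME");
        System.out.println("star only consumed " + new StarDemo(input).rootStarOnly() + " tokens");
        try {
            new StarDemo(input).rootWithEof();
        } catch (IllegalStateException e) {
            System.out.println("with EOF: " + e.getMessage());
        }
    }
}
```

On input that begins with a NAME, the star-only version returns having consumed zero tokens and reports nothing, while the EOF version throws a mismatch error at the very first token.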


Suggestions:

1) I suggest that you explicitly require an EOF token at the end of any
   valid input; this will immediately expose problems like the one
   discussed above. So change your root rule to:

root : ( functionStatement )* EOF ;

   Running your parser with this version of the root rule should produce a
   syntax error, something similar to "found PROGRAM, expecting FUNCTION".

2) I suggest not trying to deal with case insensitivity in your lexer.
   Instead, use the case-insensitive input file stream posted to the
   antlr-interest mailing list back in December 2006. Ask about it again
   if you can't find it in the list's archives.

3) I would not try to recognize keywords using a parser rule such as your
   type rule. Your type rule expects to see each individual letter of the
   various keywords. However, ANTLR lexers are greedy: they consume the
   longest possible sequence of characters that matches some lexer rule.
   So your type rule will never see any individual letter, because all of
   the letters will be gobbled up by the NAME rule. Make the type rule a
   lexer rule, and see the next suggestion...

4) You are going to have a devil of a time dealing with keywords that
   may also be identifiers. I believe there are many messages about this
   in the mailing list archives.
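The following hand-written Java toy tries to put suggestions 2 and 3 together. It is not ANTLR's API and not the stream posted in 2006; it only assumes the general idea of a case-folding lookahead. Lookahead characters are upper-cased so the rules need only one case, each word is consumed greedily into a single token, and a whole word that spells a type keyword is classified as TYPE instead of being handed to the parser letter by letter. The token names and the keyword list are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hand-written toy lexer, NOT ANTLR-generated code. Token names TYPE, NAME and
// OTHER and the keyword list are illustrative assumptions.
class ToyFortranLexer {
    record Token(String type, String text) {}

    // Hypothetical subset of Fortran type keywords.
    private static final Set<String> TYPE_WORDS = Set.of("INTEGER", "REAL", "LOGICAL", "CHARACTER");

    private final String input;
    private int pos = 0;

    ToyFortranLexer(String input) { this.input = input; }

    // Case-folding lookahead: decisions compare against upper case only, while
    // token text is cut from the original input, so the spelling is preserved.
    private int la(int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? Character.toUpperCase(input.charAt(idx)) : -1;
    }

    List<Token> tokenize() {
        List<Token> out = new ArrayList<>();
        while (la(1) != -1) {
            int c = la(1);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c >= 'A' && c <= 'Z') {
                int start = pos;
                // Maximal munch: keep consuming while a name character follows,
                // so the parser never sees the individual letters of a word.
                while (la(1) != -1 && (Character.isLetterOrDigit(la(1)) || la(1) == '_')) {
                    pos++;
                }
                String word = input.substring(start, pos);
                // A whole word that spells a keyword becomes TYPE, roughly the
                // effect of a keyword lexer rule listed ahead of NAME.
                String type = TYPE_WORDS.contains(word.toUpperCase()) ? "TYPE" : "NAME";
                out.add(new Token(type, word));
            } else {
                out.add(new Token("OTHER", String.valueOf(input.charAt(pos))));
                pos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : new ToyFortranLexer("integer function Foo(n)").tokenize()) {
            System.out.println(t.type() + " '" + t.text() + "'");
        }
    }
}
```

This sketch also shows the catch behind suggestion 4: Fortran does not reserve its keywords, so a variable that happens to be named `real` would come out of this lexer as TYPE, and the parser would then have to accept TYPE wherever a name is allowed.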

Hope this helps.
   -jbb

