[antlr-interest] recognizing a function

Sat Jul 26 16:06:35 PDT 2008

If you ignore the "( later subroutines and function calls too )", then the answer to your last I define a full Fortran grammar for that?" is no and Johannes describes the path to follow.  Once you need to look inside functions and subroutines, though, you pretty much need a full grammar.  The good news is that a full grammar is easier to "steal" and debug--there is plenty of code to run against a full parser.  Also, the full grammar enables capabilities that you have not yet thought of developing--it's amazing how often the more general grammar solution ends up being a common part of one's toolchest.

--Loring

----- Original Message ----
From: Guy Kroizman <kroizguy at gmail.com>
To: antlr-interest at antlr.org
Sent: Saturday, July 26, 2008 6:09:43 AM
Subject: Re: [antlr-interest] recognizing a function

Thank you Ana Nelson and John B Brodie, I have learned much from your assiduous responses.

My goal is to write a program that gets a valid Fortran code and output the locations of the functions ( later subroutines and function calls too ).

I am still having a hard time figuring out how can I a grammar that will only match a certain rule and ignore all other input.

Must I define a full Fortran grammar for that?

On Fri, Jul 25, 2008 at 1:28 AM, John B. Brodie <jbb at acm.org> wrote:

Greetings!

Guy Kroizman wrote (in part):

>I have written a grammar that I hoped would find a function definition in a
>Fortran file.
>Running it produces nothing. s-:
>
>I played with it a lot and debugged it with jdb and ANTLRWorks but to avail.
>I wonder if anybody would be so kind to point me to the problem with the
>grammar.
>
>grammar fun;
>
> root     :
>     (functionStatement)*
>     ;

It is that pesky * on your start rule.

You have said that a valid program (e.g. any parsable derivation starting
from your root rule) may contain ZERO or more functionStatement's.

So when you run your parser against the input you supplied in the previous
message.  The parser sees the keyword - er I mean the NAME - PROGRAM as the
first token it encounters.  PROGRAM is not a valid starting token for the
functionStatement rule. So the parser just silently quits, without parsing
anything because it found ZERO functionStatement's and you have said that
is an okay thing.

Suggestions:

1) I would suggest that you explicitly require an EOF token at the end of
  any valid input - this will immediately show problems like the one
  discussed above.  So I would suggest that you change your root rule to:

root : ( functionStatement )* EOF ;

  running your parser with this version of the root rule should produce a
  syntax error - something similar to "found PROGRAM, expecting FUNCTION"

2) I would suggest not trying to deal with case insensitivity in your
  lexer. Rather I would suggest using the case insensitive input file
  stream posted to the antlr-interest mailing list back in december of
  2006. ask about it again if you can't find it in the list's archives.

3) I would not try to recognize keywords using a Parser rule - such as your
  type rule. Your type rule expects to see each individual letter of the
  various keywords. However, ANTLR lexers are very greedy, they will
  consume the longest possible sequence of characters that matches some
  lexer rule. So your type rule will never see any individual letter
  because all of the letters will be greedily gobbled up by the NAME
  rule. Make the type rule be a lexer rule, and see the next suggestion...

4) You are going to experience a devil of a time trying to deal with
  keywords that also may be identifiers.  I believe there are lots of
  messages about this in the mailing list archives.

Hope this helps.
  -jbb

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080726/acedc506/attachment.html