[antlr-interest] recognizing a function

Sat Jul 26 06:38:25 PDT 2008

Guy Kroizman wrote:
> Thank you Ana Nelson and John B Brodie, I have learned much from your 
> assiduous responses.
> 
> My goal is to write a program that takes valid Fortran code and outputs 
> the locations of the functions (later, subroutines and function calls too).
> 
> I am still having a hard time figuring out how I can write a grammar that 
> will only match a certain rule and ignore all other input.
> 
> Must I define a full Fortran grammar for that?

It depends on whether you can get away with fuzzy recognition. If there 
is a way to recognize functions without having to look at other parts, 
then you just have to skip the unneeded parts. There is a lexer option, 
"filter=true;", which ignores any input that isn't recognized (it 
requires a lexer grammar).
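
A minimal sketch of such a filtering lexer, assuming ANTLR 3 with the 
Java target and upper-case source text. The grammar name FuzzyFortran 
and the rule names FUNCTION_HEADER, ID and WS are made up for 
illustration, and the header pattern is deliberately simplistic:

```antlr
lexer grammar FuzzyFortran;

// filter mode: input that matches no rule is skipped one character at a time
options { filter = true; }

// Matches "FUNCTION name (" and reports where it was found.
// Assumption: the whole header sits on one line, so getLine() is its line.
FUNCTION_HEADER
    :   'FUNCTION' WS name=ID WS? '('
        { System.out.println("function " + $name.text + " at line " + getLine()); }
    ;

// fragment rules only serve as helpers for FUNCTION_HEADER
fragment ID : ('A'..'Z'|'a'..'z') ('A'..'Z'|'a'..'z'|'0'..'9'|'_')* ;
fragment WS : (' '|'\t')+ ;
```

The action only fires while tokens are being pulled from the lexer, so 
the driver has to call nextToken() until EOF. Being fuzzy, this will 
also fire on text that merely looks like a function header, for example 
inside a comment or a string.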

In case operating on the lexer level is too low, you can define parser 
rules in such a way that they skip any input you aren't interested in. 
".*" (maybe with the greedy=false; option) along with an end marker, or 
a negated token like ~FUNCTION, lets you skip the unwanted tokens. 
Otherwise you have to go the full grammar route.
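
A rough sketch of the parser-level variant, again assuming ANTLR 3 with 
the Java target and upper-case keywords. The grammar name FunctionFinder 
and the LPAREN, WS and OTHER rules are invented for the example; 
FUNCTION and NAME are meant to be the token names from the grammar 
under discussion:

```antlr
grammar FunctionFinder;

// Every token that is not FUNCTION is consumed and ignored.
root
    :   ( functionStatement | ~FUNCTION )* EOF
    ;

// Only the start of the header is matched; the rest is skipped by root.
functionStatement
    :   f=FUNCTION n=NAME LPAREN
        { System.out.println("function " + $n.text + " at line " + $f.line); }
    ;

FUNCTION : 'FUNCTION' ;
NAME     : ('A'..'Z'|'a'..'z') ('A'..'Z'|'a'..'z'|'0'..'9'|'_')* ;
LPAREN   : '(' ;
WS       : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
OTHER    : . ;   // catch-all so the lexer never rejects a character
```

This still needs a lexer that can tokenize the whole file, which is what 
the catch-all OTHER rule is for. A FUNCTION token that is not followed 
by NAME and LPAREN (as in END FUNCTION) would be reported as a syntax 
error, so the functionStatement rule would need loosening for real code.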

Johannes
> 
> On Fri, Jul 25, 2008 at 1:28 AM, John B. Brodie <jbb at acm.org> wrote:
> 
>     Greetings!
> 
>     Guy Kroizman wrote (in part):
>      >I have written a grammar that I hoped would find a function
>      >definition in a Fortran file.
>      >Running it produces nothing. s-:
>      >
>      >I played with it a lot and debugged it with jdb and ANTLRWorks, but
>      >to no avail.
>      >I wonder if anybody would be so kind as to point me to the problem
>      >with the grammar.
>      >
>      >grammar fun;
>      >
>      > root     :
>      >     (functionStatement)*
>      >     ;
> 
>     It is that pesky * on your start rule.
> 
>     You have said that a valid program (i.e. any parsable derivation
>     starting from your root rule) may contain ZERO or more
>     functionStatements.
> 
>     So when you run your parser against the input you supplied in the
>     previous message, the parser sees the keyword - er, I mean the NAME -
>     PROGRAM as the first token it encounters.  PROGRAM is not a valid
>     starting token for the functionStatement rule. So the parser just
>     silently quits without parsing anything, because it found ZERO
>     functionStatements and you have said that is an okay thing.
> 
> 
>     Suggestions:
> 
>     1) I would suggest that you explicitly require an EOF token at the
>       end of any valid input - this will immediately show problems like
>       the one discussed above.  So I would suggest that you change your
>       root rule to:
> 
>     root : ( functionStatement )* EOF ;
> 
>       Running your parser with this version of the root rule should
>       produce a syntax error - something similar to "found PROGRAM,
>       expecting FUNCTION".
> 
>     2) I would suggest not trying to deal with case insensitivity in your
>       lexer. Rather, I would suggest using the case-insensitive input file
>       stream posted to the antlr-interest mailing list back in December of
>       2006. Ask about it again if you can't find it in the list's archives.
> 
>     3) I would not try to recognize keywords using a parser rule - such
>       as your type rule. Your type rule expects to see each individual
>       letter of the various keywords. However, ANTLR lexers are very
>       greedy: they will consume the longest possible sequence of
>       characters that matches some lexer rule. So your type rule will
>       never see any individual letter, because all of the letters will be
>       greedily gobbled up by the NAME rule. Make the type rule a lexer
>       rule, and see the next suggestion...
> 
>     4) You are going to experience a devil of a time trying to deal with
>       keywords that also may be identifiers.  I believe there are lots of
>       messages about this in the mailing list archives.
> 
>     Hope this helps.
>       -jbb
> 
>