[antlr-interest] recognizing a function
Johannes Luber
jaluber at gmx.de
Sat Jul 26 06:38:25 PDT 2008
Guy Kroizman wrote:
> Thank you Ana Nelson and John B Brodie, I have learned much from your
> assiduous responses.
>
> My goal is to write a program that takes valid Fortran code and outputs
> the locations of the functions (later subroutines and function calls too).
>
> I am still having a hard time figuring out how I can write a grammar that
> will only match a certain rule and ignore all other input.
>
> Must I define a full Fortran grammar for that?
It depends on whether you can get away with fuzzy recognition. If there is
a way to recognize functions without having to look at the other parts,
then you just have to skip the unneeded parts. There is a lexer option,
"filter=true;", which ignores any input that isn't recognized (it requires
a lexer grammar).
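
A minimal sketch of such a filtering lexer in ANTLR 3 syntax might look
like the following. The grammar name FuzzyFortran and the rule names
FUNCTION_DEF, ID and WS are made up for illustration, and the sketch
assumes upper-case source text (or a case-insensitive input stream):

```antlr
lexer grammar FuzzyFortran;
options { filter=true; }   // characters that match no rule are silently skipped

// Hypothetical rule: the keyword, some blanks, then the function name.
FUNCTION_DEF
    :   'FUNCTION' WS name=ID
        { System.out.println("function " + $name.text + " at line " + getLine()); }
    ;

// Fragment rules never produce tokens on their own; they are only helpers.
fragment ID : ('A'..'Z') ('A'..'Z'|'0'..'9'|'_')* ;
fragment WS : (' '|'\t')+ ;
```

As written this is deliberately fuzzy: it would also fire on lines such as
"END FUNCTION FOO" and on the word FUNCTION inside comments, so further
rules would be needed to rule those cases out.
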
If operating at the lexer level is too low-level, you can define parser
rules in such a way that they skip any input you aren't interested in.
".*" (maybe with the greedy=false; option) along with an end marker like
~FUNCTION lets you skip the tokens coming from the lexer. Otherwise you
have to go the full grammar route.
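
As a rough sketch of the parser-level variant, again in ANTLR 3 syntax:
the root rule loops over "a function header, or any token that is not
FUNCTION". The grammar name FunFinder and the catch-all OTHER rule are
assumptions for illustration; functionStatement here is reduced to the
keyword plus a name and is not the original poster's rule.

```antlr
grammar FunFinder;

// Either recognize a function header or skip one uninteresting token.
root : ( functionStatement | ~FUNCTION )* EOF ;

functionStatement
    :   FUNCTION NAME
        { System.out.println("function " + $NAME.text + " at line " + $NAME.line); }
    ;

// The keyword is listed before NAME so that it wins for the exact text 'FUNCTION'.
FUNCTION : 'FUNCTION' ;
NAME     : ('A'..'Z'|'a'..'z') ('A'..'Z'|'a'..'z'|'0'..'9'|'_')* ;
WS       : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
OTHER    : . ;   // catch-all so the lexer never rejects a character
```

The same caveats apply as for any fuzzy approach: "END FUNCTION FOO"
contains FUNCTION followed by a NAME and would be reported as well unless
the grammar is refined.
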
Johannes
>
> On Fri, Jul 25, 2008 at 1:28 AM, John B. Brodie <jbb at acm.org> wrote:
>
> Greetings!
>
> Guy Kroizman wrote (in part):
> >I have written a grammar that I hoped would find a function definition in a
> >Fortran file.
> >Running it produces nothing. s-:
> >
> >I played with it a lot and debugged it with jdb and ANTLRWorks but to no avail.
> >I wonder if anybody would be so kind as to point me to the problem with the
> >grammar.
> >
> >grammar fun;
> >
> > root :
> > (functionStatement)*
> > ;
>
> It is that pesky * on your start rule.
>
> You have said that a valid program (i.e. any parsable derivation starting
> from your root rule) may contain ZERO or more functionStatement's.
>
> So when you run your parser against the input you supplied in the previous
> message, the parser sees the keyword - er I mean the NAME - PROGRAM as the
> first token it encounters. PROGRAM is not a valid starting token for the
> functionStatement rule. So the parser just silently quits, without parsing
> anything, because it found ZERO functionStatement's and you have said that
> is an okay thing.
>
>
> Suggestions:
>
> 1) I would suggest that you explicitly require an EOF token at the end of
> any valid input - this will immediately show problems like the one
> discussed above. So I would suggest that you change your root rule to:
>
> root : ( functionStatement )* EOF ;
>
> running your parser with this version of the root rule should produce a
> syntax error - something similar to "found PROGRAM, expecting FUNCTION"
>
> 2) I would suggest not trying to deal with case insensitivity in your
> lexer. Rather I would suggest using the case insensitive input file
> stream posted to the antlr-interest mailing list back in December of
> 2006. Ask about it again if you can't find it in the list's archives.
>
> 3) I would not try to recognize keywords using a Parser rule - such as your
> type rule. Your type rule expects to see each individual letter of the
> various keywords. However, ANTLR lexers are very greedy, they will
> consume the longest possible sequence of characters that matches some
> lexer rule. So your type rule will never see any individual letter
> because all of the letters will be greedily gobbled up by the NAME
> rule. Make the type rule be a lexer rule, and see the next suggestion...
>
> 4) You are going to experience a devil of a time trying to deal with
> keywords that also may be identifiers. I believe there are lots of
> messages about this in the mailing list archives.
>
> Hope this helps.
> -jbb
>
>