[antlr-interest] recognizing a function
John B. Brodie
jbb at acm.org
Thu Jul 24 15:28:26 PDT 2008
Greetings!
Guy Kroizman wrote (in part):
>I have written a grammar that I hoped would find a function definition in a
>Fortran file.
>Running it produces nothing. s-:
>
>I played with it a lot and debugged it with jdb and ANTLRWorks, but to no avail.
>I wonder if anybody would be so kind to point me to the problem with the
>grammar.
>
>grammar fun;
>
> root :
> (functionStatement)*
> ;
It is that pesky * on your start rule.
You have said that a valid program (i.e. any parsable derivation starting
from your root rule) may contain ZERO or more functionStatements.
So when you run your parser against the input you supplied in the previous
message, the parser sees the keyword - er, I mean the NAME - PROGRAM as the
first token it encounters. PROGRAM is not a valid starting token for the
functionStatement rule, so the parser just silently quits without parsing
anything: it found ZERO functionStatements, and you have said that
is an okay thing.
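To make that concrete, here is a rough Java sketch of what such a loop
amounts to. It is hand-written, not ANTLR output; the class name, the
token strings, and the simplification of functionStatement to "FUNCTION
followed by one name" are all assumptions for illustration. For contrast
it also shows the same loop followed by an end-of-input check.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-written stand-in for a generated parser; this is not ANTLR output. */
final class StarRuleSketch {

    /** Returns the token at pos, or the marker "EOF" past the end of input. */
    private static String peek(List<String> tokens, int pos) {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    /**
     * Mimics the loop ( functionStatement )*.
     * Assumption: a functionStatement is just FUNCTION followed by one name token.
     * Returns the position reached after the loop.
     */
    private static int matchStatements(List<String> tokens) {
        int pos = 0;
        while (peek(tokens, pos).equals("FUNCTION")) {
            if (peek(tokens, pos + 1).equals("EOF")) {
                throw new IllegalStateException("found EOF, expecting a function name");
            }
            pos += 2; // consume FUNCTION and its name
        }
        return pos; // zero iterations is a perfectly valid outcome of ( ... )*
    }

    /** Mimics  root : ( functionStatement )* ;  and returns the statement count. */
    static int rootWithoutEof(List<String> tokens) {
        return matchStatements(tokens) / 2;
    }

    /** Mimics  root : ( functionStatement )* EOF ;  and returns the statement count. */
    static int rootWithEof(List<String> tokens) {
        int pos = matchStatements(tokens);
        if (!peek(tokens, pos).equals("EOF")) {
            throw new IllegalStateException(
                    "found " + peek(tokens, pos) + ", expecting FUNCTION or EOF");
        }
        return pos / 2;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("PROGRAM", "MAIN", "FUNCTION", "F");
        // Stops at PROGRAM, reports zero statements, raises no error.
        System.out.println(rootWithoutEof(input));
        try {
            rootWithEof(input);
        } catch (IllegalStateException e) {
            // The leftover PROGRAM token is now reported instead of ignored.
            System.out.println(e.getMessage());
        }
    }
}
```

The first call returns 0 without complaint even though a FUNCTION appears
later in the input; only the variant that insists on end of input notices
the unconsumed PROGRAM token.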
Suggestions:
1) I would suggest that you explicitly require an EOF token at the end of
any valid input - this will immediately show problems like the one
discussed above. So change your root rule to:
root : ( functionStatement )* EOF ;
Running your parser with this version of the root rule should produce a
syntax error - something similar to "found PROGRAM, expecting FUNCTION".
2) I would suggest not trying to deal with case insensitivity in your
lexer. Rather, use the case-insensitive input file stream that was
posted to the antlr-interest mailing list back in December of
2006. Ask about it again if you can't find it in the list's archives.
3) I would not try to recognize keywords using a parser rule - such as your
type rule. Your type rule expects to see each individual letter of the
various keywords. However, ANTLR lexers are very greedy: they will
consume the longest possible sequence of characters that matches some
lexer rule. So your type rule will never see any individual letter,
because all of the letters will be greedily gobbled up by the NAME
rule. Make the type rule a lexer rule, and see the next suggestion...
4) You are going to have a devil of a time trying to deal with
keywords that may also be identifiers. I believe there are lots of
messages about this in the mailing list archives.
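As a rough illustration of suggestions 2 and 3, here is a toy Java lexer.
It is hand-written and is neither ANTLR-generated code nor the stream class
from the 2006 posting; the class name, the keyword list, and the KIND:text
output format are all hypothetical. It folds case only in the lookahead, so
token text keeps its original spelling, and it matches names greedily.
Note that it classifies keywords by a table lookup after the whole word has
been matched, which is a swapped-in technique and not the lexer-rule
approach suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy lexer, not ANTLR-generated; every name in here is hypothetical. */
final class GreedyLexerSketch {
    // Assumed keyword list, for illustration only.
    private static final Set<String> TYPE_KEYWORDS =
            new HashSet<>(Arrays.asList("INTEGER", "REAL", "LOGICAL"));

    private final String input;
    private int pos = 0;

    GreedyLexerSketch(String input) {
        this.input = input;
    }

    /** Case-folded lookahead: matching sees upper case, the input is left untouched. */
    private char la() {
        return pos < input.length() ? Character.toUpperCase(input.charAt(pos)) : '\0';
    }

    private boolean atLetter() {
        return la() >= 'A' && la() <= 'Z';
    }

    private boolean atDigit() {
        return la() >= '0' && la() <= '9';
    }

    /** Returns the tokens as strings of the form KIND:text. */
    List<String> tokenize() {
        List<String> out = new ArrayList<>();
        while (pos < input.length()) {
            if (atLetter()) {
                int start = pos;
                // Greedy match: swallow the whole run of letters and digits at once,
                // so no later stage ever sees the individual letters.
                while (atLetter() || atDigit()) {
                    pos++;
                }
                String text = input.substring(start, pos);
                boolean isType = TYPE_KEYWORDS.contains(text.toUpperCase(Locale.ROOT));
                out.add((isType ? "TYPE" : "NAME") + ":" + text);
            } else {
                pos++; // this sketch simply skips everything that is not a word
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new GreedyLexerSketch("integer function Foo(x)").tokenize());
        System.out.println(new GreedyLexerSketch("INTEGERX").tokenize());
    }
}
```

Because the word is matched whole before it is classified, INTEGERX comes
out as a single NAME and not as INTEGER followed by X. The sketch does
nothing about the problem in suggestion 4: a word such as REAL is always
reported as a TYPE here, even where Fortran would allow it as an identifier.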
Hope this helps.
-jbb