[antlr-interest] literals versus arbitrary text

Wed Jul 19 02:07:27 PDT 2006

Hi,

I'm having trouble writing a grammar that processes Java sources only to 
extract the main class or interface declaration.
Previously, I was using a regexp such as:

(.*?)?(public|protected|private)\s+(class|interface)(\s+extends.*?)?
(\s+implements.*?)?(\{.*)

That approach performed poorly, so I chose to define a grammar instead. 
However, I don't know how to distinguish an occurrence of the word 'class' 
in the copyright header from one in the class declaration or in the class's 
code.
I started by defining a literal for each of the words I care about, but what 
differs is not the word itself but its context.
What is the usual approach in cases like this? I mean a grammar that starts 
out treating everything as arbitrary text until a context is found, then 
parses that context, and consumes the rest.
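
To make the idea concrete, here is a minimal hand-written sketch in plain 
Java (not an ANTLR grammar) of the behaviour I have in mind. It treats 
comments and string/char literals as arbitrary text, tracks brace depth, and 
only reacts to 'class' or 'interface' at top level. The names MainTypeFinder, 
findDeclaration and skipLiteral are made up for illustration, and it assumes 
the whole source is already in a String. Annotation types (@interface) and 
enums are deliberately ignored.

```java
final class MainTypeFinder {

    /** Returns the header of the first top-level class or interface, or null if none is found. */
    static String findDeclaration(String src) {
        int n = src.length();
        int depth = 0;             // brace nesting; 0 means top level
        int declStart = -1;        // start of the current top-level statement, -1 if none yet
        boolean inHeader = false;  // true once 'class' or 'interface' was seen at top level
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (src.startsWith("//", i)) {             // line comment: arbitrary text
                int eol = src.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;
            } else if (src.startsWith("/*", i)) {      // block comment, e.g. the copyright header
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {        // string or char literal
                i = skipLiteral(src, i);
            } else if (c == '{') {
                if (depth == 0 && inHeader) {
                    // The header runs from the first modifier up to the opening brace.
                    return src.substring(declStart, i).trim().replaceAll("\\s+", " ");
                }
                depth++;
                i++;
            } else if (c == '}' || c == ';') {
                if (c == '}') depth--;
                if (depth == 0) {                      // a top-level statement ended
                    declStart = -1;
                    inHeader = false;
                }
                i++;
            } else if (c == '@' || Character.isJavaIdentifierStart(c)) {
                int start = i++;
                while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                if (depth == 0) {
                    if (declStart < 0) declStart = start;
                    String word = src.substring(start, i);
                    // "@interface" is scanned as one word, so it never matches here.
                    inHeader = inHeader || word.equals("class") || word.equals("interface");
                }
            } else {
                i++;                                   // whitespace, punctuation, digits
            }
        }
        return null;
    }

    /** Returns the index just past the literal that opens at position i. */
    private static int skipLiteral(String src, int i) {
        char quote = src.charAt(i++);
        while (i < src.length() && src.charAt(i) != quote) {
            if (src.charAt(i) == '\\') i++;            // skip the escaped character
            i++;
        }
        return i + 1;
    }
}
```

For a file that starts with a copyright comment mentioning the word 'class', 
followed by package and import lines and then 
"public class Foo extends Bar implements Baz {", the call 
MainTypeFinder.findDeclaration(src) returns 
"public class Foo extends Bar implements Baz". Occurrences of 'class' inside 
the comment or inside the class body are never examined as keywords, which is 
the context sensitivity I would like the grammar to have.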

Thank you very much in advance, and thanks to Ter and the rest for both ANTLR 
and ST.

Jose.