[antlr-interest] case-insensitive parsing

Sam Harwell sharwell at pixelminegames.com
Thu Apr 23 07:21:20 PDT 2009


Hi Bob,

You should write a separate filter lexer that parses just the doc
comments. In your main lexer, read each entire comment as a single
token; you can then pass the text of those comment tokens to the
doxygen tag filter lexer to extract the information from them.
It works great and doesn't interfere with your language's grammar:

http://wiki.pixelminegames.com/images/b/b0/Uc_doccommentassist.png
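
The two-pass idea described above can be sketched roughly as follows. This is only an illustration in Python using regular expressions as a stand-in for the two lexers; it is not ANTLR code, and the names DOC_COMMENT, PARAM_TAG and extract_param_tags are hypothetical. It assumes each tag description fits on one line.

```python
import re

# Pass 1 (stand-in for the main lexer): each /** ... */ block is treated
# as one opaque token, so nothing inside it can be mistaken for a keyword.
DOC_COMMENT = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)

# Pass 2 (stand-in for the filter lexer): pick out only the tags and
# ignore all other comment text. Assumes one-line descriptions.
PARAM_TAG = re.compile(r"@param\s+(\w+)\s+([^\n@]*)", re.IGNORECASE)


def extract_param_tags(source):
    """Return one list of (name, description) pairs per doc comment."""
    results = []
    for comment in DOC_COMMENT.finditer(source):
        body = comment.group(1)
        tags = [(m.group(1), m.group(2).strip())
                for m in PARAM_TAG.finditer(body)]
        results.append(tags)
    return results


SAMPLE = """/** this procedure does nothing much
  @param aString this is a string
  @param anInt this is a number
*/
PROCEDURE thingy(aString IN VARCHAR2, anInt IN OUT NUMBER);
"""

if __name__ == "__main__":
    for tags in extract_param_tags(SAMPLE):
        print(tags)
```

The point of the split is that the second pass only ever sees comment text, so the main grammar never has to know about tags at all.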

Sam

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bob Sole
Sent: Thursday, April 23, 2009 7:42 AM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] case-insensitive parsing

Hi Andreas,

Unfortunately that won't help me in this case, because I still need to
be able to parse the contents of the comments. The overall aim is to
extract Javadoc-style tags such as "@param" from the comment section and
match them to procedure parameters, e.g.

/** this procedure does nothing much
  @param aString this is a string
  @param anInt this is a number
*/
PROCEDURE thingy(aString IN VARCHAR2, anInt IN OUT NUMBER);

I need to be able to handle the keywords in a case-insensitive manner,
because the codebase I'm working with evolved over many years and is
frankly a real mess. For example, some developers declare procedures
like this:

Procedure Thingy(AString IN Varchar2, AnInt In Out Number);

whereas others do it C-style :-)

procedure thingy(a_string in varchar2, an_int in out number);

Horrible, I know. But I need to be able to parse all combinations
thereof :-) I've looked at PLDoc, but it doesn't really address this
issue.
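
As a rough illustration of the matching step described above, here is a minimal Python sketch that pairs already-extracted @param tags with the parameters of a procedure declaration, ignoring case in both keywords and names. The helper names (parse_parameters, match_tags) are hypothetical, and the sketch assumes simple declarations of the form "name [IN | OUT | IN OUT] type" with no defaults or %TYPE references.

```python
import re

# Keywords are matched case-insensitively, so "PROCEDURE", "Procedure"
# and "procedure" are all accepted.
PROC = re.compile(r"procedure\s+(\w+)\s*\((.*?)\)", re.IGNORECASE | re.DOTALL)
PARAM = re.compile(r"^(\w+)\s+((?:in\s+out|in|out)\s+)?(\w+)$", re.IGNORECASE)


def parse_parameters(declaration):
    """Return (name, mode, type) triples; mode and type are upper-cased."""
    m = PROC.search(declaration)
    if m is None:
        return []
    params = []
    for raw in m.group(2).split(","):
        pm = PARAM.match(raw.strip())
        if pm:
            # A parameter with no explicit mode is treated as IN.
            mode = " ".join((pm.group(2) or "IN").upper().split())
            params.append((pm.group(1), mode, pm.group(3).upper()))
    return params


def match_tags(tags, params):
    """Attach each tag description to the parameter of the same name.

    Names are compared case-insensitively; a parameter without a tag
    gets None as its description.
    """
    by_name = {name.upper(): desc for name, desc in tags}
    return [(name, mode, type_, by_name.get(name.upper()))
            for name, mode, type_ in params]


if __name__ == "__main__":
    tags = [("aString", "this is a string"), ("anInt", "this is a number")]
    decl = "Procedure Thingy(AString IN Varchar2, AnInt In Out Number);"
    for row in match_tags(tags, parse_parameters(decl)):
        print(row)
```

Normalising only for comparison keeps the original spelling of each name available for reporting.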

Cheers
Bob.

On Thu, Apr 23, 2009 at 1:20 PM, Andreas Meyer
<andreas.meyer at smartshift.de> wrote:

Have you tried making the whole comment a lexer token? That way, the
keyword tokens would not interfere with plain text inside comments. (If
that was already your intention, note that lexer rule names have to
start with an upper-case letter.)
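
To illustrate the effect of treating the whole comment as one token, here is a small Python sketch of a hand-rolled tokenizer. It is not ANTLR code, and the token names and the KEYWORDS set are made up for the example. The comment alternative is tried first, so an "or" inside a comment never reaches the keyword check, while keywords outside comments are recognised regardless of case.

```python
import re

# Hypothetical keyword set, just enough for the sample input.
KEYWORDS = {"CREATE", "OR", "REPLACE", "PACKAGE", "IS"}

TOKEN = re.compile(
    r"(?P<COMMENT>/\*.*?\*/)"     # a whole comment is a single token
    r"|(?P<WORD>[A-Za-z_]\w*)"    # keyword or identifier, decided below
    r"|(?P<WS>\s+)"               # whitespace, skipped
    r"|(?P<OTHER>.)",             # any other single character
    re.DOTALL,
)


def tokenize(text):
    """Return (kind, text) pairs; the original spelling is preserved."""
    tokens = []
    for m in TOKEN.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "WS":
            continue
        if kind == "WORD":
            # Compare in upper case only; keep the text as written.
            kind = "KEYWORD" if value.upper() in KEYWORDS else "IDENT"
        tokens.append((kind, value))
    return tokens


SAMPLE = "/**\nblah blah or blah\n*/\ncreate Or rePlace PACKAGE\ntest IS\n"

if __name__ == "__main__":
    for token in tokenize(SAMPLE):
        print(token)
```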

Andreas


Bob Sole wrote:

> I'm trying to write a parser for PL/SQL package header files but I'm
> banging my head against the wall with a basic problem to do with
> case-insensitive parsing. I'm using Jim Idle's NoCaseFileStream to
> convert tokens into upper case, but I'm finding that the parser gets
> confused when it comes across language keywords that are embedded
> within comments. Here's some example input which has the OR keyword
> embedded within the package comment. The "create or replace package"
> statement is deliberately written in mixed case - the parser handles
> this correctly, but it stumbles on the 'or' in the comment on line 2:
>
> /**
> blah blah or blah
> */
> create Or rePlace PACKAGE
> test IS
>
> Here's the grammar:
>
> grammar Test;
>
> input: statement+ ;
>
> statement: pkgComment | pkgStmt ;
>
> pkgComment: '/**' description '*/' ;


