[antlr-interest] case-insensitive parsing

Thu Apr 23 07:19:12 PDT 2009

OK, there are two options:
(1) consume the comment as a whole, then feed the matched token text
to another lexer/parser, which can be written completely independently;
or
(2) use island grammars, as described on the Wiki
(http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control).
However, this is quite complicated to set up.

Island grammars are nice for complicated cases, but here they are probably
overkill: the boundaries of your comment syntax can be identified by the
lexer alone, so you do not need the full parser for that. Hope that
helps :-)
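
A minimal sketch of option (1), written in plain Java without the ANTLR runtime so that it stands alone. Stage 1 cuts the /** ... */ block out as one unit, so a word like "or" inside it never reaches keyword matching; stage 2 is a separate, independent scanner over the comment text that collects the names after @param tags and compares them with declared parameters ignoring case. The class and method names (DocCommentSketch, extractDocComment, paramNames, sameName) are hypothetical. In ANTLR 3 terms, stage 1 would roughly correspond to a single lexer rule for the whole comment, and stage 2 to whatever separate lexer/parser you hand its text to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "consume the comment whole, then process its text separately".
public class DocCommentSketch {

    // Stage 1: return the text between the first "/**" and the following "*/",
    // or null if there is no complete doc comment. Keywords inside it are never
    // seen by the main keyword matching.
    static String extractDocComment(String source) {
        int start = source.indexOf("/**");
        if (start < 0) return null;
        int end = source.indexOf("*/", start + 3);
        if (end < 0) return null;
        return source.substring(start + 3, end);
    }

    // Stage 2: an independent scanner over the comment text only.
    // Collects the word that follows each "@param" tag; the tag is matched
    // case-insensitively, so "@PARAM" and "@Param" are accepted as well.
    static List<String> paramNames(String commentText) {
        List<String> names = new ArrayList<>();
        String[] words = commentText.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            if (words[i].equalsIgnoreCase("@param")) {
                names.add(words[i + 1]);
            }
        }
        return names;
    }

    // Compare a documented name with a declared parameter name ignoring case,
    // e.g. "aString" vs "AString". Locale.ROOT avoids locale-specific case rules.
    static boolean sameName(String documented, String declared) {
        return documented.toUpperCase(Locale.ROOT)
                .equals(declared.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String src = "/** this procedure does nothing much, or so it seems\n"
                + "  @param aString this is a string\n"
                + "  @PARAM anInt this is a number\n"
                + "*/\n"
                + "Procedure Thingy(AString IN Varchar2, AnInt In Out Number);";
        List<String> names = paramNames(extractDocComment(src));
        System.out.println(names);                              // [aString, anInt]
        System.out.println(sameName(names.get(0), "AString"));  // true
    }
}
```

Splitting on whitespace is deliberately simplistic; the point is only that the comment text is handled by code that knows nothing about the PL/SQL keywords. Note that a plain case-insensitive comparison will not equate names such as a_string and aString.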

Cheers,
Andreas

Bob Sole wrote:
> Hi Andreas,
>
> Unfortunately that won't help me in this case, because I still need to 
> be able to parse the contents of the comments. The overall aim is to 
> extract Javadoc-style tags such as "@param" from the comment section 
> and match them to procedure parameters, e.g.
>
> /** this procedure does nothing much
>   @param aString this is a string
>   @param anInt this is a number
> */
> PROCEDURE thingy(aString IN VARCHAR2, anInt IN OUT NUMBER);
>
> I need to be able to handle the keywords in a case-insensitive manner, 
> because the codebase I'm working with evolved over many years and is 
> frankly a real mess. For example, some developers declare procedures 
> like this:
>
> Procedure Thingy(AString IN Varchar2, AnInt In Out Number);
>
> whereas others do it C-style :-)
>
> procedure thingy(a_string in varchar2, an_int in out number);
>
> Horrible, I know. But I need to be able to parse all combinations 
> thereof :-)
> I've looked at PLDoc, but it doesn't really address this issue.
>
> Cheers
> Bob.
>
> On Thu, Apr 23, 2009 at 1:20 PM, Andreas Meyer
> <andreas.meyer at smartshift.de> wrote:
>
>     Have you tried making the whole comment a single lexer token? That way,
>     the keyword tokens would not interfere with plain text inside comments.
>     (If that was your intention: lexer rule names have to start with an
>     uppercase letter.)
>
>     Andreas
>
>
>     Bob Sole wrote:
>     > I'm trying to write a parser for PL/SQL package header files, but I'm
>     > banging my head against the wall with a basic case-insensitive
>     > parsing problem. I'm using Jim Idle's NoCaseFileStream so that the
>     > lexer matches keywords in upper case, but the parser gets confused
>     > when it comes across language keywords embedded within comments.
>     > Here's some example input in which the OR keyword appears inside
>     > the package comment. The "create or replace package" statement is
>     > deliberately written in mixed case; the parser handles that
>     > correctly, but it stumbles over the first 'or' on line 2:
>     >
>     > /**
>     > blah blah or blah
>     > */
>     > create Or rePlace PACKAGE
>     > test IS
>     >
>     > Here's the grammar:
>     >
>     > grammar Test;
>     >
>     > input: statement+ ;
>     >
>     > statement: pkgComment | pkgStmt ;
>     >
>     > pkgComment: '/**' description '*/' ;
>
>
>     List: http://www.antlr.org/mailman/listinfo/antlr-interest
>     Unsubscribe:
>     http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>