[antlr-interest] Follow up to hoisted predicates and local variables

Juancarlo Añez apalala at gmail.com
Mon Sep 17 06:20:40 PDT 2012


Mike,

I'm not sure about the treatment of VERSION_COMMENT_END. I would have
excluded it from the first rule, I would not have made it a fragment, and I
would not have provided the empty alternative.

But if it works...

-- Juanca

On Mon, Sep 17, 2012 at 7:38 AM, Mike Lischke <mike at lischke-online.de> wrote:

> Hey Jim, Jesse, Juancarlo,
>
> thank you all for your valuable input.
>
> > Create a rule that lexes /*
> > Create an input->mark at the start of this rule
> > Using hand crafted code, walk through the input stream
> > If a normal comment, then you are just finding the matching */ (handle
> > embedded)
> > If a !12345 comment, then
> >   directly change the /*!12345 to spaces in the input stream,
> >   find the matching */ and change those to spaces
> >   input->rewind to the mark you created
> >   exit the rule
>
>
> Not a bad idea, as it is attacking the problem at a low level. However,
> I'd like to avoid including target specific code as much as possible (or if
> included, like in predicates, then in a way that's easy to port).
>
> Additionally, I didn't mention some further facts about those version
> comments. There's a third form /*! ... */ which is like the one with a
> version number, but always matches (so the comment decoration is always
> removed and the content handled as normal text). Additionally, there can be
> one level of block comment nesting, but then version comments are treated
> like normal block comments. After letting all this and your input sink in I
> was able to come up with a solution this morning. For reference if anyone
> is later searching for a similar solution:
>
> COMMENT_RULE:
>         // Comment introducer intentionally written as two chars, to avoid trouble in generated lexer
>         // when the source line is quoted in a block comment there. Same applies for the other cases below.
>         '/' '*' BLOCK_COMMENT
>         | VERSION_COMMENT_END
>         | POUND_COMMENT
>         | {LA(3) == ' ' || LA(3) == '\t' || LA(3) == '\n' || LA(3) == '\r'}? => DASHDASH_COMMENT
> ;
>
> // There are 3 types of block comments:
> // /* ... */ - The standard multi line comment.
> // /*! ... */ - A comment used to mask code for other clients. In MySQL the content is handled as normal code.
> // /*!12345 ... */ - Same as the previous one, except the code is only used when the current server version
> //                   is at least the given number (which thus specifies the minimum server version the code can run with).
> fragment BLOCK_COMMENT options{ greedy = false; }:
>         {!in_version_comment}? VERSION_COMMENT
>         | MULTILINE_COMMENT
> ;
>
> fragment VERSION_COMMENT
> @init { matched_version = true; }
> :
>         LOGICAL_NOT_OPERATOR
>                 (
>                         v = INTEGER { matched_version = check_version($v); } VERSION_COMMENT_TAIL
>                         | VERSION_COMMENT_TAIL
>                 )
> ;
>
> fragment VERSION_COMMENT_TAIL:
>         { !matched_version }? =>
>                 ( options { greedy = false; }:
>                         ('/*' MULTILINE_COMMENT)  // One level of block comment nesting is allowed for versioned comments.
>                         | .
>                 )* '*''/' { $type = MULTILINE_COMMENT; $channel = 98; }
>         | { $type = VERSION_COMMENT; $channel = 98; in_version_comment = true; }
> ;
>
> fragment MULTILINE_COMMENT:     ( options { greedy = false; }: . )* '*''/' { $channel = 98; };
>
> fragment VERSION_COMMENT_END:
>         {in_version_comment}? => '*''/' { $channel = 98; in_version_comment = false; }
>         | // Intentionally left empty to make the gated semantic predicate work.
> ;
>
> fragment POUND_COMMENT:                 '#' ~('\n'|'\r')* '\r'? '\n' { $channel = 98; };
> fragment DASHDASH_COMMENT:              '--' (' ' | '\t' | '\n' | '\r') ~('\n'|'\r')* ('\r' | '\n' | EOF) { $channel = 98; };
>
>
> in_version_comment and matched_version are both lexer member vars. This is
> part of my upcoming complete MySQL grammar.
>
> Mike
> --
> www.soft-gems.net
>
>
>
>
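
For comparison, the hand-crafted approach quoted above (blank out the
/*!12345 introducer and its matching */ so that the content lexes as normal
code) can be sketched roughly as follows. This is only an illustration in
Python, not target code for the generated lexer and not part of Mike's
grammar. The function names are made up, the version is assumed to be a
plain integer compared with "given version <= server version", and string
literals that happen to contain comment markers are ignored.

```python
import re

# Hypothetical introducer pattern: "/*!" optionally followed by a version
# number (any run of digits, mirroring the INTEGER token in the grammar).
_INTRODUCER = re.compile(r"/\*!(\d+)?")


def find_comment_end(text, start):
    """Return the index of the '*/' that closes a version comment whose body
    starts at `start`, skipping one level of nested /* ... */.
    Returns -1 if the comment is unterminated."""
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "/*":
            # Nested normal block comment: jump past its terminator.
            inner_end = text.find("*/", i + 2)
            if inner_end < 0:
                return -1
            i = inner_end + 2
        elif pair == "*/":
            return i
        else:
            i += 1
    return -1


def unmask_version_comments(sql, server_version):
    """Replace the decoration of matching version comments with spaces.

    Normal /* ... */ comments and non-matching version comments are left
    untouched, so a later lexer can still treat them as comments. The
    result has the same length as the input, so positions stay valid."""
    chars = list(sql)
    pos = 0
    while True:
        start = sql.find("/*", pos)
        if start < 0:
            break
        m = _INTRODUCER.match(sql, start)
        if m is None:
            # Normal block comment: just skip to its end.
            end = sql.find("*/", start + 2)
            if end < 0:
                break
            pos = end + 2
            continue
        end = find_comment_end(sql, m.end())
        if end < 0:
            break
        version = m.group(1)
        # "/*! ... */" (no number) always matches; otherwise compare versions.
        if version is None or int(version) <= server_version:
            for i in range(start, m.end()):
                chars[i] = " "
            chars[end] = chars[end + 1] = " "
        pos = end + 2
    return "".join(chars)
```

Because only the decoration is overwritten with spaces, line and column
information reported later still refers to the original text, which is the
main attraction of doing this on the input stream instead of in the grammar.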



-- 
Juancarlo Añez

