[antlr-interest] Multiline comment vs MySQL version comment

Mike Lischke mike at lischke-online.de
Tue Sep 4 07:21:04 PDT 2012


Hi group,

While putting the final touches on my new MySQL grammar, I thought it was time to attack this dreaded problem. Parsing a C-like multi-line comment is a standard task; however, MySQL supports a special form of such a comment:

/*!12345 text */

The number is checked against the version of the server on which the query runs. If the server version is >= that number, the comment delimiters are simply removed and the lexer returns "text" as if the delimiters had never been there. This goes so far that you can even have one level of comment nesting, like

/*!12345 text /* text */ text */

which is valid provided the version number fits (the number is always 5 digits long, btw).
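
To make the version check concrete, here is a rough sketch in Python (purely for illustration, not part of any grammar). It relies on MySQL packing a release such as 5.1.10 into the five digits 50110. The names `encode_version` and `is_active` are made up for this example, and only the 5-digit introducer form described above is handled.

```python
import re

# Matches the version introducer "/*!NNNNN" at the start of a comment.
INTRODUCER = re.compile(r"/\*!([0-9]{5})")


def encode_version(major, minor, patch):
    """Pack a release number the way MySQL does, e.g. 5.1.10 -> 50110."""
    return major * 10000 + minor * 100 + patch


def is_active(comment, server_version):
    """Return True if the comment body should be lexed as normal text.

    An ordinary comment (no introducer) is never active; a version
    comment is active when the server is at least as new as the
    number in the introducer.
    """
    match = INTRODUCER.match(comment)
    if match is None:
        return False
    return server_version >= int(match.group(1))
```

For example, `is_active("/*!50110 KEY_BLOCK_SIZE=1024 */", encode_version(5, 5, 0))` is true, while the same comment on a 5.0.3 server is treated like any other comment.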

The (Bison-based) MySQL server parser solves this problem in its handwritten lexer (<sigh>): it simply jumps over the version introducer and keeps a flag to know when to ignore the final '*/'. I would like to solve this in the grammar instead, if that is at all possible. I'm aware that this is not context-free, but I could probably use an action or predicate to check the number.
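
For reference, the flag-based approach can be sketched roughly as below, in Python and as a standalone pre-pass rather than as lexer rules. This is only my reading of the behaviour, not the server's actual code: `strip_version_comments`, `skip_comment` and `drop_final_close` are hypothetical names, quoted strings are not handled, and allowing one nesting level when a version comment is skipped is an assumption.

```python
import re

VERSION_INTRO = re.compile(r"/\*!([0-9]{5})")


def skip_comment(sql, pos, allow_nested):
    """Return the index just past the '*/' that closes a comment body.

    pos points just behind the opening delimiter. With allow_nested,
    one level of inner comment is skipped as well.
    """
    while pos < len(sql):
        if allow_nested and sql.startswith("/*", pos):
            pos = skip_comment(sql, pos + 2, False)
        elif sql.startswith("*/", pos):
            return pos + 2
        else:
            pos += 1
    return len(sql)  # unterminated comment: swallow the rest


def strip_version_comments(sql, server_version):
    """Drop comments, but keep the body of version comments that fit."""
    out = []
    pos = 0
    drop_final_close = False  # set while inside an active version comment
    while pos < len(sql):
        intro = None if drop_final_close else VERSION_INTRO.match(sql, pos)
        if intro and server_version >= int(intro.group(1)):
            # Jump over the introducer and remember to eat the final '*/'.
            drop_final_close = True
            pos = intro.end()
        elif intro:
            # Version too high for this server: behaves like a comment.
            pos = skip_comment(sql, intro.end(), True)
        elif sql.startswith("/*", pos):
            pos = skip_comment(sql, pos + 2, False)
        elif drop_final_close and sql.startswith("*/", pos):
            drop_final_close = False
            pos += 2
        else:
            out.append(sql[pos])
            pos += 1
    return "".join(out)
```

With a server version of 50000, the nested example `/*!12345 text /* text */ text */` comes out as the two outer "text" parts only; with a server version of 10000, nothing is left. The open question is how to get the same effect from lexer rules plus a predicate instead of a separate pass.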

Has anyone solved this problem already? How would I write the lexer rule(s) so that the embedded text is either parsed normally or ignored entirely, like any multi-line comment, depending on the version number?

Any hints are really appreciated,

Mike
-- 
www.soft-gems.net





