[antlr-interest] skipping whitespaces in code and avoiding it in comments

Mon Mar 9 04:07:52 PDT 2009

Maciej Gawinecki wrote:
> Hello,
> 
> Thanks for your response.
> 
> Sam Barnett-Cormack wrote:
> 
> [cut]
> 
>> It's far more common to make VALUE, ID, and COMMENT token types (and 
>> comment different to what you have now - from // to newline inclusive 
>> is more normal). Then you put the comments and the WS on the hidden 
>> channel. 
> 
> If I put comments on the hidden channel, then how can I make the parser 
> cache them?
> 
> My goal is to associate single-line comments with "corresponding" 
> identifiers of schema elements in SQL.
> 
> The specification of the language does not define which comment relates 
> to which schema element (table or column). Moreover, the SQL'92 standard 
> defines comments as just another separator (like whitespace), which, as 
> you said, the lexer sends to the hidden channel.
> 
> Therefore I don't want my grammar to define explicitly where comments 
> about given identifiers may appear (that would narrow the SQL 
> standard). Instead I would like to cache (somehow) the comments and the 
> identifiers of schema elements within rule actions and then apply some 
> kind of heuristic, for instance:
> 
> 1. if a comment is between <table_definition>s then associate it with the
>    following <table_definition>, not the previous one.
> 
> 2. if a comment is inside of <table_definition> then:
> 
>    (a) if a comment is on any line of a <column_definition> then
>        associate it with the <column_name> value of this
>        <column_definition> (<column_definition>s can span more
>        than one line)
> 
>    (b) otherwise, i.e. if a comment is on a separate line between
>        two <column_definition>s, then associate it with the <column_name>
>        value of the following <column_definition>, not the previous one.
> 
> That would require caching the line numbers of comments found by the 
> lexer and passing them to the parser, wouldn't it?
> 
> Or is there another way to do it?

The whole point of the hidden channel (rather than just discarding the 
tokens) is that the parser *can* tune in to it, or use it for context 
checking in predicates, or use it in actions. I've never done it, so I 
can't tell you how, but I know it can be done.
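
The line-based heuristic from the quoted message can be sketched without
any ANTLR code at all. The Python below is only an illustration under
stated assumptions: it assumes the parser has already recorded, for each
table and column definition, the first and last line it occupies, and that
the line number of each hidden-channel comment token is available. The
names Column, Table and owner_of are hypothetical and not part of ANTLR.
Two cases the original rules leave open are settled arbitrarily here: a
comment after the last column of a table goes to the table itself, and a
comment after the last table gets no owner.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    first_line: int  # line where the column definition starts
    last_line: int   # line where it ends (definitions may span lines)


@dataclass
class Table:
    name: str
    first_line: int
    last_line: int
    columns: List[Column] = field(default_factory=list)


def owner_of(comment_line: int, tables: List[Table]) -> Optional[str]:
    """Return the schema element a comment on comment_line belongs to.

    tables and their columns must be sorted by line number.
    """
    for table in tables:
        # Rule 1: a comment before a table belongs to the following table.
        if comment_line < table.first_line:
            return table.name
        if comment_line <= table.last_line:
            for column in table.columns:
                # The first column ending at or after the comment line is
                # either the column the comment sits on (rule 2a) or the
                # next column after a stand-alone comment (rule 2b).
                if comment_line <= column.last_line:
                    return f"{table.name}.{column.name}"
            # Assumption: a comment after the last column goes to the table.
            return table.name
    # Assumption: a comment after the last table has no owner.
    return None


if __name__ == "__main__":
    # Line layout of the hypothetical input:
    # 1  -- customers of the shop
    # 2  CREATE TABLE customer (
    # 3    id INT,          -- primary key
    # 4    -- full legal name
    # 5    name
    # 6      VARCHAR(80)    -- spans two lines
    # 7  );
    schema = [Table("customer", 2, 7,
                    [Column("id", 3, 3), Column("name", 5, 6)])]
    for line in (1, 3, 4, 6):
        print(line, owner_of(line, schema))
```

With this layout the comments on lines 1, 3, 4 and 6 are assigned to
customer, customer.id, customer.name and customer.name respectively.
Keeping the heuristic outside the grammar means the grammar itself does
not have to say where comments may appear.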

Sam