[antlr-interest] skipping whitespaces in code and avoiding it in comments

Mon Mar 9 02:37:10 PDT 2009

Hello,

Thanks for your response.

Sam Barnett-Cormack wrote:

[cut]

> It's far more common to make VALUE, ID, and COMMENT token types (and 
> comment different to what you have now - from // to newline inclusive is 
> more normal). Then you put the comments and the WS on the hidden 
> channel. 

If I put comments on the hidden channel, then how can I make the parser 
cache them?

My goal is to associate single-line comments with "corresponding" 
identifiers of schema elements in SQL.

The specification of the language does not define which comment relates 
to which schema element (table or column). Moreover, the SQL'92 standard 
defines comments as just another separator (like whitespace), which, as 
you said, the lexer sends to the hidden channel by default.

Therefore I don't want my grammar to define explicitly where comments 
about the given identifiers should be (that would narrow the SQL 
standard). Instead I would like to cache (somehow) the comments and the 
identifiers of schema elements within rule actions and then apply some 
kind of heuristic, for instance:

1. if a comment is between <table_definition>s then associate it with the
    following <table_definition>, not the previous one.

2. if a comment is inside of <table_definition> then:

    (a) if a comment is on any line of a <column_definition> then
        associate it with the <column_name> value of this
        <column_definition> (a <column_definition> can span more
        than one line)

    (b) otherwise, i.e. if a comment is on a separate line between
        two <column_definition>s, then associate it with the <column_name>
        value of the following <column_definition>, not the previous one.
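
To make the heuristic concrete, here is a rough sketch in plain Python
(not ANTLR code). It assumes the parser has already recorded, for every
table and column definition, the first and last line it occupies. The
names Table, Column and owner_of_comment are made up for illustration
only. Cases the rules above do not cover, such as a comment after the
last column, are left undecided (None).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    first_line: int  # line where the <column_definition> starts
    last_line: int   # line where it ends (it may span several lines)


@dataclass
class Table:
    name: str
    first_line: int
    last_line: int
    columns: List[Column] = field(default_factory=list)


def owner_of_comment(comment_line: int, tables: List[Table]) -> Optional[str]:
    """Return the schema element a comment on comment_line is attached to.

    Assumes tables (and the columns inside each table) are sorted by line.
    """
    for table in tables:
        if comment_line < table.first_line:
            # Rule 1: comment between table definitions -> following table.
            return table.name
        if comment_line <= table.last_line:
            # Rule 2: the comment is inside this table definition.
            for col in table.columns:
                if comment_line <= col.last_line:
                    # 2(a): comment lies on one of this column's lines, or
                    # 2(b): comment is on its own line before this column.
                    return f"{table.name}.{col.name}"
            return None  # after the last column: not covered by the rules
    return None  # after the last table: not covered by the rules
```

For example, with a table "customer" on lines 2-7 whose columns are "id"
(line 3) and "name" (lines 5-6), a comment on line 1 goes to the table, a
comment on line 3 goes to "id", and a comment on line 4 goes to "name".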

That would require caching the line numbers of comments found by the 
lexer and passing them to the parser, wouldn't it?
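
What I have in mind is roughly the following, again only as a plain
Python stand-in for what the lexer would do, not real ANTLR code. The
function name split_comments is hypothetical, and the sketch assumes
that "--" never appears inside a string literal.

```python
from typing import List, Tuple

Numbered = List[Tuple[int, str]]  # (1-based line number, text)


def split_comments(sql: str) -> Tuple[Numbered, Numbered]:
    """Separate '--' comments from code, keeping line numbers for both."""
    code: Numbered = []
    comments: Numbered = []
    for line_no, line in enumerate(sql.splitlines(), start=1):
        # Everything after the first "--" on a line is the comment text.
        text, marker, comment = line.partition("--")
        if marker:
            comments.append((line_no, comment.strip()))
        if text.strip():
            code.append((line_no, text.strip()))
    return code, comments
```

The parser would then see only the code part, while the list of
(line, comment) pairs is kept on the side and matched against the line
ranges of the schema elements afterwards.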

Or is there another way to do it?

Maciej