[antlr-interest] Struggling to analyze text inside comments

Stefan Misch stefan.misch at gmx.de
Sun Jan 2 17:37:59 PST 2011


Hi,

I need to analyze the text inside comments at the start of a program. I
started with the default lexer rules for comments and tried to modify them.
My program uses C-like comments, except that a single-line comment starts
with '--' instead of '//'.

I have two questions:

Q1: How can I grab the text of a multi-line comment, i.e. the part matched
by (options {greedy=false;} : . )*
so that it can also be analyzed by "INFO"?

Q2: What must I do so that INFO does not always choose the last
alternative? Using "=>" I removed the syntactic ambiguity between the other
alternatives and the last one, because "TEXT" may match everything (except
newline characters). But when I debug the sample shown below in ANTLRWorks,
only "TEXT: ..." gets printed in the output tab.
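A detail that may be worth checking for Q2: in the sample below every keyword is preceded by a blank after the '--' (e.g. "-- TITLE:"). The predicates in INFO, however, look for 'TITLE:' immediately after '--', and nothing in the grammar consumes that blank, so only the TEXT alternative can match. That would be consistent with only "TEXT: ..." being printed. The plain-Java sketch below illustrates the intended classification after lexing (it is not ANTLR code). It trims the blanks before comparing against the keywords. InfoClassifier and classify are hypothetical names, and the "KEY value" output format is an arbitrary choice for the example.

```java
// Hypothetical post-lexing classifier, not ANTLR code: decides which kind of
// header entry one single-line comment holds, mirroring the keyword
// alternatives of the INFO rule.
final class InfoClassifier {

    // Keyword prefixes taken from the INFO rule of the grammar.
    private static final String[] KEYS = {"TITLE:", "NAME:", "VERSION:", "DESC:"};

    // Takes a whole line such as "-- TITLE:      Test" and returns
    // "TITLE Test" for a keyword line, or "TEXT ..." for anything else.
    static String classify(String line) {
        String rest = line.startsWith("--") ? line.substring(2) : line;
        // Trim the blanks around the text, in particular the one between
        // "--" and the keyword; without this, "TITLE:" is never at the start.
        rest = rest.trim();
        for (String key : KEYS) {
            if (rest.startsWith(key)) {
                String value = rest.substring(key.length()).trim();
                // Drop the trailing ':' from the keyword in the result.
                return key.substring(0, key.length() - 1) + " " + value;
            }
        }
        return "TEXT " + rest;
    }

    public static void main(String[] args) {
        String[] sample = {
            "-- TITLE:      Test",
            "-- VERSION:    V003 29.01.2010",
            "--             - some more text",
        };
        for (String line : sample) {
            System.out.println(classify(line));
        }
    }
}
```

Inside the grammar itself, the same idea would mean letting INFO skip optional blanks before the keyword alternatives. The sketch only shows the matching logic.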

This is my grammar:

<BOF X.g>
grammar X;
 
program
 : header
 ;
 
header
 : COMMENT*
 ;
 
COMMENT
    : '--' INFO '\r'? '\n'
    | '/*' (options {greedy=false;} : . )* '*/' // Q1: how to get text of comment
    ;
 
INFO
 : ('TITLE:')   => 'TITLE:'   TEXT {System.out.println("TITLE: " + $TEXT.text);}
 | ('NAME:')    => 'NAME:'    TEXT
 | ('VERSION:') => 'VERSION:' 'V'? VERSION_NR DATE
 | ('DESC:')    => 'DESC:'    TEXT
 | ('V')        => 'V' VERSION_NR DATE NAME TEXT
 | TEXT {System.out.println("TEXT: " + $TEXT.text);} // Q2: this alternative is the only one chosen
 ;
   
fragment VERSION_NR : DIGIT DIGIT DIGIT;
fragment DATE  : DIGIT DIGIT '.' DIGIT DIGIT '.'  DIGIT DIGIT DIGIT DIGIT;
fragment DIGIT  : '0'..'9';
fragment UMLAUT  : 'ä'|'ö'|'ü'|'ß'|'Ä'|'Ö'|'Ü';
fragment NAME  : ('a'..'z'|'A'..'Z'|UMLAUT)+;
fragment SPACE  : (' '|'\t')+;
fragment TEXT  : ~('\n'|'\r')*;
 
WS     : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;};
<EOF X.g>

This is an example of a header I need to analyze:

<BOF sample.txt>
-- ==========================================================================
-- TITLE:      Test
-- 
-- NAME:       test.prg
-- 
-- VERSION:    V003 29.01.2010
-- 
-- DESC:       some text
--             - some more text
--             - even more text
-- 
-- HISTORY:
-- ---------------------------------------------------------------------------
-- V001	30.07.2009	Name	some comment
-- ---------------------------------------------------------------------------
-- V002	01.10.2009	Name	some more comment -
-- ---------------------------------------------------------------------------
-- V003	29.01.2010	Name	even more comment
-- ===========================================================================
<EOF sample.txt>
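The history lines in this sample separate version, date, name and comment with tabs. The 'V' alternative of INFO, however, has no whitespace between its parts. As a hedged illustration, here is how such a line could be split with java.util.regex after lexing (not inside the ANTLR lexer). The class HistoryLine and its fields are hypothetical. \p{L} is used so that umlauts count as letters in the name, which is a slightly broader set than the UMLAUT fragment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not ANTLR code: splits one history line into the parts
// named by the 'V' alternative of INFO (VERSION_NR, date, NAME, TEXT) while
// allowing the blanks and tabs that separate them in the sample header.
final class HistoryLine {

    // "--", optional blanks, 'V' + three digits, dd.mm.yyyy, a name made of
    // letters (\p{L} also accepts umlauts), then the rest of the line.
    private static final Pattern LINE = Pattern.compile(
            "--\\s*V(\\d{3})[ \\t]+(\\d{2}\\.\\d{2}\\.\\d{4})[ \\t]+(\\p{L}+)[ \\t]*(.*)");

    final String version;
    final String date;
    final String name;
    final String text;

    private HistoryLine(String version, String date, String name, String text) {
        this.version = version;
        this.date = date;
        this.name = name;
        this.text = text;
    }

    // Returns null when the line is not a history entry.
    static HistoryLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new HistoryLine(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        HistoryLine h = parse("-- V001\t30.07.2009\tName\tsome comment");
        System.out.println(h.version + " | " + h.date + " | " + h.name + " | " + h.text);
    }
}
```

A line such as "-- VERSION:    V003 29.01.2010" does not match, because 'V' must be followed directly by three digits. The keyword lines and the history lines therefore stay distinct.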

The header may also be written in /* ... */ comment style (either a single
comment around the whole header, or one per line). But since I failed to
solve Q1, I started with just the single-line comments.
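For the /* ... */ style (Q1), one option is to leave the lexer rule as it is and strip the delimiters from the complete token text afterwards. The token text here could be, for example, the string returned by getText() on the COMMENT token. The body can then be analyzed line by line like the '--' comments. This works the same whether there is one comment for the whole header or one per line. The sketch below assumes the token text is already available as a String. CommentBody and of are hypothetical names, not part of the ANTLR runtime.

```java
// Hypothetical helper, not ANTLR API: strips the delimiters from the full
// text of a /* ... */ comment token so the body can be analyzed further.
final class CommentBody {

    // Returns the text between "/*" and "*/", or null if the input
    // is not a complete block comment.
    static String of(String tokenText) {
        if (tokenText == null
                || tokenText.length() < 4
                || !tokenText.startsWith("/*")
                || !tokenText.endsWith("*/")) {
            return null;
        }
        return tokenText.substring(2, tokenText.length() - 2);
    }

    public static void main(String[] args) {
        String token = "/* TITLE: Test\n   NAME: test.prg */";
        String body = of(token);
        // Split the body into lines so each one can be examined on its own.
        for (String line : body.split("\r?\n")) {
            System.out.println("[" + line.trim() + "]");
        }
    }
}
```

The length check rejects "/*/", which starts with "/*" and ends with "*/" but is not a complete comment.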

I really appreciate any help.
Stefan