[antlr-interest] Struggling to analyze text inside comments
Stefan Misch
stefan.misch at gmx.de
Sun Jan 2 17:37:59 PST 2011
Hi,
I need to analyze the text inside comments at the start of a program. I
started with the default lexer rules for comments and tried to modify them.
My program uses C-like comments except that a single line comment starts
with '--' instead of '//'.
I have two questions:
Q1: How can I grab the text of a multiline comment, i.e. the part matched
by (options {greedy=false;} : . )*
so that it can also be analyzed by "INFO"?
Q2: What must I do so that INFO does not always choose its last alternative?
Using "=>" I removed the syntactic ambiguity between the keyword
alternatives and the last one, because "TEXT" may match everything (except
newline chars). But when I debug the sample shown below in ANTLRWorks, I only
see "TEXT: ..." printed in the output tab.
This is my grammar:
<BOF X.g>
grammar X;
program
: header
;
header
: COMMENT*
;
COMMENT
: '--' INFO '\r'? '\n'
| '/*' (options {greedy=false;} : . )* '*/' // Q1: how to get text of comment
;
INFO
: ('TITLE:') => 'TITLE:' TEXT {System.out.println("TITLE: " + $TEXT.text);}
| ('NAME:') => 'NAME:' TEXT
| ('VERSION:') => 'VERSION:' 'V'? VERSION_NR DATE
| ('DESC:') => 'DESC:' TEXT
| ('V') => 'V' VERSION_NR DATE NAME TEXT
| TEXT {System.out.println("TEXT: " + $TEXT.text);} // Q2: this alternative is the only one chosen
;
fragment VERSION_NR : DIGIT DIGIT DIGIT;
fragment DATE : DIGIT DIGIT '.' DIGIT DIGIT '.' DIGIT DIGIT DIGIT DIGIT;
fragment DIGIT : '0'..'9';
fragment UMLAUT : 'ä'|'ö'|'ü'|'ß'|'Ä'|'Ö'|'Ü';
fragment NAME : ('a'..'z'|'A'..'Z'|UMLAUT)+;
fragment SPACE : (' '|'\t')+;
fragment TEXT : ~('\n'|'\r')*;
WS : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;};
<EOF X.g>
This is an example of a header I need to analyze:
<BOF sample.txt>
-- ==========================================================================
-- TITLE: Test
--
-- NAME: test.prg
--
-- VERSION: V003 29.01.2010
--
-- DESC: some text
-- - some more text
-- - even more text
--
-- HISTORY:
-- ---------------------------------------------------------------------------
-- V001 30.07.2009 Name some comment
-- ---------------------------------------------------------------------------
-- V002 01.10.2009 Name some more comment -
-- ---------------------------------------------------------------------------
-- V003 29.01.2010 Name even more comment
-- ===========================================================================
<EOF sample.txt>
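To make the intended result concrete, here is a rough sketch in plain Java (regular expressions, not ANTLR) of how each line of the sample should end up being classified. The class name HeaderLineSketch, the method classify and the "KIND=value" result strings are made up for this illustration and are not part of X.g. The patterns only mirror VERSION_NR (three digits) and DATE (dd.mm.yyyy) from the grammar, and they assume the fields are separated by blanks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of X.g: classifies one "--" header line. */
class HeaderLineSketch {
    // "TITLE:", "NAME:" and "DESC:" are each followed by free text.
    private static final Pattern KEYED =
        Pattern.compile("(TITLE|NAME|DESC):\\s*(.*)");
    // "VERSION:" is followed by an optional 'V', three digits and a date.
    private static final Pattern VERSION =
        Pattern.compile("VERSION:\\s*V?(\\d{3})\\s+(\\d{2}\\.\\d{2}\\.\\d{4})");
    // History entry: 'V', three digits, a date, a name, then a free-text comment.
    private static final Pattern HISTORY =
        Pattern.compile("V(\\d{3})\\s+(\\d{2}\\.\\d{2}\\.\\d{4})\\s+(\\S+)\\s*(.*)");

    /** Returns "KIND=value" for a single-line comment, trying keywords before plain text. */
    static String classify(String line) {
        if (!line.startsWith("--")) {
            return "NOT_A_COMMENT";
        }
        String body = line.substring(2).trim();
        Matcher m = KEYED.matcher(body);
        if (m.matches()) {
            return m.group(1) + "=" + m.group(2);
        }
        m = VERSION.matcher(body);
        if (m.matches()) {
            return "VERSION=" + m.group(1) + " " + m.group(2);
        }
        m = HISTORY.matcher(body);
        if (m.matches()) {
            return "HISTORY=" + m.group(1) + " " + m.group(2)
                + " " + m.group(3) + " " + m.group(4);
        }
        // Fallback, corresponding to the last alternative of INFO.
        return "TEXT=" + body;
    }
}
```

For example, HeaderLineSketch.classify("-- VERSION: V003 29.01.2010") returns "VERSION=003 29.01.2010", while a continuation line such as "-- - some more text" falls through to the TEXT case. The point is only the order of the checks: the keyword forms are tried first and plain text is the fallback.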
The header may also be written using /* .. */ comment style (just one at the
beginning and end, or even one per line). But as I failed to solve question 1, I
tried to start with just the single line comments.
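To illustrate what question 1 is after: once the full text of a /* .. */ comment is available as a Java string (for instance from the lexer's getText() inside an action, which I believe the ANTLR 3 Java runtime provides), the part between the delimiters could be cut out and split into lines for further analysis. The following is only a sketch in plain Java under that assumption. The class CommentTextSketch and its methods are hypothetical names and not part of ANTLR or of X.g.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of ANTLR or X.g. */
class CommentTextSketch {
    /** Returns the text between the delimiters, or null if the input is not a block comment. */
    static String innerText(String comment) {
        if (comment.length() < 4
                || !comment.startsWith("/*")
                || !comment.endsWith("*/")) {
            return null;
        }
        return comment.substring(2, comment.length() - 2);
    }

    /** Splits the inner text into trimmed, non-empty lines that could then be analyzed one by one. */
    static List<String> lines(String comment) {
        List<String> result = new ArrayList<>();
        String inner = innerText(comment);
        if (inner == null) {
            return result;
        }
        for (String line : inner.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

Each returned line would then have the same shape as the body of a single line comment, so the same per-line analysis could be applied to both comment styles.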
I really appreciate any help.
Stefan