[antlr-interest] Comments parser and non-alphanum characters
Cor Geboers
cg0601 at hotmail.com
Mon Apr 19 01:45:04 PDT 2010
Hi, I have a problem with a parser that needs to handle comments in a command language. The language wraps each command in an HTML comment pair: '<!--' command '-->'. I can parse most commands, but not the REM command, which is a remark whose text should be ignored.
I wrote a small test grammar, which shows the problem more or less:
grammar Remarks;

options {
    language = Java;
}

rule
    : commandLine+
    ;

commandLine
    : '<!--' command '-->'
    ;

command
    : breakCommand
    | remarkCommand
    ;

remarkCommand
    : REM (.)*
    ;

breakCommand
    : BREAK
    ;

WS
    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; }
    ;

REM
    : '#' ('R'|'r') ('E'|'e') ('M'|'m')
    ;

BREAK
    : '#' ('B'|'b') ('R'|'r') ('E'|'e') ('A'|'a') ('K'|'k')
    ;

IDENT
    : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9')*
    ;
A sample command file might look like this:
<!-- #rem some comment -->
<!-- #break -->
<!-- #rem some comment with $AAA &*&^, A9a 5eee and 99922 and .<><> -->
The parser recognizes the rem commands and the break command, but some characters are lost. It also splits the "comment" text into other tokens (IDENT in this case). Ideally I would like to get all the characters back as one piece, but I have tried several constructs without success.
The last line fares even worse: all "special" characters such as $, & and so on produce errors and do not end up in any token. The messages look like this:
line 3:28 no viable alternative at character '$'
line 3:33 no viable alternative at character '&'
line 3:34 no viable alternative at character '*'
line 3:35 no viable alternative at character '&'
line 3:36 no viable alternative at character '^'
line 3:37 no viable alternative at character ','
line 3:43 no viable alternative at character '5'
line 3:52 no viable alternative at character '9'
line 3:53 no viable alternative at character '9'
How can I define the remark so that all of its characters are either ignored or returned as a single rule or token? This should happen only inside a remark. I looked at other grammars that handle comments, such as C with /* */, and as far as I can see they do about the same thing.
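To make the goal concrete, here is roughly the result I am after, sketched in plain Java with java.util.regex instead of ANTLR. The class name RemarkSketch and the method remarkTexts are made up for illustration, and the pattern assumes that a remark never contains '-->' itself. It is not meant as a fix for the grammar, only as a picture of what the remark rule should hand back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the grammar above.
public class RemarkSketch {

    // One command line: "<!--", optional whitespace, "#name", any text, "-->".
    // ".*?" is reluctant, so it stops at the first "-->" (assumption: a remark
    // never contains "-->"). DOTALL lets a remark span several lines.
    private static final Pattern COMMAND_LINE =
        Pattern.compile("<!--\\s*#([A-Za-z]+)(.*?)-->", Pattern.DOTALL);

    /** Returns the text of every #rem command, each as one unsplit string. */
    public static List<String> remarkTexts(String input) {
        List<String> remarks = new ArrayList<String>();
        Matcher m = COMMAND_LINE.matcher(input);
        while (m.find()) {
            // Only #rem (in any letter case) counts as a remark; #break is skipped.
            if (m.group(1).equalsIgnoreCase("rem")) {
                remarks.add(m.group(2).trim());
            }
        }
        return remarks;
    }

    public static void main(String[] args) {
        String input =
            "<!-- #rem some comment -->\n"
          + "<!-- #break -->\n"
          + "<!-- #rem some comment with $AAA &*&^, A9a 5eee and 99922 and .<><> -->\n";
        for (String remark : remarkTexts(input)) {
            System.out.println("[" + remark + "]");
        }
    }
}
```

For the sample file above, this gives back two strings, "some comment" and "some comment with $AAA &*&^, A9a 5eee and 99922 and .<><>", with every special character intact. That is the behaviour I would like from the grammar.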