[antlr-interest] distinction between newline and ws
Sven Busse
mail at ghost23.de
Sun Oct 21 03:10:52 PDT 2007
Hi,
Thanks for the answers so far. I found something out, too: if I just swap
the lines for NEWLINE and WS, the grammar behaves differently and NEWLINE
is no longer matched. So I guess the order in which those rules appear in
the grammar makes a difference. Nevertheless, I would have thought the
correct way to write the rules for NEWLINE and WS would be:
NEWLINE : '\r'? '\n';
WS : (' '|'\t')+ {skip();};
That is, I would not put \n and \r in WS.
Perhaps I should simply read on in the book ;o)
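
The point about keeping \n and \r out of WS can be checked with a small Python sketch. It is only a toy model and not how ANTLR builds its lexer: it assumes the rules are tried in the listed order and the first one that matches wins, which is how Python's regex alternation behaves. The helper name token_names and the sample input are made up for illustration.

```python
import re

def token_names(text, rules):
    """Return the names of the rules that match, scanning left to right.

    Toy model (an assumption, not ANTLR's real algorithm): alternatives
    are tried in the listed order and the first one that matches wins.
    """
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in rules))
    return [m.lastgroup for m in pattern.finditer(text)]

NEWLINE = ("NEWLINE", r"\r?\n")
WS_OVERLAPPING = ("WS", r"[ \t\n\r]+")  # WS that also contains \n and \r
WS_DISJOINT = ("WS", r"[ \t]+")         # WS without \n and \r

sample = "\n \t\n"

# Overlapping rules: the result depends on which rule is listed first.
print(token_names(sample, [NEWLINE, WS_OVERLAPPING]))  # ['NEWLINE', 'WS']
print(token_names(sample, [WS_OVERLAPPING, NEWLINE]))  # ['WS']

# Disjoint rules: both orders give the same tokens.
print(token_names(sample, [NEWLINE, WS_DISJOINT]))     # ['NEWLINE', 'WS', 'NEWLINE']
print(token_names(sample, [WS_DISJOINT, NEWLINE]))     # ['NEWLINE', 'WS', 'NEWLINE']
```

In this model, once the two rules can never match the same character, swapping them changes nothing, so the question of precedence does not come up.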
________________________________________
From: Joseph Gentle [mailto:josephg at cse.unsw.edu.au]
Sent: Sunday, October 21, 2007 02:58
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] distinction between newline and ws
[forgot to reply all]
I can't find the documentation for it, but ANTLR does seem to have token
matching precedence rules.
Have a play with it - write a tokeniser like this:
test : ( TEXT | NEWLINE | WS )*;
TEXT : 'x'+;
NEWLINE : '\r'? '\n';
WS : (' '|'\t'|'\n'|'\r')+;
and pass it some strings with newlines, whitespace and whatnot. Have a
look at the token stream it generates. I've got a feeling that ANTLR prefers
token rules that appear earlier in the grammar over later ones. Using your
rules, I expect that a line of text followed immediately by a newline will
become TEXT NEWLINE, whereas a line of text followed by whitespace and then
a newline will become TEXT WS. This is because the + in the WS rule is
greedy by default and will consume the newline as well, if it can.
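
The expectation above can be mimicked with a toy tokenizer in Python. This is a sketch under a stated assumption, not ANTLR's actual lexer: at each position it tries the three rules from the grammar above in their listed order and takes the first one that matches. The function name tokenize is made up for illustration.

```python
import re

# The three lexer rules from the grammar above, as regular expressions.
RULES = [
    ("TEXT", re.compile(r"x+")),
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t\n\r]+")),  # greedy: also swallows newlines
]

def tokenize(text, rules=RULES):
    """Toy lexer (an assumption, not ANTLR's algorithm): first listed rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        for name, regex in rules:
            match = regex.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched the character at this position.
            raise ValueError(f"no rule matches at position {pos}")
    return tokens

print(tokenize("xx\n"))   # [('TEXT', 'xx'), ('NEWLINE', '\n')]
print(tokenize("xx \n"))  # [('TEXT', 'xx'), ('WS', ' \n')]
```

In the second call the space rules out NEWLINE at that position, so WS starts matching and its greedy + carries on through the newline.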
Have a play!
-J
Sven Busse wrote:
Hello,
I am very new to ANTLR and language recognition. I bought the book
by Terence Parr and am currently working through the first example,
the calculator. Unfortunately, there is already something I don't
understand. The grammar looks like this:
grammar Expr;
prog : stat+ ;
stat : expr NEWLINE
     | ID '=' expr NEWLINE
     | NEWLINE
     ;
expr : multExpr (('+'|'-') multExpr)* ;
multExpr : atom ('*' atom)* ;
atom : INT
     | ID
     | '(' expr ')'
     ;
ID : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS : (' '|'\t'|'\n'|'\r')+ {skip();} ;
My question now is: how does ANTLR know that \n should match NEWLINE
instead of WS (which would mean it gets skipped)? I would have thought this
grammar is ambiguous, but apparently it isn't. Why not?
Thank you
Sven