[antlr-interest] distinction between newline and ws
Joseph Gentle
josephg at cse.unsw.edu.au
Sat Oct 20 17:58:27 PDT 2007
[forgot to reply all]
I can't find the documentation for it, but ANTLR does seem to have token
matching precedence rules.
Have a play with it - write a tokeniser like this:
grammar Test;  // hypothetical name; ANTLR expects it to match the file name
test : ( TEXT | NEWLINE | WS )* ;
TEXT : 'x'+ ;
NEWLINE : '\r'? '\n' ;
WS : (' '|'\t'|'\n'|'\r')+ ;
and pass it some strings containing newlines and other whitespace, then
look at the token stream it generates. I've got a feeling that ANTLR
prefers token rules defined earlier in the grammar over later ones when
both could match. Using your rules, I expect that a line of text followed
immediately by a newline will become TEXT NEWLINE, whereas a line of text
followed by whitespace and then a newline will become TEXT WS. This is
because the + in the WS rule is greedy by default: once WS has started
matching the whitespace, it will consume the newline as well if it can.
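The behaviour guessed at above can be tried out without ANTLR using a small Python sketch. It is only a toy model of the hypothesis (the first rule that matches wins, and each rule matches greedily), not ANTLR's actual lexing algorithm; the names RULES and tokenize are made up for illustration.

```python
import re

# Token rules in declaration order, mirroring the TEXT / NEWLINE / WS rules.
# Assumption: the earliest rule that matches at a position wins, and each
# pattern is greedy, like the default + in the grammar rules.
RULES = [
    ("TEXT", re.compile(r"x+")),
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t\n\r]+")),
]


def tokenize(text):
    """Return the token names produced by trying RULES in order at each position."""
    names = []
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match:
                names.append(name)
                pos = match.end()
                break
        else:
            # No rule matched the character at this offset.
            raise ValueError(f"no rule matches at offset {pos}")
    return names


print(tokenize("xx\n"))   # text followed immediately by a newline
print(tokenize("xx \n"))  # text, then a space, then a newline
```

Under this model the first input gives TEXT NEWLINE and the second gives TEXT WS, because WS starts matching at the space and then swallows the newline too. Whether real ANTLR output agrees is exactly what the experiment with the grammar is meant to check.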
Have a play!
-J
Sven Busse wrote:
> hello,
>
> i am very new to antlr and language recognition. So i bought the book
> from Terence Parr and now i am currently working through the first
> example, the calculator. And unfortunately already, i don't understand
> something. The grammar looks like this:
>
> grammar Expr;
>
> prog : stat+ ;
>
> stat : expr NEWLINE
>      | ID '=' expr NEWLINE
>      | NEWLINE
>      ;
>
> expr : multExpr (('+'|'-') multExpr)* ;
>
> multExpr: atom ('*' atom)* ;
>
> atom : INT
>      | ID
>      | '(' expr ')'
>      ;
>
> ID : ('a'..'z'|'A'..'Z')+;
> INT : '0'..'9'+;
> NEWLINE : '\r'? '\n';
> WS : (' '|'\t'|'\n'|'\r')+ {skip();};
>
> My question now is, how does ANTLR know that "\n" should match
> NEWLINE instead of WS (which would mean it would be skipped)? I would
> have thought this grammar is ambiguous, but apparently it isn't. Why not?
>
> Thank you
> Sven