[antlr-interest] distinction between newline and ws

Peizhao Hu peizhao at itee.uq.edu.au
Sat Oct 20 18:48:19 PDT 2007


Not sure what you guys are trying to do, but try the following:

grammar T;
options {
	k=3;
}

test 	: (TEXT | NEWLINE | WS)* ;
TEXT	: 'x'+ ;
NEWLINE : '\r'? '\n' ;
WS	: (' '|'\t')+ {$channel=HIDDEN;} ;	// '+' rather than '*': a lexer rule should not match the empty string
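
If you want a feel for why taking '\n' and '\r' out of WS matters, here is a
rough Python sketch. It is not ANTLR, and it is not how ANTLR's generated
lexer actually predicts tokens. It assumes a toy model in which the first rule
in grammar order that matches at the current position wins and then consumes
as much as it can. The names tokenize, OVERLAPPING and DISJOINT are made up
for this illustration, and the hidden channel is ignored, so WS tokens show up
in the output.

```python
import re

def tokenize(text, rules):
    """Toy lexer: at each position the first listed rule that matches wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:      # ignore empty matches
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return tokens

# WS as in the quoted grammar below: it also accepts '\n' and '\r'.
OVERLAPPING = [
    ("TEXT", re.compile(r"x+")),
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t\n\r]+")),
]

# WS restricted to spaces and tabs, so it can never swallow a newline.
DISJOINT = [
    ("TEXT", re.compile(r"x+")),
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]

def names(text, rules):
    """Return only the token names, which is all we care about here."""
    return [name for name, _ in tokenize(text, rules)]

if __name__ == "__main__":
    for sample in ("xx\n", "xx \n"):
        print(repr(sample))
        print("  overlapping:", names(sample, OVERLAPPING))
        print("  disjoint:   ", names(sample, DISJOINT))
```

Under this toy model the overlapping WS swallows the newline in "xx \n", while
the restricted WS leaves it for NEWLINE. Once the two rules share no
characters, NEWLINE is the only rule that can match '\n', so the precedence
question does not come up at all.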


Regards,

Peizhao


Joseph Gentle wrote:
> [forgot to reply all]
> 
> I can't find the documentation for it, but ANTLR does seem to have token 
> matching precedence rules.
> 
> Have a play with it - write a tokeniser like this:
> 
> test : ( TEXT | NEWLINE | WS )*;
> TEXT : 'x'+;
> 
> NEWLINE     :     '\r'? '\n';
> 
> WS    :     (' '|'\t'|'\n'|'\r')+;
> 
> 
> and pass it some strings containing newlines and other whitespace. Have
> a look at the token stream generated. I've got a feeling that ANTLR
> prefers token rules defined earlier over those defined later. Using your
> rules, I expect that a line of text followed immediately by a newline will
> become TEXT NEWLINE, whereas a line of text followed by whitespace and then
> a newline will be TEXT WS. This is because by default the + in the WS rule
> is greedy and will consume the newline as well, if it can.
> 
> Have a play!
> 
> -J
> 
> 
> Sven Busse wrote:
>>
>> Hello,
>>
>> I am very new to ANTLR and language recognition. I bought the book
>> by Terence Parr and am currently working through the first
>> example, the calculator. Unfortunately, there is already something I
>> don’t understand. The grammar looks like this:
>>
>> grammar Expr;
>>
>> prog  :     stat+ ;
>>
>> stat  :     expr NEWLINE
>>       |     ID '=' expr NEWLINE
>>       |     NEWLINE
>>       ;
>>
>> expr  :     multExpr (('+'|'-') multExpr)* ;
>>
>> multExpr:   atom ('*' atom)* ;
>>
>> atom  :     INT
>>       |     ID
>>       |     '(' expr ')'
>>       ;
>>
>> ID    :     ('a'..'z'|'A'..'Z')+;
>> INT   :     '0'..'9'+;
>> NEWLINE     :     '\r'? '\n';
>> WS    :     (' '|'\t'|'\n'|'\r')+ {skip();};
>>
>> My question now is: how does ANTLR know that “\n” should match
>> NEWLINE instead of WS (which would mean it gets skipped)? I would have
>> thought this grammar is ambiguous, but apparently it isn’t. Why not?
>>
>> Thank you
>> Sven
>>
> 
> 

