[antlr-interest] Lexer not putting colon back

Sriram Durbha cintyram at yahoo.com
Fri Nov 15 06:49:42 PST 2002


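Hi Lucas,
I have a similar problem.
The reason, I guess, is that ANTLR generates an LL lexer: it reads
characters from left to right and decides which alternative to take
from a fixed amount of lookahead. Once it sees the ':' it commits to
the optional (':' NCName) part of QName and does not back up when the
next character turns out to be '='.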


With ANTLR we should be able to use a syntactic predicate so that the
':' is consumed as part of the QName only when it is followed by an
NCName. Otherwise the lexer should end the QName before the ':' and
leave the ':' for the next token, here ":=".
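
Something along these lines might do it. This is an untested sketch,
assuming ANTLR 2 syntactic-predicate syntax ("=>"); only the QName
rule changes and the rest of Paul's lexer stays as he posted it:

		// Sketch only, not tested against the original grammar.
		// The predicate speculatively matches ':' NCName; if that
		// fails, the optional part is skipped and ':' is not consumed.
		protected QName
			: NCName ( (':' NCName) => ':' NCName )?
			;

If that works, "x:= $y" should lex as QNAME "x" followed by ASSIGN,
while "x:y" is still a single QNAME. A semantic predicate such as
{ LA(2) != '=' }? in front of the ':' might be a cheaper alternative,
since it avoids the speculative match.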


Hope this helps.
Cheers,
Ram

--- "Paul J. Lucas" <dude at darkfigure.org> wrote:
> 	Assume I want to parse a statement of the form:
> 
> 		let $x := $y
> 
> 	or:
> 
> 		LET DOLLAR QNAME ASSIGN DOLLAR QNAME
> 
> 	where the lexer is defined as:
> 
> 		tokens { LET; QNAME; }
> 
> 		protected Digit		: '0'..'9' ;
> 		protected Letter	: 'A'..'Z' | 'a'..'z' | '_' ;
> 		protected NCName	: Letter (NCNameChar)* ;
> 		protected NCNameChar	: Letter | Digit | '.' | '-' ;
> 		protected QName		: NCName (':' NCName)?  ;
> 		protected WhiteSpace	: ' ' | '\t' | '\r' | '\n' ;
> 
> 		ASSIGN	: ":=" ;
> 		DOLLAR	: '$' ;
> 		EQUAL	: '=' ;
> 		S	: (WhiteSpace)+ { $setType( Token.SKIP ); } ;
> 
> 		Keywords
> 			: "let"     { $setType( LET ); }
> 			| QName     { $setType( QNAME ); }
> 			;
> 
> 	This works fine as given above.  But if I remove the whitespace
> 	after the $x like:
> 
> 		let $x:= $y
> 
> 	Then it gets it wrong.  An excerpt of the trace output is:
> 
> 		 > lexer mKeywords; c==x
> 		  > lexer mQName; c==x
> 		   > lexer mNCName; c==x
> 		    > lexer mLetter; c==x
> 		    < lexer mLetter; c==:
> 		   < lexer mNCName; c==:
> 		   > lexer mNCName; c===
> 		    > lexer mLetter; c===
> 		    < lexer mLetter; c===
> 		   < lexer mNCName; c===
> 		  < lexer mQName; c===
> 		 < lexer mKeywords; c===
> 		  < varRef;  > lexer mEQUAL; c===
> 		 < lexer mEQUAL; c==1
> 		LA(1)===
> 		 < startRule; LA(1)===
> 		exception: line 1:8: unexpected char: '='
> 
> 	When it encounters the ':', it tries to make it part of a
> 	QName, e.g, "x:z"; but since the next character is an '=', it
> 	can't do that.  What it SHOULD do is put the ':' back, return
> 	'x' as the QNAME, then pick up with ':' as part of ":=".  But
> 	it doesn't.  Why not?  And how can I fix this so that it
> 	correctly returns the right tokens regardless of whether
> 	whitespace is there?
> 
> 	- Paul
> 
> 
>  
> 
> 
> 




