[antlr-interest] Lexer not putting colon back
Paul J. Lucas
dude at darkfigure.org
Thu Nov 14 21:13:49 PST 2002
Assume I want to parse a statement of the form:
let $x := $y
or, as a token sequence:
LET DOLLAR QNAME ASSIGN DOLLAR QNAME
where the lexer is defined as:
tokens { LET; QNAME; }
protected Digit : '0'..'9' ;
protected Letter : 'A'..'Z' | 'a'..'z' | '_' ;
protected NCName : Letter (NCNameChar)* ;
protected NCNameChar : Letter | Digit | '.' | '-' ;
protected QName : NCName (':' NCName)? ;
protected WhiteSpace : ' ' | '\t' | '\r' | '\n' ;
ASSIGN : ":=" ;
DOLLAR : '$' ;
EQUAL : '=' ;
S : (WhiteSpace)+ { $setType( Token.SKIP ); } ;
Keywords
: "let" { $setType( LET ); }
| QName { $setType( QNAME ); }
;
This works for the input given above. But if I remove the whitespace
after the $x, as in:
let $x:= $y
then the lexer fails. An excerpt of the trace output is:
> lexer mKeywords; c==x
> lexer mQName; c==x
> lexer mNCName; c==x
> lexer mLetter; c==x
< lexer mLetter; c==:
< lexer mNCName; c==:
> lexer mNCName; c===
> lexer mLetter; c===
< lexer mLetter; c===
< lexer mNCName; c===
< lexer mQName; c===
< lexer mKeywords; c===
< varRef; > lexer mEQUAL; c===
< lexer mEQUAL; c==1
LA(1)===
< startRule; LA(1)===
exception: line 1:8: unexpected char: '='
When the lexer encounters the ':', it tries to make it part of a
QName, e.g., "x:z"; but since the next character is an '=', it
can't do that. What it SHOULD do is put the ':' back, return
"x" as the QNAME, then pick up with ':' as the start of ":=". But
it doesn't. Why not? And how can I fix this so that the lexer
returns the right tokens regardless of whether whitespace is
there?
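To make the desired behavior concrete, here is an untested sketch of how the QName rule might express "only consume the ':' if a name follows". It assumes that an ANTLR 2 syntactic predicate is accepted on this optional subrule and causes the lexer to rewind when the predicate fails. The choice of Letter as the predicate's lookahead is also an assumption.

```antlr
// Sketch only, not verified against ANTLR 2.
// The syntactic predicate guards the optional ":NCName" suffix: the
// lexer should commit to the ':' only when a Letter follows it.
// Otherwise the ':' stays in the input so that ASSIGN can match ":=".
protected QName
    : NCName ( ( ':' Letter ) => ':' NCName )?
    ;
```

If this works as hoped, both "let $x := $y" and "let $x:= $y" should yield LET DOLLAR QNAME ASSIGN DOLLAR QNAME.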
- Paul