[antlr-interest] Lexer problem - previous token semantic predicate

Silvester Pozarnik silvester.pozarnik at tracetracker.com
Fri Apr 4 03:29:02 PDT 2008


Hi,

I am having trouble forcing the lexer to emit the right token type
based on the type of the previously emitted token.

The string I want to parse is, for example, "TRD.2ads". It currently
produces the token sequence (SYS_TRD, REAL_LITERAL), while I need the
sequence (SYS_TRD, DOT, IDENTIFIER) (see the grammar fragments below).
The logic should be: if the lexer has just encountered a SYS_TRD token,
do not interpret the DOT as the start of a REAL_LITERAL, but as a DOT
token in its own right.
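
To make the intended rule concrete, here is a minimal sketch in plain
Java. It is not ANTLR code and not a fix for the grammar; the class
PrevTokenLexerSketch and its helper methods are hypothetical names made
up for illustration. It is also simplified: exponents are left out,
whitespace is skipped without resetting the previous token, and any run
of letters and digits other than "TRD" is reported as IDENTIFIER.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner. Token names mirror the grammar in
// this post, but nothing here is ANTLR API.
class PrevTokenLexerSketch {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || isDigit(c);
    }

    // Returns the token type names for the input, remembering the type of
    // the previously emitted token to decide how a '.' is interpreted.
    static List<String> tokenize(String s) {
        List<String> types = new ArrayList<>();
        String prev = null; // type of the last token emitted, if any
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            String type;
            if (c == '.') {
                boolean digitFollows =
                        i + 1 < s.length() && isDigit(s.charAt(i + 1));
                if (digitFollows && !"SYS_TRD".equals(prev)) {
                    // '.' starts a real literal such as ".25"
                    i++;
                    while (i < s.length() && isDigit(s.charAt(i))) {
                        i++;
                    }
                    type = "REAL_LITERAL";
                } else {
                    // right after SYS_TRD (or with no digit following),
                    // '.' is a token of its own
                    i++;
                    type = "DOT";
                }
            } else if (isIdChar(c)) {
                int start = i;
                while (i < s.length() && isIdChar(s.charAt(i))) {
                    i++;
                }
                type = s.substring(start, i).equals("TRD")
                        ? "SYS_TRD" : "IDENTIFIER";
            } else {
                i++; // skip whitespace and anything else in this sketch
                continue;
            }
            types.add(type);
            prev = type;
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("TRD.2ads")); // [SYS_TRD, DOT, IDENTIFIER]
        System.out.println(tokenize("x .2"));     // [IDENTIFIER, REAL_LITERAL]
    }
}
```

The point of the sketch is only that the decision depends on the type
of the last emitted token, not on the character before the '.'.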

I tried using the semantic predicate
"{ input.index()>0 && input.LT(-1) != 'D' }?"
at the marked position below (in place of /* code here? */). The result
is that recognition of REAL_LITERAL is aborted, as intended, but the
lexer then fails to generate a DOT token.
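
For reference, this is what that predicate computes, written as a small
plain-Java sketch. The class and method names are hypothetical, and the
sketch assumes the predicate is evaluated with the input positioned on
the '.' character. It does not explain the missing DOT token; it only
shows that the test looks at the previous character, which is a proxy
for "previous token was SYS_TRD".

```java
// Hypothetical stand-in for the predicate
// { input.index()>0 && input.LT(-1) != 'D' }?
// using a String and an index instead of an ANTLR CharStream.
class PrevCharPredicateSketch {

    // True when a REAL_LITERAL starting with '.' would be allowed at
    // position `index` (assumed to be the position of the '.').
    static boolean allowsLeadingDotReal(String text, int index) {
        return index > 0 && text.charAt(index - 1) != 'D';
    }

    public static void main(String[] args) {
        System.out.println(allowsLeadingDotReal("TRD.2ads", 3)); // false
        System.out.println(allowsLeadingDotReal("x.5", 1));      // true
        System.out.println(allowsLeadingDotReal(".5", 0));       // false
        System.out.println(allowsLeadingDotReal("ABCD.5", 4));   // false
    }
}
```

The last two cases show the limits of a character-based test: a real
literal at the very start of the input is rejected because of the
index()>0 guard, and any preceding text ending in 'D' is treated the
same as "TRD".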

I would really appreciate it if anyone in the ANTLR community has an
idea how to solve this.

Thanks
Silvester Pozarnik

// ... parser part
tokens {...
  SYS_TRD='TRD';
//...
}

//...
trd_property returns [String value] :
    SYS_TRD! DOT! {input.LT(1).setType(IDENTIFIER);} property
    { $value = $property.text; }
    ;
//..

// lexer parts
DOT      : '.' ;
//...

IDENTIFIER 
    : { testLiterals=true; }
      ('a'..'z' | 'A'..'Z' | '0'..'9' | '\u0080'..'\ufffe') ( Letter | Digit )*
    ;
//...

fragment
REAL_LITERAL
    :   ('0'..'9')+ '.' ('0'..'9')* Exponent? 
    |   { /* code here? */ }? '.' ('0'..'9')+ Exponent? 
    |   ('0'..'9')+ Exponent
    ;
//...



