[antlr-interest] How to switch the recognition scope in Lexer
Silvester Pozarnik
silvester.pozarnik at tracetracker.com
Wed Jun 20 06:12:14 PDT 2007
>> Silvester Pozarnik wrote this on [20 June 2007 13:00]:
>>
>> In ANTLR 2.7.7 you could change the behaviour of the lexer so
>> that tokens are recognized as literals in special cases by
>> overriding the testLiteralsTable() method in the CharScanner
>> class. How do you do the same in ANTLR 3.0 if you have a
>> grammar such as:
>>
>> grammar test;
>> tokens {
>> MYTOKEN = 'mytoken';
>> }
>> mygrammar
>> : MYTOKEN LPAREN IDENTIFIER RPAREN
>> ;
>>
>> LPAREN : '(' ;
>> RPAREN : ')' ;
>> IDENTIFIER
>> : ('a'..'z' | 'A'..'Z' | '\u0080'..'\ufffe') (
>> Letter | Digit)*;
>>
>> fragment Letter
>> : 'a'..'z' | 'A'..'Z' | '_' |'-' | '\u0080'..'\ufffe';
>>
>> fragment Digit
>> : '0'..'9';
>>
>> The input "mytoken(mytoken)" should be valid: the first
>> 'mytoken' should be recognized as MYTOKEN, but the second
>> 'mytoken' has to be recognized as an IDENTIFIER. Is there a
>> way to achieve this?
>
>Not to my knowledge (and this applies to V2.x too). I suspect you need to
>change your 'mygrammar' rule:
>
> mygrammar : MYTOKEN LPAREN (MYTOKEN|IDENTIFIER) RPAREN
>
>Micheal
Hei Micheal,
The rule change you proposed would not work, because the lexer's choice is
still nondeterministic ("should I recognize an IDENTIFIER or MYTOKEN?"). I'm
not sure which takes precedence here. The proposed parser rule also alters
the nature of the language. This was just an example anyway; the more
general problem is that in some languages keywords must, under some
condition (scope), be recognized as literals (e.g. "...City=Kansas City, ...
Idol=Joe Idol", etc.).
In 2.7.7 you could fix this by adding the following to your lexer definition:
class Testlexer extends Lexer;
{
    private static List<String> ident_stack = new LinkedList<String>();

    // Test the token text against the literals table
    // Override this method to perform a different literals test
    public int testLiteralsTable(int ttype) {
        if (ident_stack.size() >= 1 &&
            "mygrammar".compareToIgnoreCase(
                ident_stack.get(ident_stack.size()-1) ) == 0) {
            ident_stack.add(text.toString());
            return ttype;
        }
        ident_stack.add(text.toString());
        // this is the original stuff
        hashString.setBuffer(text.getBuffer(), text.length());
        Integer literalsIndex = (Integer)literals.get(hashString);
        if (literalsIndex != null) {
            ttype = literalsIndex.intValue();
        }
        return ttype;
    }
}
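To make the goal concrete, the scope-dependent literal test can be sketched without ANTLR at all. The following stand-alone Java is only an illustration, not ANTLR 2 or ANTLR 3 API: the class ScopedKeywordLexer, its Type and Token types, and the rule that 'mytoken' counts as a keyword only outside parentheses are all assumptions made for this example. Its identifier test only approximates the IDENTIFIER rule of the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; none of these names come from ANTLR.
final class ScopedKeywordLexer {

    // Token types named after the rules in the example grammar.
    public enum Type { MYTOKEN, IDENTIFIER, LPAREN, RPAREN }

    public static final class Token {
        public final Type type;
        public final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ":" + text;
        }
    }

    // Characters allowed after the first one (approximates Letter | Digit).
    private static boolean isIdentifierPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    // Assumption: the "scope" is the parenthesis depth. The word "mytoken"
    // is a keyword at depth 0 and an ordinary identifier inside parentheses.
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int depth = 0;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '(') {
                tokens.add(new Token(Type.LPAREN, "("));
                depth++;
                i++;
            } else if (c == ')') {
                tokens.add(new Token(Type.RPAREN, ")"));
                depth--;
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && isIdentifierPart(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                // The literals test is applied only outside parentheses.
                boolean keyword = depth == 0 && text.equals("mytoken");
                tokens.add(new Token(keyword ? Type.MYTOKEN : Type.IDENTIFIER, text));
            } else if (Character.isWhitespace(c)) {
                i++; // skip whitespace
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }
}
```

Calling ScopedKeywordLexer.tokenize("mytoken(mytoken)") yields the token types MYTOKEN, LPAREN, IDENTIFIER, RPAREN in that order, which is the behaviour asked for above.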
I could of course redefine a rule as:
mygrammar : MYTOKEN LPAREN STRINGVALUE RPAREN;
...
STRINGVALUE
: '\'' ( ~('\''|'\\') )* '\''
;
But then I have to change the already established syntax of my language.
Any help?
BR.
Silvester Pozarnik