[antlr-interest] Strength of ANTLR lexer

"Paul Bouché (NSN)" paul.bouche at nsn.com
Mon Mar 30 07:32:28 PDT 2009


Hello,

we have repeatedly run into the following problem: overlapping character 
sets in different token definitions, e.g.:
mapping : KEY '=' ATTRIBUTE;
KEY : ('A'..'Z' | 'a'..'z')+;
ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;

The lexer always generates a KEY token for abc, but what we actually want 
after the '=' is an ATTRIBUTE token. This behavior is of course desirable 
for token definitions of certain keywords etc., but it is not always 
what one wants.
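
To make the effect concrete, here is a small Java sketch of a lexer that 
takes the longest match and, on a tie, prefers the rule listed first. This 
is only a simplified model of the behavior described above, not the code 
ANTLR generates; the class name, the EQ token name, and the "TYPE(text)" 
rendering are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: longest match wins, ties go to the rule listed first
// (KEY before ATTRIBUTE). Illustration only, not ANTLR's generated lexer.
final class RuleOrderLexerSketch {

    static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean isLetterOrDigit(char c) {
        return isLetter(c) || (c >= '0' && c <= '9');
    }

    // Returns tokens rendered as "TYPE(text)"; throws on unexpected characters.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == '=') {
                tokens.add("EQ(=)");
                pos++;
                continue;
            }
            // Try both rules from the same start position.
            int keyEnd = pos;
            while (keyEnd < input.length() && isLetter(input.charAt(keyEnd))) keyEnd++;
            int attrEnd = pos;
            while (attrEnd < input.length() && isLetterOrDigit(input.charAt(attrEnd))) attrEnd++;
            if (attrEnd == pos) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            if (keyEnd == attrEnd) {
                // Both rules match the same text: KEY wins because it comes first.
                tokens.add("KEY(" + input.substring(pos, keyEnd) + ")");
            } else {
                // Only ATTRIBUTE can consume the digits, so it has the longer match.
                tokens.add("ATTRIBUTE(" + input.substring(pos, attrEnd) + ")");
            }
            pos = attrEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("name=abc"));   // value comes out as KEY
        System.out.println(tokenize("name=abc1"));  // value comes out as ATTRIBUTE
    }
}
```

In this model the value abc after '=' can never come out as ATTRIBUTE, 
no matter what the parser expects at that point.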

Things that can be easily expressed in an EBNF cannot be so easily 
written in ANTLR considering the above example. In the EBNF I could write:
mapping ::== KEY "=" ATTRIBUTE.
KEY ::== ("A"| .. | "Z"| "a" | .. | "z") ("A"| .. | "Z"| "a" | .. | "z")*.
ATTRIBUTE ::== ("A"| .. | "Z"| "a" | .. | "z" | "0" | .. | "9") ("A"| .. 
| "Z"| "a" | .. | "z" | "0" | .. | "9")*.

but to express the same thing in ANTLR because of how the ANTLR lexer 
works I have to write:
mapping : KEY '=' (ATTRIBUTE | KEY); // really counter intuitive
KEY : ('A'..'Z' | 'a'..'z')+;
ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;

The problem is that the lexer is totally independent of the parser and 
operates without any context or structure. Of course, everywhere one 
looks, the workaround above is given as the way to solve this problem, but 
imo it is really not a grammar problem but an ANTLR limitation. Of course, 
another solution is to just emit WORD tokens and check in the parser 
whether the value is valid, but why lex again what has already been 
lexed? Other solutions include building the grammar structure back into 
the lexer via syntactic predicates, which is also not what one wants.
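
For illustration, the WORD approach could look roughly like the Java 
sketch below. It is a hand-written stand-in, not ANTLR output: the class 
and method names are hypothetical, and the split on '=' stands in for a 
lexer that only knows WORD and '='. It shows the second pass over the 
text that I would like to avoid.

```java
// Sketch of "emit WORD tokens, validate in the parser".
// All names are hypothetical; this is not generated by ANTLR.
final class WordTokenSketch {

    // Context-dependent checks the parser applies to a generic WORD.
    static boolean isKey(String word) {
        return word.matches("[A-Za-z]+");
    }

    static boolean isAttribute(String word) {
        return word.matches("[A-Za-z0-9]+");
    }

    // mapping : WORD '=' WORD, left side checked as KEY, right side as ATTRIBUTE.
    // Returns {key, attribute}.
    static String[] parseMapping(String input) {
        int eq = input.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("expected '='");
        }
        String left = input.substring(0, eq);
        String right = input.substring(eq + 1);
        // The text was already scanned once; here it is scanned a second time.
        if (!isKey(left)) {
            throw new IllegalArgumentException("not a valid KEY: " + left);
        }
        if (!isAttribute(right)) {
            throw new IllegalArgumentException("not a valid ATTRIBUTE: " + right);
        }
        return new String[] { left, right };
    }

    public static void main(String[] args) {
        String[] m = parseMapping("abc=abc");
        System.out.println(m[0] + " -> " + m[1]);
    }
}
```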

Any comments or solutions?
@Ter why was it done this way? Would it not be possible to let the lexer 
be operated by the parser, i.e. something like this:
// ---- grammar start
grammar LexerWithContext;
options {
    noTokenBuffer = true; // new option?
}
mapping : KEY '=' ATTRIBUTE;
KEY : ('A'..'Z' | 'a'..'z')+;
ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;
// ---- grammar stop
which for the Java target would yield something like:

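public class LexerWithContextParser {
    LexerWithContextLexer lexer;

    // mapping : KEY '=' ATTRIBUTE;
    public final void mapping() {
        lexer.mKEY();       // match a KEY directly from the character stream
        lexer.mT326();      // match '=' (T326 stands for its generated token name)
        lexer.mATTRIBUTE(); // only ATTRIBUTE is tried here, so no clash with KEY
    }
}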
i.e. if and only if lexer and parser rules are defined together in one grammar?
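
To show what I mean by a parser-operated lexer, here is a minimal 
hand-written Java sketch, assuming the parser calls the lexer rules 
directly on the character stream with no token buffer in between. 
Nothing here is ANTLR API: the class name and the matchKey, 
matchAttribute and matchChar methods are made up for the example.

```java
// Sketch of a parser that drives the lexer rules itself, so each rule is
// only tried where the grammar expects it. Hypothetical names throughout.
final class ParserDrivenLexerSketch {
    private final String input;
    private int pos = 0;

    ParserDrivenLexerSketch(String input) {
        this.input = input;
    }

    private static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Lexer rule KEY : ('A'..'Z' | 'a'..'z')+ ;
    String matchKey() {
        int start = pos;
        while (pos < input.length() && isLetter(input.charAt(pos))) pos++;
        if (pos == start) throw error("KEY");
        return input.substring(start, pos);
    }

    // Lexer rule ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+ ;
    String matchAttribute() {
        int start = pos;
        while (pos < input.length()
                && (isLetter(input.charAt(pos)) || isDigit(input.charAt(pos)))) pos++;
        if (pos == start) throw error("ATTRIBUTE");
        return input.substring(start, pos);
    }

    void matchChar(char expected) {
        if (pos >= input.length() || input.charAt(pos) != expected) {
            throw error("'" + expected + "'");
        }
        pos++;
    }

    // Parser rule mapping : KEY '=' ATTRIBUTE ; returns {key, attribute}.
    String[] mapping() {
        String key = matchKey();
        matchChar('=');
        String attribute = matchAttribute(); // abc is accepted here as ATTRIBUTE
        if (pos != input.length()) throw error("end of input");
        return new String[] { key, attribute };
    }

    private IllegalStateException error(String expected) {
        return new IllegalStateException("expected " + expected + " at position " + pos);
    }

    public static void main(String[] args) {
        String[] m = new ParserDrivenLexerSketch("abc=abc").mapping();
        System.out.println("KEY=" + m[0] + ", ATTRIBUTE=" + m[1]);
    }
}
```

Because the parser decides which lexer rule to try at each position, the 
overlap between KEY and ATTRIBUTE never has to be resolved by rule order.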

BR,
Paul

-- 
Paul Bouché
Voice: +49 30 590080-1284
 


