[antlr-interest] Strength of ANTLR lexer

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Mon Mar 30 08:05:35 PDT 2009


Daniels, Troy (US SSA) wrote:
> The standard solution here is to define fairly generic tokens in the lexer, then have parser rules that distinguish between them.
> 
> mapping : attribute '=' attribute;
> key: ALPHA ;
> attribute: ALPHA | ALPHANUM;
> ALPHA : ('A'..'Z' | 'a'..'z')+;
> ALPHANUM : ('A'..'Z' | 'a'..'z' | '0'..'9')+;
> 
> You'll likely need predicates in your key and attribute rules.  (Key might need to accept an alphanum if it has no numbers.)
> 
> There's ambiguity between ALPHANUM and ALPHA, so the order of them is important.  

Which could be improved with refactoring:

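// Empty fragment rule: exists only so that the ALPHANUM token type is defined.
fragment ALPHANUM : ;

// One rule matches both kinds of word; seeing a digit switches the token type.
ALPHA : ('A'..'Z' | 'a'..'z' | '0'..'9' { $type = ALPHANUM; } )+;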

(Might be able to make that ALPHA rule more efficient)
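
To make the effect of that single rule concrete, here is a minimal sketch in Python rather than ANTLR. It assumes the rule behaves as a one-pass scan that starts out as ALPHA and switches to ALPHANUM as soon as a digit is consumed. The function name `lex_word` and the tuple it returns are hypothetical, chosen only for illustration.

```python
ALPHA, ALPHANUM = "ALPHA", "ALPHANUM"

def lex_word(text, start=0):
    """Consume one run of ASCII letters/digits beginning at `start`.

    Returns (token_type, lexeme, next_index), or None if no word starts there.
    """
    token_type = ALPHA  # default type, as in the ALPHA rule
    i = start
    while i < len(text) and text[i].isascii() and text[i].isalnum():
        if text[i].isdigit():
            # Mirrors the action attached to the '0'..'9' alternative.
            token_type = ALPHANUM
        i += 1
    if i == start:
        return None  # nothing matched at this position
    return token_type, text[start:i], i

print(lex_word("abc"))     # an all-letter word keeps the ALPHA type
print(lex_word("abc1=x"))  # a digit anywhere in the word switches it to ALPHANUM
```

Because there is only one rule, no ambiguity between two overlapping rules has to be resolved, and rule order stops mattering.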

Sam


>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of "Paul Bouché (NSN)"
>> Sent: Monday, March 30, 2009 10:32 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Strength of ANTLR lexer
>>
>> Hello,
>>
>> we repeatedly had the following problem. We had overlapping character
>> sets for different TOKEN definitions.
>> i.e.:
>> mapping : KEY '=' ATTRIBUTE;
>> KEY : ('A'..'Z' | 'a'..'z')+;
>> ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;
>>
>> The lexer always generates KEY tokens for abc, but what we actually want
>> is ATTRIBUTE tokens. This behavior is of course desirable for token
>> definitions of certain keywords etc., but it is not always what one
>> wants.
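
The behaviour described here can be modelled with a small Python sketch. It is a simplification, not ANTLR's actual prediction machinery: it assumes the lexer takes the longest match and, on a tie, the rule listed first. The `RULES` table and `classify` function are hypothetical names used only for this illustration.

```python
import re

# Stand-ins for the two lexer rules, in the order they appear in the grammar.
RULES = [
    ("KEY", re.compile(r"[A-Za-z]+")),
    ("ATTRIBUTE", re.compile(r"[A-Za-z0-9]+")),
]

def classify(text):
    """Pick a rule for the start of `text`: longest match, ties to the earliest rule."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(text)
        # Strictly greater: a later rule wins only by matching more characters.
        if m and m.end() > best_len:
            best_name, best_len = name, m.end()
    return best_name

print(classify("abc"))   # both rules match 3 characters, so the first one (KEY) wins
print(classify("abc1"))  # only ATTRIBUTE can cover all 4 characters
```

The choice is made without any knowledge of which token the parser expects at that point, which is why abc on the right-hand side of '=' still comes out as KEY.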
>>
>> Things that can be easily expressed in an EBNF cannot be so easily
>> written in ANTLR considering the above example. In the EBNF I could write:
>> mapping ::== KEY "=" ATTRIBUTE.
>> KEY ::== ("A"| .. | "Z"| "a" | .. | "z") ("A"| .. | "Z"| "a" | .. | "z")*.
>> ATTRIBUTE ::== ("A"| .. | "Z"| "a" | .. | "z" | "0" | .. | "9") ("A"| ..
>> | "Z"| "a" | .. | "z" | "0" | .. | "9")*.
>>
>> but to express the same thing in ANTLR because of how the ANTLR lexer
>> works I have to write:
>> mapping : KEY '=' (ATTRIBUTE | KEY); // really counter intuitive
>> KEY : ('A'..'Z' | 'a'..'z')+;
>> ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;
>>
>> The problem is that the lexer is totally independent of the parser and
>> operates without any context or structure.  Of course, everywhere one
>> looks this is presented as the way to solve the problem, but imo it is
>> really not a grammar problem but an ANTLR limitation. Of course another
>> solution is to just emit WORD tokens and check in the parser whether the
>> value is valid, but why lex again what has already been lexed? Other
>> solutions include building the grammar structure back into the lexer via
>> syntactic predicates, which is also not what one wants.
>>
>> Any comments or solutions?
>> @Ter why was it done this way? Would it not be possible to let the lexer
>> be operated by the parser, i.e. something like this:
>> // ---- grammar start
>> grammar LexerWithContext;
>> options {
>>     noTokenBuffer = true; // new option?
>> }
>> mapping : KEY '=' ATTRIBUTE;
>> KEY ::== ("A"| .. | "Z"| "a" | .. | "z") ("A"| .. | "Z"| "a" | .. | "z")*.
>> ATTRIBUTE ::== ("A"| .. | "Z"| "a" | .. | "z" | "0" | .. | "9") ("A"| ..
>> | "Z"| "a" | .. | "z" | "0" | .. | "9")*.
>> // ---- grammar stop
>> will for the Java target yield:
>>
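>> public class LexerWithContextParser {
>>     LexerWithContextLexer lexer;
>>     public final void mapping() {
>>         lexer.mKEY();       // match a KEY at the current input position
>>         lexer.mT326();      // presumably the generated rule for the '=' literal
>>         lexer.mATTRIBUTE(); // match an ATTRIBUTE
>>     }
>> }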
>> iff. they are defined together?
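
The proposal above, where the parser tells the lexer which token to match next, can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions (no whitespace handling, no token buffering, no backtracking); it is not how ANTLR works. `ContextLexer`, `expect` and `parse_mapping` are hypothetical names.

```python
import re

KEY = re.compile(r"[A-Za-z]+")
ATTRIBUTE = re.compile(r"[A-Za-z0-9]+")
EQUALS = re.compile(r"=")

class ContextLexer:
    """Hypothetical lexer that matches only the token the parser asks for."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def expect(self, name, pattern):
        # Try the requested pattern at the current position and nothing else.
        m = pattern.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"expected {name} at offset {self.pos}")
        self.pos = m.end()
        return m.group()

def parse_mapping(text):
    """mapping : KEY '=' ATTRIBUTE ; returns (key, attribute)."""
    lexer = ContextLexer(text)
    key = lexer.expect("KEY", KEY)
    lexer.expect("'='", EQUALS)
    attribute = lexer.expect("ATTRIBUTE", ATTRIBUTE)
    if lexer.pos != len(text):
        raise SyntaxError(f"unexpected input at offset {lexer.pos}")
    return key, attribute

print(parse_mapping("abc=def"))  # the right-hand side is lexed as ATTRIBUTE even though it has no digits
```

Since the parser's position decides which pattern is tried, the overlap between KEY and ATTRIBUTE never has to be resolved by the lexer. The price is that tokens can no longer be produced ahead of the parser, which is what the hypothetical noTokenBuffer option hints at.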
>>
>> BR,
>> Paul
>>
>> --
>> Paul Bouché
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address


