[antlr-interest] Strength of ANTLR lexer

Mon Mar 30 07:54:27 PDT 2009

The standard solution here is to define fairly generic tokens in the lexer, then have parser rules that distinguish between them.

mapping : key '=' attribute;
key: ALPHA ;
attribute: ALPHA | ALPHANUM;
ALPHA : ('A'..'Z' | 'a'..'z')+;
ALPHANUM : ('A'..'Z' | 'a'..'z' | '0'..'9')+;

You'll likely need predicates in your key and attribute rules. (Key might need to accept an ALPHANUM if its text contains no digits.)
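
To make the predicate idea concrete, here is a rough Java sketch of the check such a predicate would perform. In a grammar the test would sit inside a semantic predicate ({...}?) in the key rule. Token, Type, isKey and isAttribute are made-up names for illustration, not ANTLR runtime classes, and the digit test assumes ASCII digits only.

```java
// Hypothetical sketch: Token, Type, isKey and isAttribute are made-up names,
// not part of the ANTLR runtime or of any generated parser.
class PredicateSketch {
    enum Type { ALPHA, ALPHANUM }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // True if the text contains at least one ASCII digit.
    static boolean hasDigit(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') return true;
        }
        return false;
    }

    // key: an ALPHA token, or an ALPHANUM token whose text has no digits.
    static boolean isKey(Token t) {
        return t.type == Type.ALPHA
            || (t.type == Type.ALPHANUM && !hasDigit(t.text));
    }

    // attribute: either token type is acceptable.
    static boolean isAttribute(Token t) {
        return t.type == Type.ALPHA || t.type == Type.ALPHANUM;
    }

    public static void main(String[] args) {
        System.out.println(isKey(new Token(Type.ALPHANUM, "abc")));
        System.out.println(isKey(new Token(Type.ALPHANUM, "abc1")));
        System.out.println(isAttribute(new Token(Type.ALPHA, "abc")));
    }
}
```

The point of the second branch in isKey is that the parser stays correct even if the lexer hands back ALPHANUM for a letters-only word.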

There's ambiguity between ALPHANUM and ALPHA (both match a letters-only string), so the order in which they appear in the grammar is important.
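
Why the order matters can be shown with a small Java model. This is only a simplified sketch, assuming "longest match wins, ties go to the rule listed first"; it is not how the generated ANTLR lexer is implemented, and RuleOrderSketch and tokenType are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of rule-order tie-breaking; not ANTLR-generated code.
class RuleOrderSketch {
    // Builds the rule table in the order the rule names are given.
    static Map<String, Pattern> rules(String... order) {
        Map<String, Pattern> all = new LinkedHashMap<>();
        for (String name : order) {
            switch (name) {
                case "ALPHA":
                    all.put(name, Pattern.compile("[A-Za-z]+"));
                    break;
                case "ALPHANUM":
                    all.put(name, Pattern.compile("[A-Za-z0-9]+"));
                    break;
                default:
                    throw new IllegalArgumentException("unknown rule: " + name);
            }
        }
        return all;
    }

    // Returns the name of the rule that claims the start of `word`,
    // or null if no rule matches at all.
    static String tokenType(String word, String... order) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules(order).entrySet()) {
            Matcher m = rule.getValue().matcher(word);
            // Only a strictly longer match replaces the current winner,
            // so on a tie the earlier rule keeps the token.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(tokenType("abc", "ALPHA", "ALPHANUM"));   // ALPHA
        System.out.println(tokenType("abc1", "ALPHA", "ALPHANUM"));  // ALPHANUM
        System.out.println(tokenType("abc", "ALPHANUM", "ALPHA"));   // ALPHANUM
    }
}
```

In this model, listing ALPHANUM first means ALPHA never wins, since every letters-only word ties and the tie goes to ALPHANUM.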

Troy

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of "Paul Bouché (NSN)"
> Sent: Monday, March 30, 2009 10:32 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Strength of ANTLR lexer
> 
> Hello,
> 
> we have repeatedly run into the following problem: overlapping character
> sets for different token definitions, e.g.:
> mapping : KEY '=' ATTRIBUTE;
> KEY : ('A'..'Z' | 'a'..'z')+;
> ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;
> 
> The lexer always generates KEY tokens for abc, but what we actually want
> is ATTRIBUTE tokens. This behavior is of course desirable for token
> definitions such as keywords, but it is not always what one wants.
> 
> Things that can be easily expressed in an EBNF cannot be so easily
> written in ANTLR considering the above example. In the EBNF I could write:
> mapping ::== KEY "=" ATTRIBUTE.
> KEY ::== ("A"| .. | "Z"| "a" | .. | "z") ("A"| .. | "Z"| "a" | .. | "z")*.
> ATTRIBUTE ::== ("A"| .. | "Z"| "a" | .. | "z" | "0" | .. | "9") ("A"| ..
> | "Z"| "a" | .. | "z" | "0" | .. | "9")*.
> 
> but to express the same thing in ANTLR because of how the ANTLR lexer
> works I have to write:
> mapping : KEY '=' (ATTRIBUTE | KEY); // really counter intuitive
> KEY : ('A'..'Z' | 'a'..'z')+;
> ATTRIBUTE : ('A'..'Z' | 'a'..'z' | '0'..'9')+;
> 
> The problem is that the lexer is totally independent of the parser and
> operates entirely without context or structure. Of course, everywhere one
> looks this is given as the way to solve the problem, but IMO it is really
> not a grammar problem but an ANTLR limitation. Another solution is, of
> course, to just emit WORD tokens and check in the parser whether the value
> is valid, but why lex again what has already been lexed? Other solutions
> include building the grammar structure back into the lexer via syntactic
> predicates, which is also not something one likes to do.
> 
> Any comments or solutions?
> @Ter why was it done this way? Would it not be possible to let the lexer
> be operated by the parser, i.e. something like this:
> // ---- grammar start
> grammar LexerWithContext;
> options {
>     noTokenBuffer = true; // new option?
> }
> mapping : KEY '=' ATTRIBUTE;
> KEY ::== ("A"| .. | "Z"| "a" | .. | "z") ("A"| .. | "Z"| "a" | .. | "z")*.
> ATTRIBUTE ::== ("A"| .. | "Z"| "a" | .. | "z" | "0" | .. | "9") ("A"| ..
> | "Z"| "a" | .. | "z" | "0" | .. | "9")*.
> // ---- grammar stop
> which for the Java target would yield:
> 
> public class LexerWithContextParser {
>     LexerWithContextLexer lexer;
>     public final void mapping() {
>         lexer.mKEY();
>         lexer.mT326();       // presumably the implicit token for '='
>         lexer.mATTRIBUTE();
>     }
> }
> iff. they are defined together?
> 
> BR,
> Paul
> 
> --
> Paul Bouché