[antlr-interest] Can subrules be set to 'n-to-m'?

Sat Mar 26 12:58:43 PST 2005

>-----Original Message-----
>From: antlr-interest-bounces at antlr.org 
>[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Scott 
>Stanchfield
>Sent: 26 March 2005 19:55
>To: 'Terence Parr'; 'antlr-interest Interest'
>Subject: RE: [antlr-interest] Can subrules be set to 'n-to-m'?
>
>When I first started demoing ANTLR to the folks I worked with 
>at FGM, all the extra parens made it LESS readable.

As an example, take the CSS grammar. Because CSS doesn't allow spaces between
some tokens, the lexer cannot just discard whitespace, which means the parser
rules have to be peppered with (mostly optional) whitespace tokens. So you
get lots of rules like this:

media
  : MEDIA_SYM (S)* medium ( COMMA (S)* medium )*
    LBRACE (S)* (ruleset)* "}" //(S)*
  ;

medium
  : IDENT (S)*
  ;

page
  : PAGE_SYM (S)* (pseudo_page (S)* )?
    LBRACE (S)* declaration ( ";" (S)* declaration )* "}" //(S)*
  ;

pseudo_page
  : ":" IDENT
  ;

ruleset
  : selector ( COMMA (S)* selector )*
    LBRACE (S)* declaration ( ";" (S)* declaration )* "}" //(S)*
  ;

Needless to say, you have to be very careful where you place those (S)*
sub-rules to avoid non-determinism. The comments at the ends of the lines
mark where the original yacc grammar had what I think are superfluous
whitespace-swallowing sub-rules. I'd actually like to open a discussion on
the best way to handle a language that needs to allow whitespace, but only in
certain places. For example, I could let the lexer drop whitespace and then
turn every construct where whitespace isn't allowed into a single custom
token, but I don't know whether ANTLR's lexer could handle that.
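
To make the "single custom token" idea concrete, here is a rough sketch in
Python rather than ANTLR, so it says nothing about whether ANTLR's lexer can
do the same. The lexer throws whitespace away, and the one construct above
where whitespace is not allowed (pseudo_page, i.e. ":" IDENT) is matched as a
single token. The token name PSEUDO_PAGE, the simplified IDENT pattern and
the "@page" spelling of PAGE_SYM are assumptions made for the illustration,
not taken from the real grammar.

```python
import re

# Order matters: Python's alternation takes the first alternative that
# matches, so PSEUDO_PAGE must be listed before the bare COLON.
# The IDENT pattern is a simplification of the real CSS identifier rule.
TOKEN_RE = re.compile(r"""
    (?P<S>[ \t\r\n\f]+)
  | (?P<PAGE_SYM>@page)
  | (?P<PSEUDO_PAGE>:[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<COLON>:)
  | (?P<LBRACE>\{)
  | (?P<RBRACE>\})
""", re.VERBOSE)


def tokenize(src):
    """Return (kind, text) pairs, silently dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup != "S":  # whitespace never reaches the parser
            tokens.append((m.lastgroup, m.group()))
    return tokens


if __name__ == "__main__":
    print(tokenize("@page :first {}"))    # ":first" arrives as one token
    print(tokenize("@page : first {}"))   # ":" and "first" arrive separately
```

With this arrangement the parser rules would need no (S)* at all, and a
stray space inside ":first" shows up as a COLON followed by an IDENT, which
the page rule would simply not accept.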

While we're on the subject of lexers, one of John D. Mitchell's emails on
this subject appears to denigrate the regex as something that's only useful
for simple operations or hacks. That may be so, but I'd kill for a lexer
right now that could handle common left prefixes without requiring syntactic
predicates (as if I want a load of exception-based backtracking on every
token). There are some clever things you can do with an LL(k)-based lexer, but
there are also some very basic things that you can do with lex that are an
absolute nightmare with ANTLR. Hopefully the DFA-based LL(*) algorithm for
ANTLR 3 will sort out most of this.
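
For anyone unsure what I mean by lex handling common left prefixes: lex
always takes the longest match, so rules that start with the same characters
need no lookahead hints. A minimal sketch of that rule in Python follows. It
tries every pattern in turn instead of compiling them into one DFA as lex
does, and the three number-like tokens (NUMBER, PERCENTAGE, LENGTH) are
made-up examples for the illustration, not rules from my grammar.

```python
import re

# Three rules sharing the left prefix [0-9]+, plus whitespace.
# NUMBER is deliberately listed first: with longest-match, the shorter
# rule does not shadow the longer ones.
RULES = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("PERCENTAGE", re.compile(r"[0-9]+%")),
    ("LENGTH", re.compile(r"[0-9]+px")),
    ("S", re.compile(r"[ \t\r\n\f]+")),
]


def longest_match(src, pos):
    """Return (kind, match) for the longest rule matching at pos, or None.

    Ties go to the rule listed first, as in lex.
    """
    best = None
    for kind, pattern in RULES:
        m = pattern.match(src, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (kind, m)
    return best


def lex(src):
    """Split src into (kind, text) pairs using the longest-match rule."""
    out = []
    pos = 0
    while pos < len(src):
        hit = longest_match(src, pos)
        if hit is None:
            raise ValueError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, m = hit
        out.append((kind, m.group()))
        pos = m.end()
    return out


if __name__ == "__main__":
    print(lex("12 12% 12px"))
```

No predicates and no backtracking on failure are involved: each rule is
tried once at the current position and the longest result wins.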

richard