[antlr-interest] Interesting problem with ANTLR and CSS 2.1

Fri Oct 12 13:23:04 PDT 2007

Hello ANTLR hackers!

I've been wrestling with a parser derived from the CSS 2.1 grammar (see
http://www.w3.org/TR/CSS21/grammar.html ). With a little bit of work and a
little bit of fudging, it parses most CSS files nicely, but now I want to
build an AST from this grammar:

grammar CascadingStylesheet;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
    CDO = '<!--';
    CDC = '-->';
    COMMA = ',';
    LBRACE = '{';
    PLUS = '+';
    GREATER = '>';
    FUNCTION = 'function(';
    HASH = '#';
    INCLUDES = '~=';
    DASHMATCH = '|=';
}

/* I think this sets the "root" of the AST. */
parse    :    stylesheet -> ^(stylesheet);

/*
stylesheet
  : ( '@charset' STRING ';' )?
    S*
    ('/*' ( options { greedy=false; } : . )* '* /')*
    (S|CDO|CDC)* ( import_css (S|CDO|CDC)* )*
    ( ( ruleset | media | page ) (S|CDO|CDC)* )*
  ;
*/

/* Using this rule below because the "rule" above is harder to follow. */
stylesheet
    : (comment_stylesheet|ruleset|media|page)* ;

comment_stylesheet: COMMENT;
comment_ruleset:  COMMENT;
comment_declaration: COMMENT;

COMMENT    :      '/*' (options {greedy=false;} : .)* '*/';

import_css
  : '@import' S*
    (STRING|URI) S* ( medium ( COMMA S* medium)* )? ';' S*
  ;
media
  : '@media' S* medium ( COMMA S* medium )* LBRACE! S* ruleset* '}'! S*
  ;
medium
  : IDENT S*
  ;
page
  : '@page' S* pseudo_page? S*
    LBRACE! S* declaration ( ';'! S* declaration )* '}'! S*
  ;
pseudo_page
  : ':' IDENT
  ;
operator_css
  : '/' S* | COMMA S* | /* empty */
  ;
combinator
  : PLUS S*
  | GREATER S*
  | S*
  ;
unary_operator
  : '-' | PLUS
  ;
property_css
  : IDENT S*
  ;
ruleset
  : selector S* ( COMMA S* selector )* declarations
    -> selector ( COMMA selector )* declarations
  ;

declarations
    : LBRACE! S* comment_ruleset* S* declaration
      ( ';'! S* declaration comment_declaration* )* '}'! S*
    ;
selector
  : simple_selector ( combinator simple_selector )*
  ;
simple_selector
  : element_name ( HASH IDENT| class_css | attrib | pseudo )*
  | ( HASH IDENT| class_css | attrib | pseudo )+
  ;
class_css : '.' IDENT;
element_name
  : IDENT | '*'
  ;
attrib
  : '[' S* IDENT S* ( ( '=' | INCLUDES | DASHMATCH ) S*
    ( IDENT | STRING ) S* )? ']'
  ;
pseudo
  : ':' ( IDENT | FUNCTION S* IDENT? S* ')' )
  ;
declaration
  : property_css ':' S* expr_css prio?
  | /* empty */
  ;
prio
  : '!' S*
  ;
expr_css
  : term ( operator_css term )*
  ;
term
  : unary_operator?
    ( NUMBER S* | PERCENTAGE S* | LENGTH S* | EMS S* | EXS S* | ANGLE S* |
      TIME S* | FREQ S* )
  | STRING S* | IDENT S* | URI S* | hexcolor | function_css
  ;
function_css
  : FUNCTION S* expr_css ')' S*
  ;
/*
 * There is a constraint on the color that it must
 * have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
 * after the "#"; e.g., "#000" is OK, but "#abcd" is not.
 */
hexcolor
  : HASH S*
  ;

NUMBER    :     '0'..'9'+
    |    '0'..'9'* '.' '0'..'9'+;
PERCENTAGE
    :    NUMBER '%';
LENGTH    :    NUMBER ('in'|'cm'|'mm'|'px'|'pt'|'pc');
EMS    :    NUMBER 'em';
EXS    :    NUMBER 'ex';
ANGLE    :    NUMBER ('deg'|'rad'|'grad');
TIME    :    NUMBER ('sec'|'msec');
FREQ    :    NUMBER ('Hz'|'kHz');

S    :    ' '|'\n'|'\t'|'\r'|'\f' {$channel=HIDDEN;};
IDENT    :    ('a'..'z'|'A'..'Z'|'-') ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')*;
STRING    :    '\'' ( options {greedy=false;} : . )* '\''
    |    '"' ( options {greedy=false;} : . )* '"'
    ;
URI    :    'url(' ( options {greedy=false;} : . )* ')';

/* end of CascadingStylesheet.g */

The grammar validates, but I run into a problem when the generated parser
reads a simple stylesheet like this:

DIV.example { color: green; }

I get a runtime error:

"more than one node as root"

What about CSS is confusing ANTLR? Why can't I get a tree that looks like
the diagram in ANTLRWorks by default?
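
One possible cause, offered as an untested sketch rather than a confirmed
diagnosis: the rewrite "-> ^(stylesheet)" in the parse rule asks for the
result of stylesheet to become the root node. That rule declares no root of
its own, so for the sample above it hands back a flat list of several nodes,
and a list of several nodes cannot serve as a single root. Hanging that list
under one imaginary token might avoid the error. The token name STYLESHEET
below is an assumption made for illustration; it is not part of the grammar
above.

```
tokens {
    /* ...the existing CDO, CDC, COMMA, etc. entries stay as they are... */
    STYLESHEET;   /* hypothetical imaginary token, used only as an AST root */
}

/* Root the tree at the imaginary token; whatever stylesheet returns
   becomes its children. */
parse
  : stylesheet -> ^(STYLESHEET stylesheet)
  ;
```

If this guess is right, the sample should come back as a single STYLESHEET
node with the ruleset's tokens as flat children. Getting nested structure
like the ANTLRWorks diagram would presumably need similar ^(...) rewrites in
ruleset, declarations and the other rules.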

Simon