[antlr-interest] Interesting problem with ANTLR and CSS 2.1
Simon Janes
simon.janes at gmail.com
Fri Oct 12 13:23:04 PDT 2007
Hello ANTLR hackers!
I've been wrestling with a parser derived from the CSS 2.1 grammar (see
http://www.w3.org/TR/CSS21/grammar.html ). With a little bit of work and a
little bit of fudging, it parses most CSS files nicely, but now I want to
build an AST from this grammar:
grammar CascadingStylesheet;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
CDO = '<!--';
CDC = '-->';
COMMA = ',';
LBRACE = '{';
PLUS = '+';
GREATER = '>';
FUNCTION = 'function(';
HASH = '#';
INCLUDES = '~=';
DASHMATCH = '|=';
}
parse : stylesheet -> ^(stylesheet); /* I think this sets the "root"
of the AST. */
/*
stylesheet
: ( '@charset' STRING ';' )?
S*
('/*' ( options { greedy=false; } : . )* '* /')*
(S|CDO|CDC)* ( import_css (S|CDO|CDC)* )*
( ( ruleset | media | page ) (S|CDO|CDC)* )*
;
*/
/* Using this rule below because the "rule" above is harder to follow. */
stylesheet
: (comment_stylesheet|ruleset|media|page)* ;
comment_stylesheet: COMMENT;
comment_ruleset: COMMENT;
comment_declaration: COMMENT;
COMMENT : '/*' (options {greedy=false;} : .)* '*/';
import_css
: '@import' S*
(STRING|URI) S* ( medium ( COMMA S* medium)* )? ';' S*
;
media
: '@media' S* medium ( COMMA S* medium )* LBRACE! S* ruleset* '}'! S*
;
medium
: IDENT S*
;
page
: '@page' S* pseudo_page? S*
LBRACE! S* declaration ( ';'! S* declaration )* '}'! S*
;
pseudo_page
: ':' IDENT
;
operator_css
: '/' S* | COMMA S* | /* empty */
;
combinator
: PLUS S*
| GREATER S*
| S*
;
unary_operator
: '-' | PLUS
;
property_css
: IDENT S*
;
ruleset
: selector S* ( COMMA S* selector )* declarations -> selector ( COMMA
selector )* declarations;
declarations
: LBRACE! S* comment_ruleset* S* declaration ( ';'! S* declaration
comment_declaration* )* '}'! S*;
selector
: simple_selector ( combinator simple_selector )*
;
simple_selector
: element_name ( HASH IDENT| class_css | attrib | pseudo )*
| ( HASH IDENT| class_css | attrib | pseudo )+
;
class_css : '.' IDENT;
element_name
: IDENT | '*'
;
attrib
: '[' S* IDENT S* ( ( '=' | INCLUDES | DASHMATCH ) S*
( IDENT | STRING ) S* )? ']'
;
pseudo
: ':' ( IDENT | FUNCTION S* IDENT? S* ')' )
;
declaration
: property_css ':' S* expr_css prio?
| /* empty */
;
prio
: '!' S*
;
expr_css
: term ( operator_css term )*
;
term
: unary_operator?
( NUMBER S* | PERCENTAGE S* | LENGTH S* | EMS S* | EXS S* | ANGLE S* |
TIME S* | FREQ S* )
| STRING S* | IDENT S* | URI S* | hexcolor | function_css
;
function_css
: FUNCTION S* expr_css ')' S*
;
/*
* There is a constraint on the color that it must
* have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
* after the "#"; e.g., "#000" is OK, but "#abcd" is not.
*/
hexcolor
: HASH S*
;
NUMBER : '0'..'9'+
| '0'..'9'* '.' '0'..'9'+;
PERCENTAGE
: NUMBER '%';
LENGTH : NUMBER ('in'|'cm'|'mm'|'px'|'pt'|'pc');
EMS : NUMBER 'em';
EXS : NUMBER 'ex';
ANGLE : NUMBER ('deg'|'rad'|'grad');
TIME : NUMBER ('sec'|'msec');
FREQ : NUMBER ('Hz'|'kHz');
S : ' '|'\n'|'\t'|'\r'|'\f' {$channel=HIDDEN;};
IDENT : ('a'..'z'|'A'..'Z'|'-') ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')*;
STRING : '\'' ( options {greedy=false;} : . )* '\''
| '"' ( options {greedy=false;} : . )* '"'
;
URI : 'url(' ( options {greedy=false;} : . )* ')';
/* end of CascadingStylesheet.g */
The problem I'm running into: the grammar validates, but when the generated
parser runs on a simple stylesheet like this:
DIV.example { color: green; }
I get a runtime error:
"more than one node as root"
What about this CSS grammar is confusing ANTLR? Why can't I get a tree that
looks like the diagram in ANTLRWorks by default?
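A possible explanation, offered as a sketch rather than a confirmed
diagnosis: the rewrite "-> ^(stylesheet)" asks ANTLR to use the tree returned
by the stylesheet rule as the root. With output=AST and no rewrite of its own,
stylesheet returns a flat list of nodes whenever it matches more than one
token, and a list cannot become a single root, which would match the "more
than one node as root" message. One common workaround is to root the tree at
an imaginary token. STYLESHEET below is a hypothetical name that does not
appear in the grammar above:

```
/* Sketch only: STYLESHEET is an assumed imaginary token, not part of the
   original grammar. Its declaration would go inside the existing
   tokens { ... } section, since a grammar has only one. */
tokens {
    STYLESHEET;
}

/* Root the AST at a single node and hang everything stylesheet returns
   underneath it as children. */
parse
    : stylesheet -> ^(STYLESHEET stylesheet)
    ;
```

With this change the sample stylesheet should yield one STYLESHEET node with
the selector and declaration tokens as its children; the rulesets themselves
remain flat unless they get rewrites of their own.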
Simon