[antlr-interest] Parsing HAML - significant and insignificant whitespaces

Dmitiry Nagirnyak dnagir at gmail.com
Tue Jul 14 09:43:54 PDT 2009


Hi,

I am researching the possibility of parsing HAML syntax in order to port it
to .NET. There is a project called NHAML, but it uses regular expressions
instead of a real parser.
While it works great, it has certain limitations.

So people have started thinking about a real parser. I did some work with
ANTLR years ago, and this is a chance to revisit it.

My question is about whitespace.
In NHAML, whitespace is significant at the beginning of a line.

What I would like to have is this (each asterisk * stands for one whitespace
character):

%A
**%B
****%B1
****%B2
**%C
****%C1

It would correspond to a tree of the same shape (A at the root; B and C as
second-level nodes; B1, B2 and C1 as third-level nodes).
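Turning indentation into nesting like this is usually done with a stack. Below is a minimal Python sketch (not NHAML code; `Node` and `build_tree` are hypothetical names), assuming each line has already been reduced to a (depth, name) pair. It keeps the most recent node at each depth on a stack, and a node's parent is whatever sits one level above it:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def build_tree(items: list[tuple[int, str]]) -> list[Node]:
    """Build a forest from (depth, name) pairs; depth 0 is the root level.

    Hypothetical helper for illustration: stack[d] holds the most recent
    node seen at depth d, so a node at depth d becomes a child of stack[d - 1].
    """
    roots: list[Node] = []
    stack: list[Node] = []
    for depth, name in items:
        if depth > len(stack):
            # Jumping more than one level deeper has no parent to attach to.
            raise ValueError(f"{name!r} is indented too deeply (depth {depth})")
        node = Node(name)
        del stack[depth:]  # close siblings and anything nested below them
        if depth == 0:
            roots.append(node)
        else:
            stack[-1].children.append(node)
        stack.append(node)
    return roots
```

For the example above, `build_tree([(0, "%A"), (1, "%B"), (2, "%B1"), (2, "%B2"), (1, "%C"), (2, "%C1")])` yields a single root `%A` whose children are `%B` (with `%B1`, `%B2`) and `%C` (with `%C1`).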

It would be easy if lines were always indented by the same number of
spaces (here 2).
But this should be configurable. Moreover, tabs might be used instead of
spaces. But let's skip that for now.
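One way to make the width configurable is to keep it out of the grammar and measure each line's leading whitespace in code. A minimal Python sketch; `indent_depth`, `indent_size` and `tab_size` are hypothetical names, and expanding a tab to the next tab stop is an assumed policy, not something HAML or NHAML is confirmed to do:

```python
def indent_depth(line: str, indent_size: int = 2, tab_size: int = 4) -> int:
    """Return the nesting depth of a line from its leading whitespace.

    indent_size: configurable number of columns per nesting level.
    tab_size: a tab advances to the next multiple of this column (assumed policy).
    """
    if indent_size <= 0:
        raise ValueError("indent_size must be positive")
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            column += tab_size - column % tab_size  # jump to the next tab stop
        else:
            break  # first non-whitespace character ends the indentation
    if column % indent_size != 0:
        raise ValueError(
            f"indentation of {column} columns is not a multiple of {indent_size}"
        )
    return column // indent_size
```

With the default width of 2, `"    %B1"` is depth 2; calling it with `indent_size=4` makes the same line depth 1, so the setting changes without touching any grammar rule.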

So a grammar like this (just a quick draft) won't handle that:
nhaml    :    line*
    ;
line    :    indent? rule NL
    ;
indent    :    WS WS indent? // How do I consume a different number of WS tokens depending on the configured settings?
    ;
rule    :    ~WS (~NL)*
    ;

So the actual question is about the rule "indent".
If I don't know the required number of WS matches at development time, how
can I write a grammar for it?
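A common way around this (the same idea Python's own tokenizer uses for indentation) is to keep the indent width out of the parser entirely: a pre-pass or custom lexer compares each line's leading spaces with the configured width and emits imaginary INDENT and DEDENT tokens whenever the depth changes. A minimal Python sketch of such a pre-pass, assuming spaces only; `indent_tokens` and the token names are hypothetical and not part of the ANTLR runtime:

```python
def indent_tokens(source: str, indent_size: int = 2) -> list[tuple[str, str]]:
    """Turn indented source into LINE tokens plus INDENT/DEDENT markers.

    Sketch only: handles leading spaces, with indent_size spaces per level.
    """
    tokens: list[tuple[str, str]] = []
    depth = 0  # current nesting depth
    for lineno, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines do not change the nesting
        text = raw.lstrip(" ")
        spaces = len(raw) - len(text)
        if spaces % indent_size != 0:
            raise ValueError(
                f"line {lineno}: {spaces} spaces is not a multiple of {indent_size}"
            )
        new_depth = spaces // indent_size
        if new_depth > depth + 1:
            raise ValueError(f"line {lineno}: indented more than one level")
        if new_depth == depth + 1:
            tokens.append(("INDENT", ""))
        for _ in range(depth - new_depth):  # one DEDENT per level closed
            tokens.append(("DEDENT", ""))
        tokens.append(("LINE", text))
        depth = new_depth
    tokens.extend(("DEDENT", "") for _ in range(depth))  # close open levels at EOF
    return tokens
```

The parser then only needs a fixed rule shaped roughly like `node : LINE (INDENT node+ DEDENT)? ;`, and changing the indent width becomes a lexer setting rather than a grammar change.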

Cheers,
Dmitriy Nagirnyak.