[antlr-interest] Parsing HAML - significant and insignificant whitespaces

Nick Vlassopoulos nvlassopoulos at gmail.com
Wed Jul 15 02:46:14 PDT 2009


Hi Dmitiry,

I am not quite sure about this, but I think that something like the
following

grammar Foo;
options {output = AST;}

prog    :    line*
    ;

line    :    (ident)* rule^ LF;

ident    :    WS;


rule    :    RULE;

RULE    :    'rule';
LF    :    '\r'? '\n';
WS    :    ' ';

would generate an AST in which each "rule" is preceded by a list of
"ident" nodes (single spaces in the grammar above).
So, when you are walking through the tree, you could count the number of
"ident" children that precede each rule.
I am not sure that this would cover your case, since I am a beginner in
"the ways of ANTLR".
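
As a rough illustration of the counting idea, outside ANTLR: a minimal
Python sketch that turns the number of leading spaces on a line into a
nesting depth. It assumes indentation uses spaces only; the function name
indent_depth and its indent_size parameter are hypothetical and do not come
from any grammar in this thread.

```python
def indent_depth(line: str, indent_size: int = 2) -> int:
    """Return the nesting depth implied by the leading spaces of `line`.

    Assumption: indentation consists of spaces only (no tabs).
    """
    if indent_size <= 0:
        raise ValueError("indent_size must be positive")
    # Count the leading spaces, the same quantity as the number of
    # "ident" nodes in front of a rule.
    leading = len(line) - len(line.lstrip(" "))
    if leading % indent_size != 0:
        raise ValueError(
            f"{leading} leading spaces is not a multiple of {indent_size}"
        )
    return leading // indent_size
```

In a tree walker, the same division would be applied to the count of
"ident" children instead of to the raw line.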

Hope this helps!

Nikos

On Wed, Jul 15, 2009 at 12:51 AM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:

> Hi Nick,
>
> Thanks. It shows some useful techniques.
> The main thing there is LEADING_WS.
>
> But it doesn't demonstrate how to choose a different rule based on
> indentation.
>
> For example, something like this would do the job:
> indent    :    LEADING_WS rule ;
> LEADING_WS :   (' ')*
>         {
>           if ( 0 == (getColumn() % sizeOfIndent)) {
>             // We have matched the indent size - need to generate the node
>             // and follow on the indents
>             // HOW TO 1: Add node to AST here?
>             // HOW TO 2: Execute another rule?
>           } // Otherwise consume the spaces
>         }
>     ;
>
> How can those HOW TOs be done?
>
> Cheers,
> Dmitriy.
> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>
>> Sorry for reposting, but I copied the wrong link,
>>
>> http://www.antlr.org/grammar/1078018002577/python.tar.gz
>>
>> Nikos
>>
>>
>>
>>
>> On Tue, Jul 14, 2009 at 5:57 PM, Nick Vlassopoulos <
>> nvlassopoulos at gmail.com> wrote:
>>
>>> Hi Dmitiry,
>>>
>>> I am not sure if this is what you are looking for, but you might want to
>>> have a look at how the Python grammar handles indentation.
>>> See for example:
>>> http://www.antlr.org/grammar/1200715779785/Python.g
>>>
>>> Best Regards,
>>>
>>> Nikos
>>>
>>>
>>> On Tue, Jul 14, 2009 at 5:43 PM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>>>
>>>>  Hi,
>>>>
>>>> I am researching the possibility of parsing HAML syntax in order to port
>>>> it to .NET. There is a project called NHAML, but it uses regular
>>>> expressions instead of a real parser.
>>>> While it works great, it has certain limitations.
>>>>
>>>> So people have started thinking about a real parser. Years ago I did some
>>>> work with ANTLR, and now I have a chance to revisit it.
>>>>
>>>> My question is about whitespace.
>>>> In NHAML, whitespace is significant at the beginning of a line.
>>>>
>>>> What I would like to have is this (each * stands for one whitespace):
>>>>
>>>> %A
>>>> **%B
>>>> ****%B1
>>>> ****%B2
>>>> **%C
>>>> ****%C1
>>>>
>>>> It would correspond to a tree of the same shape (A at the root; B and C
>>>> as second-level nodes; B1, B2 and C1 as third-level nodes).
>>>>
>>>> It would be easy if lines were always indented by the same number of
>>>> whitespace characters (here 2).
>>>> But this should be configurable. Moreover, there might be tabs instead of
>>>> spaces. But let's skip this for now.
>>>>
>>>> So a grammar like this (just a quick draft) won't satisfy that:
>>>> nhaml    :    line*
>>>>     ;
>>>> line    :    indent? rule
>>>>     ;
>>>> // How to consume a different number of WSs depending on provided settings?
>>>> indent    :    WS WS indent?
>>>>     ;
>>>> rule    :    ~WS (~NL)*
>>>>     ;
>>>>
>>>> So the actual question is about the "indent" rule.
>>>> If I don't know the required number of WS matches at development time,
>>>> how can I write a grammar for that?
>>>>
>>>> Cheers,
>>>> Dmitriy Nagirnyak.
>>>>
>>>>
>>>>
>>>
>>
>
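
The original question quoted above (turning the %A / %B / %B1 example into
a tree with a configurable indent size) can also be sketched without ANTLR,
using a stack of currently open nodes. This is only a minimal Python sketch
under stated assumptions: indentation uses spaces only, blank lines are
ignored, and a line may be at most one level deeper than the line before
it. Node, build_tree and indent_size are hypothetical names chosen for
illustration; none of them comes from NHAML or ANTLR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def build_tree(source: str, indent_size: int = 2) -> Node:
    """Build a tree from indentation; returns a synthetic root node."""
    root = Node("<root>")
    # stack[0] is the synthetic root; stack[d + 1] is the most recently
    # seen node at depth d.
    stack = [root]
    for number, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue  # skip blank lines
        leading = len(raw) - len(raw.lstrip(" "))
        if leading % indent_size != 0:
            raise ValueError(f"line {number}: bad indent of {leading} spaces")
        depth = leading // indent_size
        if depth > len(stack) - 1:
            raise ValueError(f"line {number}: indented more than one level")
        # Close every node that is at this depth or deeper.
        del stack[depth + 1:]
        node = Node(raw.strip())
        stack[-1].children.append(node)
        stack.append(node)
    return root


if __name__ == "__main__":
    sample = "%A\n  %B\n    %B1\n    %B2\n  %C\n    %C1\n"
    a = build_tree(sample).children[0]
    print(a.name, [child.name for child in a.children])
```

The indent size is an ordinary runtime parameter here, which is the part
that is awkward to express with a fixed "WS WS" sequence in a grammar rule.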