[antlr-interest] Parsing HAML - significant and insignificant whitespaces

Nick Vlassopoulos nvlassopoulos at gmail.com
Mon Jul 20 10:32:34 PDT 2009


Hi Dmitiry,

Maybe you're looking for something like the following then?

------------------------------------------
grammar Foo;
options {
    language = Java;
}
@members{
    int ind;
}
prog    :    line*
    ;
line    @init{  ind = 0; }
    :    (ident)* rule LF { System.out.println("Indentation " + ind); };

ident    :    WS { ind++; };
rule    :    RULE;

RULE    :    'rule';
LF    :    '\r'? '\n';
WS    :    ' ';
------------------------------------------

I am not sure that this will work in all cases, though, and it might need
some extra alternatives for tabs and so on. Personally, I would probably go
for Stephen's solution (i.e. writing a simple preprocessor), as I think it
provides a safer approach.
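
To make the preprocessor idea a bit more concrete, here is a minimal sketch in
Java. It is only an illustration, not Stephen's actual code: the class name
IndentPreprocessor and the marker strings <INDENT> and <DEDENT> are made up for
this example, and it assumes that only spaces are used for indentation (no
tabs). It keeps a stack of indentation widths, much like the Python lexer does,
so the indent size does not have to be known in advance.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IndentPreprocessor {
    // Hypothetical marker strings; a real grammar would define matching tokens.
    public static final String INDENT = "<INDENT>";
    public static final String DEDENT = "<DEDENT>";

    /** Strips leading spaces and emits explicit INDENT/DEDENT marker lines instead. */
    public static List<String> preprocess(List<String> lines) {
        List<String> out = new ArrayList<>();
        Deque<Integer> widths = new ArrayDeque<>();
        widths.push(0); // width of the outermost level

        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no structure
            }
            // Count the leading spaces (assumption: tabs are not handled).
            int width = 0;
            while (width < line.length() && line.charAt(width) == ' ') {
                width++;
            }
            if (width > widths.peek()) {
                // Deeper than the current level: open one new level.
                widths.push(width);
                out.add(INDENT);
            } else {
                // Same level or shallower: close levels until the widths match.
                while (width < widths.peek()) {
                    widths.pop();
                    out.add(DEDENT);
                }
                if (width != widths.peek()) {
                    throw new IllegalArgumentException("Inconsistent indentation: " + line);
                }
            }
            out.add(line.substring(width));
        }
        // Close every level that is still open at the end of the input.
        while (widths.size() > 1) {
            widths.pop();
            out.add(DEDENT);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("%A", "  %B", "    %B1", "    %B2", "  %C", "    %C1");
        preprocess(input).forEach(System.out::println);
    }
}
```

The grammar could then treat the two markers as ordinary tokens and nest rules
between them, so the parser itself never has to count spaces.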

Nikos.

On Mon, Jul 20, 2009 at 2:22 PM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:

> Hi,
>
> Nick, I think this will generally work.
> So I can traverse the AST and decide what to do with the whitespace.
>
> But I still believe there should be some option to choose a rule
> dynamically. Does anybody know how to do that?
>
>
> To Nick: I don't think I want to do preprocessing. ANTLR should be capable
> of doing it.
>
>
> Cheers,
> Dmitriy.
>
> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>
>> Hi Dmitiry,
>>
>> I am not quite sure about this, but I think that something like the
>> following
>>
>> grammar Foo;
>> options {output = AST;}
>>
>> prog    :    line*
>>     ;
>>
>> line    :    (ident)* rule^ LF;
>>
>> ident    :    WS;
>>
>>
>> rule    :    RULE;
>>
>> RULE    :    'rule';
>> LF    :    '\r'? '\n';
>> WS    :    ' ';
>>
>> would generate an AST where each "rule" comes after a list of "idents"
>> (spaces in the case above).
>> So, when you are walking through the tree, you could count the number of
>> "ident" children before a rule.
>> I am not sure that this would cover your case (since I am a beginner in
>> "the ways of ANTLR").
>>
>> Hope this helps!
>>
>> Nikos
>>
>>
>> On Wed, Jul 15, 2009 at 12:51 AM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>>
>>> Hi Nick,
>>>
>>> Thanks. It shows some useful techniques.
>>> The main thing there is LEADING_WS.
>>>
>>> But it doesn't demonstrate how to choose a different rule based on
>>> indentation.
>>>
>>> For example something like this would do the job:
>>> indent    :    LEADING_WS rule ;
>>> LEADING_WS :   (' ')*
>>>         {
>>>           if ( 0 == (getColumn() % sizeOfIndent)) {
>>>             // We have matched the indent size - need to generate the node
>>>             // and follow on the indents
>>>             // HOW TO 1: Add node to AST here?
>>>             // HOW TO 2: Execute another rule?
>>>           } // Otherwise consume the spaces
>>>         }
>>>     ;
>>>
>>> How can those HOW TOs be done?
>>>
>>> Cheers,
>>> Dmitriy.
>>> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>>>
>>>> Sorry for reposting, but I copied the wrong link:
>>>>
>>>> http://www.antlr.org/grammar/1078018002577/python.tar.gz
>>>>
>>>> Nikos
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 14, 2009 at 5:57 PM, Nick Vlassopoulos <nvlassopoulos at gmail.com> wrote:
>>>>
>>>>> Hi Dmitiry,
>>>>>
>>>>> I am not sure if this is what you are looking for, but you might want
>>>>> to have a look at how the Python grammar handles indentation.
>>>>> See for example:
>>>>> http://www.antlr.org/grammar/1200715779785/Python.g
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Nikos
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2009 at 5:43 PM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am researching the possibility of parsing HAML syntax in order to
>>>>>> port it to .NET. There is a project called NHAML, but it uses regular
>>>>>> expressions instead of a real parser.
>>>>>> While it works great, it has certain limitations.
>>>>>>
>>>>>> So people have started thinking about a real parser. Years ago I did
>>>>>> some work with ANTLR, and now I have a chance to revisit it.
>>>>>>
>>>>>> My question is about whitespace.
>>>>>> In NHAML, whitespace is significant at the beginning of a line.
>>>>>>
>>>>>> What I would like to have is this (each * stands for one whitespace character):
>>>>>>
>>>>>> %A
>>>>>> **%B
>>>>>> ****%B1
>>>>>> ****%B2
>>>>>> **%C
>>>>>> ****%C1
>>>>>>
>>>>>> It would correspond to the same type of tree (A at the root; B, C -
>>>>>> second-level nodes; B1, B2, C1 - third-level nodes).
>>>>>>
>>>>>> It would be easy if lines were always indented by the same number of
>>>>>> spaces per level (here 2).
>>>>>> But this should be configurable. Moreover, instead of spaces there
>>>>>> might be tabs. But let's skip this for now.
>>>>>>
>>>>>> So a grammar like this (just a quick draft) won't satisfy that:
>>>>>> nhaml    :    line*
>>>>>>     ;
>>>>>> line    :    indent? rule
>>>>>>     ;
>>>>>> indent    :    WS WS indent? // How to consume a different number of WSs
>>>>>>                               // depending on provided settings?
>>>>>>     ;
>>>>>> rule    :    ~WS (~NL)*
>>>>>>     ;
>>>>>>
>>>>>> So the actual question is about the rule "indent".
>>>>>> If I don't know the required number of WS matches at development time,
>>>>>> how can I write a grammar for that?
>>>>>>
>>>>>> Cheers,
>>>>>> Dmitriy Nagirnyak.