[antlr-interest] Parsing HAML - significant and insignificant whitespaces

Tue Jul 21 04:45:44 PDT 2009

Hi Nick,

All these things are options.
But it seems there's no way to dynamically choose rules in ANTLR. Right?
Like I posted on 15 Jul.

Anyway, thanks everybody for the help.

Cheers.
2009/7/21 Nick Vlassopoulos <nvlassopoulos at gmail.com>

> Hi Dmitriy,
>
> Maybe you're looking for something like the following then?
>
> ------------------------------------------
> grammar Foo;
> options {
>     language = Java;
> }
> @members{
>     int ind;
> }
> prog    :    line*
>     ;
> line    @init{  ind = 0; }
>     :    (ident)* rule LF { System.out.println("Indentation " + ind); };
>
> ident    :    WS { ind++; };
> rule    :    RULE;
>
> RULE    :    'rule';
> LF    :    '\r'? '\n';
> WS    :    ' ';
> ------------------------------------------
>
> I am not sure that this will work in all cases, though, and it might need
> some extra cases for tabs and so on. Personally, I would probably go for
> Stephen's solution (i.e. writing a simple preprocessor), as I think it
> provides a "safer" approach.
>
> Nikos.
>
>
> On Mon, Jul 20, 2009 at 2:22 PM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>
>> Hi,
>>
>> Nick, I think this will generally work.
>> I can traverse the AST and decide what to do with the whitespace.
>>
>> But I still believe there should be some option to choose a rule
>> dynamically. Does anybody know of one?
>>
>>
>> To Nick: I don't think I want to do preprocessing. ANTLR should be capable
>> of doing it.
>>
>>
>> Cheers,
>> Dmitriy.
>>
>> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>>
>>> Hi Dmitriy,
>>>
>>> I am not quite sure about this, but I think that something like the
>>> following
>>>
>>> grammar Foo;
>>> options {output = AST;}
>>>
>>> prog    :    line*
>>>     ;
>>>
>>> line    :    (ident)* rule^ LF;
>>>
>>> ident    :    WS;
>>>
>>>
>>> rule    :    RULE;
>>>
>>> RULE    :    'rule';
>>> LF    :    '\r'? '\n';
>>> WS    :    ' ';
>>>
>>> would generate an AST where each "rule" comes after a list of "idents"
>>> (spaces in the case above).
>>> So, when you are walking through the tree, you could count the number of
>>> "ident" children before a rule.
>>> I am not sure that this would cover your case (since I am a beginner in
>>> "the ways of ANTLR").
>>>
>>> Hope this helps!
>>>
>>> Nikos
>>>
>>>
>>> On Wed, Jul 15, 2009 at 12:51 AM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Thanks. It shows some useful techniques.
>>>> The main thing there is LEADING_WS.
>>>>
>>>> But it doesn't demonstrate how to choose a different rule based on
>>>> indentation.
>>>>
>>>> For example, something like this would do the job:
>>>> indent    :    LEADING_WS rule ;
>>>> LEADING_WS :   (' ')*
>>>>         {
>>>>           if ( 0 == (getColumn() % sizeOfIndent)) {
>>>>             // We have matched the indent size - need to generate the
>>>>             // node and follow on the indents
>>>>             // HOW TO 1: Add node to AST here?
>>>>             // HOW TO 2: Execute another rule?
>>>>           } // Otherwise consume the spaces
>>>>         }
>>>>     ;
>>>>
>>>> How can those HOW TOs be done?
>>>>
>>>> Cheers,
>>>> Dmitriy.
>>>> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>>>>
>>>>> Sorry for reposting, but I copied the wrong link:
>>>>>
>>>>> http://www.antlr.org/grammar/1078018002577/python.tar.gz
>>>>>
>>>>> Nikos
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2009 at 5:57 PM, Nick Vlassopoulos <
>>>>> nvlassopoulos at gmail.com> wrote:
>>>>>
>>>>>> Hi Dmitriy,
>>>>>>
>>>>>> I am not sure if this is what you are looking for, but you might want
>>>>>> to have a look at how the Python grammar handles indentation.
>>>>>> See for example:
>>>>>> http://www.antlr.org/grammar/1200715779785/Python.g
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Nikos
>>>>>>
>>>>>>
>>>>>>   On Tue, Jul 14, 2009 at 5:43 PM, Dmitiry Nagirnyak <
>>>>>> dnagir at gmail.com> wrote:
>>>>>>
>>>>>>>  Hi,
>>>>>>>
>>>>>>> I am researching the possibility of parsing HAML syntax in order to
>>>>>>> port it to .NET. There is a project called NHAML, but it uses regular
>>>>>>> expressions instead of a real parser.
>>>>>>> While it works great, it has certain limitations.
>>>>>>>
>>>>>>> So people have started thinking about a real parser. Years ago I did
>>>>>>> some work with ANTLR, and now I have a chance to revisit it.
>>>>>>>
>>>>>>> My question is about whitespace.
>>>>>>> In NHAML, whitespace is significant at the beginning of a line.
>>>>>>>
>>>>>>> What I would like to have is this (* stands for a space):
>>>>>>>
>>>>>>> %A
>>>>>>> **%B
>>>>>>> ****%B1
>>>>>>> ****%B2
>>>>>>> **%C
>>>>>>> ****%C1
>>>>>>>
>>>>>>> It would correspond to the same type of tree (A at the root; B and C
>>>>>>> as second-level nodes; B1, B2, and C1 as third-level nodes).
>>>>>>>
>>>>>>> It would be easy if the lines were always indented by the
>>>>>>> same number of spaces (here 2).
>>>>>>> But this should be configurable. What's more, instead of
>>>>>>> spaces there might be tabs. But let's skip this for now.
>>>>>>>
>>>>>>> So a grammar like this (just a quick draft) won't satisfy that:
>>>>>>> nhaml    :    line*
>>>>>>>     ;
>>>>>>> line    :    indent? rule
>>>>>>>     ;
>>>>>>> // How to consume a different number of WSs depending on the provided settings?
>>>>>>> indent    :    WS WS indent?
>>>>>>>     ;
>>>>>>> rule    :    ~WS (~NL)*
>>>>>>>     ;
>>>>>>>
>>>>>>> So the actual question is about the rule "indent".
>>>>>>> If I don't know the required number of WS matches at development time,
>>>>>>> how can I write a grammar for that?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitriy Nagirnyak.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
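
A minimal sketch of the preprocessor approach mentioned in the thread, assuming spaces-only indentation and an indent size supplied at run time. The class name IndentPreprocessor, the method names, and the INDENT/DEDENT marker strings are hypothetical and are not part of ANTLR or NHAML. The idea is to turn leading spaces into explicit markers before parsing, so that the grammar itself would not have to count whitespace:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: rewrites leading spaces as INDENT/DEDENT markers. */
final class IndentPreprocessor {

    /** Returns the nesting depth of a line: leading spaces divided by indentSize. */
    static int depthOf(String line, int indentSize) {
        if (indentSize <= 0) {
            throw new IllegalArgumentException("indentSize must be positive");
        }
        int spaces = 0;
        while (spaces < line.length() && line.charAt(spaces) == ' ') {
            spaces++;
        }
        if (spaces % indentSize != 0) {
            throw new IllegalArgumentException(
                spaces + " leading spaces is not a multiple of " + indentSize);
        }
        return spaces / indentSize;
    }

    /** Emits each non-blank line's content, surrounded by INDENT/DEDENT markers. */
    static List<String> preprocess(List<String> lines, int indentSize) {
        List<String> out = new ArrayList<>();
        int current = 0; // depth of the previously emitted line
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no indentation information
            }
            int depth = depthOf(line, indentSize);
            for (; current < depth; current++) {
                out.add("INDENT");
            }
            for (; current > depth; current--) {
                out.add("DEDENT");
            }
            out.add(line.trim());
        }
        for (; current > 0; current--) {
            out.add("DEDENT"); // close every block still open at end of input
        }
        return out;
    }

    public static void main(String[] args) {
        // The sample from the original question, with an indent size of 2.
        List<String> haml = Arrays.asList(
            "%A", "  %B", "    %B1", "    %B2", "  %C", "    %C1");
        for (String item : preprocess(haml, 2)) {
            System.out.println(item);
        }
    }
}
```

For the sample above this produces %A, INDENT, %B, INDENT, %B1, %B2, DEDENT, %C, INDENT, %C1, DEDENT, DEDENT. A grammar could then treat INDENT and DEDENT as ordinary tokens that open and close a nested block, and changing the indent size would only mean passing a different number to preprocess. Tabs are not handled in this sketch.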