[antlr-interest] Parsing HAML - significant and insignificant whitespaces
Dmitiry Nagirnyak
dnagir at gmail.com
Tue Jul 21 04:45:44 PDT 2009
Hi Nick,
All these things are options.
But it seems there's no way to dynamically choose rules in ANTLR, right?
That is what I asked about on 15 Jul.
Anyway, thanks everybody for the help.
Cheers.
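A note on the recurring question of choosing rules by indentation: a common workaround, in the spirit of the Python grammars linked later in this thread, is to handle indentation in the token stream rather than in the rules. Changes in leading whitespace become explicit INDENT and DEDENT tokens, so a parser rule can stay fixed, for example a hypothetical `block : INDENT line+ DEDENT ;`. The following is a minimal plain-Java sketch of that conversion, not ANTLR code. It assumes spaces only (no tabs), and the class `IndentPreprocessor`, its `tokenize` method and the `LINE(...)` token format are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: rewrites leading spaces as INDENT/DEDENT markers (spaces only, no tabs). */
public class IndentPreprocessor {

    public static List<String> tokenize(List<String> lines) {
        List<String> tokens = new ArrayList<>();
        Deque<Integer> widths = new ArrayDeque<>(); // stack of currently open indentation widths
        widths.push(0);                             // column 0 is always open

        for (String line : lines) {
            // Count leading spaces; tabs are deliberately not handled in this sketch.
            int spaces = 0;
            while (spaces < line.length() && line.charAt(spaces) == ' ') {
                spaces++;
            }
            if (spaces == line.length()) {
                continue; // blank lines do not change the indentation
            }
            if (spaces > widths.peek()) {
                widths.push(spaces);          // deeper than before: open a block
                tokens.add("INDENT");
            } else {
                while (spaces < widths.peek()) {
                    widths.pop();             // shallower: close blocks until the widths match
                    tokens.add("DEDENT");
                }
                if (spaces != widths.peek()) {
                    throw new IllegalArgumentException("Inconsistent dedent: \"" + line + "\"");
                }
            }
            tokens.add("LINE(" + line.substring(spaces) + ")");
        }
        // Close every block still open at end of input.
        while (widths.peek() > 0) {
            widths.pop();
            tokens.add("DEDENT");
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> haml = List.of("%A", "  %B", "    %B1", "    %B2", "  %C", "    %C1");
        System.out.println(tokenize(haml));
        // [LINE(%A), INDENT, LINE(%B), INDENT, LINE(%B1), LINE(%B2), DEDENT, LINE(%C), INDENT, LINE(%C1), DEDENT, DEDENT]
    }
}
```

Because the sketch keeps a stack of the widths seen so far instead of dividing by a fixed width, it accepts any consistent indent size, which sidesteps the configurable-size problem; dedenting to a width that was never opened is reported as an error.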
2009/7/21 Nick Vlassopoulos <nvlassopoulos at gmail.com>
> Hi Dmitiry,
>
> Maybe you're looking for something like the following then?
>
> ------------------------------------------
> grammar Foo;
> options {
> language = Java;
> }
> @members{
> int ind;
> }
> prog : line*
> ;
> line @init{ ind = 0; }
> : (ident)* rule LF { System.out.println("Indentation " + ind); };
>
> ident : WS { ind++; };
> rule : RULE;
>
> RULE : 'rule';
> LF : '\r'? '\n';
> WS : ' ';
> ------------------------------------------
>
> Although I am not sure that this will work in all cases, and it might need
> extra handling for tabs and so on. Personally, I would probably go for
> Stephen's solution (i.e. writing a simple preprocessor), as I think it
> provides a "safer" approach.
>
> Nikos.
>
>
> On Mon, Jul 20, 2009 at 2:22 PM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>
>> Hi,
>>
>> Nick, I think this will generally work.
>> So I can traverse the AST and decide what to do with the whitespace.
>>
>> But I still believe there should be some option to choose a rule
>> dynamically. Does anybody know how?
>>
>>
>> To Nick: I don't think I want to do preprocessing. ANTLR should be capable
>> of doing it.
>>
>>
>> Cheers,
>> Dmitriy.
>>
>> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>>
>>> Hi Dmitiry,
>>>
>>> I am not quite sure about this, but I think that something like the
>>> following
>>>
>>> grammar Foo;
>>> options {output = AST;}
>>>
>>> prog : line*
>>> ;
>>>
>>> line : (ident)* rule^ LF;
>>>
>>> ident : WS;
>>>
>>>
>>> rule : RULE;
>>>
>>> RULE : 'rule';
>>> LF : '\r'? '\n';
>>> WS : ' ';
>>>
>>> would generate an AST in which each "rule" node is the root of its line,
>>> with the preceding "idents" (spaces, in the case above) as its children.
>>> So, when walking the tree, you could count the "ident" children of each
>>> rule node to get its indentation.
>>> I am not sure that this would cover your case (since I am a beginner in
>>> "the ways of ANTLR").
>>>
>>> Hope this helps!
>>>
>>> Nikos
>>>
>>>
>>> On Wed, Jul 15, 2009 at 12:51 AM, Dmitiry Nagirnyak <dnagir at gmail.com> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Thanks. It shows some useful techniques.
>>>> The main thing there is LEADING_WS.
>>>>
>>>> But it doesn't demonstrate how to choose a different rule based on
>>>> indentation.
>>>>
>>>> For example, something like this would do the job:
>>>> indent : LEADING_WS rule ;
>>>> LEADING_WS : (' ')*
>>>> {
>>>> if ( 0 == (getColumn() % sizeOfIndent)) {
>>>> // We have matched the indent size - need to generate the node and follow on the indents
>>>> // HOW TO 1: Add node to AST here?
>>>> // HOW TO 2: Execute another rule?
>>>> } // Otherwise consume the spaces
>>>> }
>>>> ;
>>>>
>>>> How can those HOW TOs be done?
>>>>
>>>> Cheers,
>>>> Dmitriy.
>>>> 2009/7/15 Nick Vlassopoulos <nvlassopoulos at gmail.com>
>>>>
>>>> Sorry for reposting, but I copied the wrong link,
>>>>>
>>>>> http://www.antlr.org/grammar/1078018002577/python.tar.gz
>>>>>
>>>>> Nikos
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2009 at 5:57 PM, Nick Vlassopoulos <
>>>>> nvlassopoulos at gmail.com> wrote:
>>>>>
>>>>>> Hi Dmitiry,
>>>>>>
>>>>>> I am not sure if this is what you are looking for, but you might want
>>>>>> to have a look at how the Python grammar handles indentation.
>>>>>> See for example:
>>>>>> http://www.antlr.org/grammar/1200715779785/Python.g
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Nikos
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 14, 2009 at 5:43 PM, Dmitiry Nagirnyak <
>>>>>> dnagir at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am researching the possibility of parsing HAML syntax in order to port it to .NET.
>>>>>>> There is a project called NHAML, but it uses regular expressions instead of a proper
>>>>>>> parser.
>>>>>>> While it works well, it has certain limitations.
>>>>>>>
>>>>>>> So people have started thinking about a real parser. Years ago I did
>>>>>>> some work with ANTLR, and now I have a chance to revisit it.
>>>>>>>
>>>>>>> My question is about whitespace.
>>>>>>> In NHAML, whitespace is significant at the beginning of a line.
>>>>>>>
>>>>>>> What I would like to have is this (each * stands for one space):
>>>>>>>
>>>>>>> %A
>>>>>>> **%B
>>>>>>> ****%B1
>>>>>>> ****%B2
>>>>>>> **%C
>>>>>>> ****%C1
>>>>>>>
>>>>>>> It would correspond to a tree with A at the root; B and C as
>>>>>>> second-level nodes; and B1, B2 and C1 as third-level nodes.
>>>>>>>
>>>>>>> It would be easy if each indentation level were always the same
>>>>>>> number of spaces (here 2).
>>>>>>> But this should be configurable. Moreover, tabs might be used
>>>>>>> instead of spaces, but let's skip that for now.
>>>>>>>
>>>>>>> So a grammar like this (just a quick draft) won't satisfy that:
>>>>>>> nhaml : line*
>>>>>>> ;
>>>>>>> line : indent? rule
>>>>>>> ;
>>>>>>> indent : WS WS indent? // How to consume a different number of WSs depending on the provided settings?
>>>>>>> ;
>>>>>>> rule : ~WS (~NL)*
>>>>>>> ;
>>>>>>>
>>>>>>> So the actual question concerns the rule "indent".
>>>>>>> If I don't know the required number of WS matches when writing the grammar,
>>>>>>> how can I write a grammar for that?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitriy Nagirnyak.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
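For the %A/%B/%C example in the original question, the tree can also be built directly from the leading spaces once the indent size is known. Below is a minimal Java sketch under stated assumptions: spaces only, a configurable `indentSize`, and each line nested at most one level deeper than the line before it. `IndentTree`, `Node` and `build` are hypothetical names, not part of NHAML or ANTLR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds a tree from indentation with a configurable indent size (spaces only). */
public class IndentTree {

    /** One element line plus the lines nested under it. */
    public static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            // Prints e.g. %B[%B1, %B2]; leaves print just their text.
            return children.isEmpty() ? text : text + children;
        }
    }

    public static Node build(List<String> lines, int indentSize) {
        if (indentSize <= 0) {
            throw new IllegalArgumentException("indentSize must be positive");
        }
        Node root = new Node("ROOT");        // synthetic parent of the top-level lines
        List<Node> path = new ArrayList<>(); // path.get(d) = most recent node at depth d
        path.add(root);                      // the root sits at depth 0

        for (String line : lines) {
            int spaces = 0;
            while (spaces < line.length() && line.charAt(spaces) == ' ') {
                spaces++;
            }
            if (spaces == line.length()) {
                continue; // skip blank lines
            }
            if (spaces % indentSize != 0) {
                throw new IllegalArgumentException("Indent is not a multiple of " + indentSize + ": \"" + line + "\"");
            }
            int depth = spaces / indentSize + 1; // top-level lines are at depth 1
            if (depth > path.size()) {
                throw new IllegalArgumentException("Indented more than one level: \"" + line + "\"");
            }
            // Drop nodes at this depth or deeper; the last remaining node is the parent.
            while (path.size() > depth) {
                path.remove(path.size() - 1);
            }
            Node node = new Node(line.substring(spaces));
            path.get(depth - 1).children.add(node);
            path.add(node);
        }
        return root;
    }

    public static void main(String[] args) {
        List<String> haml = List.of("%A", "  %B", "    %B1", "    %B2", "  %C", "    %C1");
        System.out.println(build(haml, 2)); // ROOT[%A[%B[%B1, %B2], %C[%C1]]]
    }
}
```

The synthetic ROOT node lets several top-level elements coexist. Keeping only the path from the root to the most recent node is enough, because a new line can only attach to an ancestor on that path.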