[antlr-interest] Error 206 Recursion woes

Kyle Ferrio kferrio at gmail.com
Wed Apr 11 19:13:14 PDT 2012


Hi again,

I don't claim to know your intended language, but you might discover a more
elegant approach.  I'm not yet convinced that whitespace is semantic in
your language.

Here is a test: suppose the input stream comprises text recognized by the
lexer as any two tokens T1 and T2 chosen from your language definition and
separated by a single non-empty unit (e.g. character) W of whitespace.
Note the production P of the parser acting on the token stream.  Now
replace W with W' also not null but orthographically distinct from W (e.g.
make it two whitespaces, W'=WW, but it could also be W plus tab or any
other whitespace), run the lexer and parse to produce P'.  If P=P' then
whitespace is not semantic.  (Note that the test requires considering all
valid ordered pairs of T1 and T2 and the proof proceeds by induction.
Details omitted.)

If whitespace really is semantic in your language, then you have to work
through all the possibile meanings and hope you can reduce this to a
manageable set of rules.  Typically in such languages not all whitespace is
equivalent.  Ask anyone who has wrangled with makefiles or certain HTML
templating engines.  :)  Not fun.

If whitespace is not semantic -- and I still think that's the case here --
then you can just throw whitespace away.  Really.  Remember, the parser
just gets a sequence of tokens.  Nothing is needed to delimit tokens.

If I've truly missed the intent of your language, then you might have a
look at a parser for Python and see how semantic whitespace is handled
there.

Kyle
On Apr 11, 2012 3:28 PM, "Tangeleno" <tangeleno at gmail.com> wrote:

> Thanks for the tips I figured out what the issue was, it was 'string    :
>    STRING|.*;'
>
> I got that part working but now I'm having another issue... at times I
> need to pay attention to whitespace as they are used to delimit arguments
> at certain points in time.
> But I can't seem to find a way to have the parser switch channels over to
> the hidden channel to check for whitespaces. The only option seems to be to
> take WS off the hidden channel but then for the parts where I don't care
> about whitespace I would have to add WS* all over the place to prevent
> 'UnwantedTokenExceptions'
>
> Attached is an example grammar.
>
> Kyle Ferrio wrote:
>
>> Hi.
>>
>> Recursion (both in ANTLR grammars and in the target runtimes) are topics
>> of interest to me.  So I looked at what you wrote.  It appears that your
>> partial grammar spec includes some rules which are not referenced and, more
>> notably, omits some necessary rules.  Consequently, whatever errors I might
>> derive from your grammar may not actually be interesting to you.  So I took
>> a different view.
>>
>> I dashed out two grammars -- not identical but substantially similar to
>> each other and remotely similar to yours -- one for antlr v3 and one for
>> artlr v4 [1].  I've attached the grammars with the caveat that neither is a
>> model to emulate on the whole.  These are toys, absolutely not
>> production-worthy.  But thee toys do highlight a few details which you
>> might consider, such as:
>>
>>  - It is useful to ponder what tokens a parser really needs and hence
>> which lexer rules should be mere fragments.
>>  - Using whitespace as a token may come to tears in your grammar.  And WS
>> does not appear to be semantic in your language.
>>  - If all/some WS is truly semantic in your language (as in Python or R)
>> you might do well to filter before lexing, viz. "form fits function."
>>
>> Both toy grammars -- each processed with and run against its respective
>> version of antlr -- parsed the following input as expected without error:
>>
>> *   ${${${${${${${${${${${${${${${**${Test15}}}}}}}}}}}}}}}}*
>>
>> (I used the attached Main.java for antlr v3.4 and
>> org.antlr.v4.runtime.misc.**TestRig for antlr v4.)
>>
>> I hope you get something out of this even if it does not answer your
>> question directly.  Sorry to post and run, but checking email may be spotty
>> for me the rest of the week.
>>
>> Good luck!
>>
>> Kyle
>>
>> [1] I probably could have written one grammar for both versions of the
>> antlr tool, but already I've been spoiled by how tolerant Honey Badger (aka
>> antlr v4) is of my misteps.  And it's been a long day.  :)
>>
>>
>> On Tue, Apr 10, 2012 at 7:19 PM, Tangeleno <tangeleno at gmail.com <mailto:
>> tangeleno at gmail.com>> wrote:
>>
>>    The error is "Ln17:7 Alternative 1: after matching input such as '$'
>>    '{''$' '{''$' '{''$' '{''$' '{' the decision cannot predict what comes
>>    next due to recursion overflow to id from accessor and to object data
>>    from id and to accessor from objectData"
>>
>>     From what I can gather from the book pg 281 is the closest I can find
>>    to this problem it should be a warning and not an error. The
>>    problem is
>>    coming about when I add the rule objectData as an alternative to the
>>    param rule, which I really don't understand as because it's
>>    complaining
>>    about id accessor and objectData...
>>
>>    So my question is what am I doing wrong and what can I do to fix it
>>    while still allowing nested objectDatas?
>>    http://privatepaste.com/**6611be2380<http://privatepaste.com/6611be2380>is an example of what can happen in
>>    this silly language and 6 levels is by no means the cap...
>>
>>    objectData
>>        :    '$' '{'accessor objectMember*'}'
>>        ;
>>    objectMember
>>        :    ('.'|':') accessor;
>>
>>    accessor:    id indexer* typeCast*;
>>
>>    id    :    (ID|objectData);
>>
>>    indexer    :    '[' commaParams ']'
>>        ;
>>    commaParams
>>        :    param (',' param)*
>>        ;
>>
>>    param    :    objectData|number|string
>>        ;
>>    spaceParams
>>        :    param (WS param)* NewLine
>>        ;
>>    typeCast:    '(' id ')';
>>    number    :    INT|FLOAT;
>>    string    :    STRING|.*;
>>
>>    List: http://www.antlr.org/mailman/**listinfo/antlr-interest<http://www.antlr.org/mailman/listinfo/antlr-interest>
>>    Unsubscribe:
>>    http://www.antlr.org/mailman/**options/antlr-interest/your-**
>> email-address<http://www.antlr.org/mailman/options/antlr-interest/your-email-address>
>>
>>
>>
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>


More information about the antlr-interest mailing list