[antlr-interest] Greedy matching to end of line
Pop Qvarnström
pop.qvarnstrom at gmail.com
Fri Jan 28 01:32:57 PST 2011
Hi,
I cannot reproduce this using your supplied grammar: as long as the required
NEWLINE is in place, your example works just fine. If, however, I do not
provide a newline in the input, I'm hit by a NoViableAltException.
That is, for the input "Comment: NOCHandle John Q. Hacker" I get the result you
describe, while the input "Comment: NOCHandle John Q. Hacker\n" (with a trailing
newline) works perfectly. That seems reasonable, since both alternatives of rule
words end in NEWLINE. The same holds for NCHandle.
This assumes, of course, that parsing starts from rule asline.
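The trailing-NEWLINE point can be made concrete with a minimal sketch in Java. This is a hand-written approximation, not ANTLR-generated code: it mimics the WORD, NEWLINE and WS lexer rules and the asline/words rules of grammar foo, and for simplicity it treats the 'Comment:' literal as a WORD with that text. The class and method names (WordsRuleCheck, tokenize, matchesAsline) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation (not ANTLR-generated) of the lexer rules
// WORD, NEWLINE and WS plus the parser rules asline and words.
final class WordsRuleCheck {

    enum Kind { WORD, NEWLINE }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;                                        // WS : (' '|'\t') { skip(); }
            } else if (c == '\n') {
                tokens.add(new Token(Kind.NEWLINE, "\n"));  // NEWLINE : '\r'? '\n'
                i++;
            } else if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                tokens.add(new Token(Kind.NEWLINE, "\r\n"));
                i += 2;
            } else if (c == '\r') {
                throw new IllegalArgumentException("lone '\\r' matches no lexer rule");
            } else {
                int start = i;                              // WORD : ~(' '|'\t'|'\r'|'\n')+
                while (i < input.length() && " \t\r\n".indexOf(input.charAt(i)) < 0) {
                    i++;
                }
                tokens.add(new Token(Kind.WORD, input.substring(start, i)));
            }
        }
        return tokens;
    }

    // asline : 'Comment:' words ;   ('Comment:' approximated as a WORD with that text)
    // words  : word (word)* NEWLINE | NEWLINE ;
    static boolean matchesAsline(String input) {
        List<Token> t = tokenize(input);
        if (t.isEmpty() || !t.get(0).text.equals("Comment:")) {
            return false;
        }
        int p = 1;
        while (p < t.size() && t.get(p).kind == Kind.WORD) {
            p++;                                            // word (word)*
        }
        // Both alternatives of words end in NEWLINE, so running out of input fails.
        return p < t.size() && t.get(p).kind == Kind.NEWLINE;
    }

    public static void main(String[] args) {
        System.out.println(matchesAsline("Comment: NOCHandle John Q. Hacker"));    // false
        System.out.println(matchesAsline("Comment: NOCHandle John Q. Hacker\n"));  // true
    }
}
```

Under these assumptions the line without a newline is rejected and the same line with "\n" (or "\r\n") appended is accepted, mirroring the behaviour described in this message.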
Am I missing something?
Cheers,
Pop
On Fri, Jan 28, 2011 at 8:51 AM, Robert J. Hansen <rjh at sixdemonbag.org> wrote:
> I haven't done any work with lexers and parsers in many years, and
> figured a good way to go about getting re-acquainted would be to find a
> big corpus of text and put together a translator. The corpus I had
> around was the ARIN WHOIS information, which is basically key-value
> coding in a record-based format. Newlines are significant, but other
> whitespace generally isn't.
>
> I'm now running into a brick wall, though, with trying to enable greedy
> matching -- scarfing up everything to end-of-line and returning that
> back as a string. I can *almost* do it, but I'm getting killed on some
> corner cases.
>
> The following is an abbreviated version of the grammar. The bug is
> present in this, but all actions, etc., have been omitted.
>
> =====
> grammar foo;
>
> file : (block|NEWLINE)*;
> block : asblock
> | netblock;
> asblock : asbegin asline* NEWLINE;
> netblock: netbegin netline* NEWLINE;
> netline : n_nh;
> netbegin: 'NetHandle:' words;
> n_nh : 'NOCHandle:' words;
> asline : 'Comment:' words;
> asbegin : 'ASHandle:' words;
> words : word (word)* NEWLINE
> | NEWLINE;
> word : WORD;
> WORD : ~(' '|'\t'|'\r'|'\n')+;
> NEWLINE : '\r'?'\n';
> WS : (' '|'\t') { skip(); };
> =====
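A simplified way to see how this grammar's literals interact with WORD is sketched below in Java. It assumes, as I understand ANTLR 3's tie-breaking, that when a literal such as 'NOCHandle:' and WORD match exactly the same text, the literal wins (implicit literal tokens are defined ahead of the lexer rules). Since words only ever accepts WORD tokens (word : WORD), a value that happens to contain one of the keywords cannot be matched by words. The class KeywordVsWord is hypothetical, only classifies whitespace-separated runs on a single line, and is not how ANTLR's lexer is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model (not ANTLR's real lexer): each whitespace-delimited run is
// classified, and an exact match of a keyword literal is assumed to win over WORD.
final class KeywordVsWord {

    // The literals used in the parser rules of grammar foo.
    static final Set<String> LITERALS = new HashSet<>(
        Arrays.asList("ASHandle:", "Comment:", "NetHandle:", "NOCHandle:"));

    // Returns token types such as "WORD" or "'NOCHandle:'" for one line;
    // NEWLINE is not modelled and WS is simply used as the separator.
    static List<String> tokenTypes(String line) {
        List<String> types = new ArrayList<>();
        for (String run : line.trim().split("[ \t]+")) {
            if (run.isEmpty()) {
                continue;                                   // empty line yields no runs
            }
            types.add(LITERALS.contains(run) ? "'" + run + "'" : "WORD");
        }
        return types;
    }

    public static void main(String[] args) {
        // Every run after 'Comment:' is a WORD, so words can consume them all.
        System.out.println(tokenTypes("Comment: NOCHandle John Q. Hacker"));
        // Here the value contains a literal, which words (WORD only) cannot accept.
        System.out.println(tokenTypes("Comment: NOCHandle: John Q. Hacker"));
    }
}
```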
>
> ... Now, consider the derivation of the line:
>
> Comment: NOCHandle John Q. Hacker
>
> ... starting from rule asline. asline derives out to 'Comment:' on the
> left, words on the right, and from there straight to NoViableAltException.
>
> However, if I change it to:
>
> Comment: NCHandle John Q. Hacker
>
> ... then it derives successfully.
>
> It appears that when trying to derive the words rule, it sees that rule
> n_nh could also apply and can't decide what derivation to use. But why?
> n_nh is not listed as a child rule of words. How can I fix this so
> that the words rule will grab *everything* to the end of the line?
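One way to take a value verbatim to the end of the line is to stop splitting it into words at all: find the key, then keep the raw remainder of the line as a single string. The Java sketch below assumes every relevant line has the form "Key: value", with the key ending at the first colon; RestOfLine and split are hypothetical names, not part of ANTLR. The analogous move in the grammar would be to let the lexer, rather than the parser, capture the rest of the line, so that keyword text inside a value never reaches the parser as separate tokens.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not an ANTLR API): split a "Key: value" line at the first
// colon and keep the rest of the line as one raw string.
final class RestOfLine {

    static Optional<Map.Entry<String, String>> split(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return Optional.empty();                        // not a key-value line
        }
        String key = line.substring(0, colon + 1).trim();  // e.g. "Comment:"
        // Everything after the first colon, taken verbatim except for leading and
        // trailing whitespace (trim() also drops a trailing '\r').
        String value = line.substring(colon + 1).trim();
        Map.Entry<String, String> entry = new AbstractMap.SimpleImmutableEntry<>(key, value);
        return Optional.of(entry);
    }

    public static void main(String[] args) {
        // "NOCHandle:" inside the value is just text here, never a separate token.
        split("Comment: NOCHandle: John Q. Hacker")
            .ifPresent(e -> System.out.println(e.getKey() + " -> [" + e.getValue() + "]"));
    }
}
```

Because the value is cut out of the line rather than rebuilt from tokens, internal spacing is preserved as well, which joining separate WORD tokens would lose.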
>
> My second concern: when trying to parse a multi-gig file using a grammar
> much like the above, Java demands it be given absurdly huge heap sizes.
> I am assuming that, like most compilers, ANTLR has to construct the
> entire tree in memory before it can walk the tree doing various actions;
> however, if there's some way to mitigate the heap memory problem, I
> would be deeply appreciative.
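In the grammar, each block ends with an extra NEWLINE, i.e. a blank line. Assuming records can be processed independently, one generic way to bound memory is to stream the file and handle one record at a time instead of the whole input at once; each record could then be parsed or translated on its own. The Java sketch below uses only java.io and a callback. RecordStreamer, forEachRecord and the sample data are invented for illustration, and the sketch makes no claim about how ANTLR itself buffers its input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: read line by line and hand each blank-line-terminated record to a
// callback. Memory stays bounded by the largest record as long as the handler
// does not keep references to the records it receives.
final class RecordStreamer {

    static void forEachRecord(BufferedReader in, Consumer<List<String>> handler) throws IOException {
        List<String> record = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {                    // blank line ends a record
                if (!record.isEmpty()) {
                    handler.accept(record);
                    record = new ArrayList<>();             // start a fresh record
                }
            } else {
                record.add(line);
            }
        }
        if (!record.isEmpty()) {
            handler.accept(record);                         // last record may lack a blank line
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data in the style of the grammar's blocks.
        String sample = "ASHandle: AS0\nComment: NOCHandle John Q. Hacker\n\nNetHandle: NET-0\n";
        forEachRecord(new BufferedReader(new StringReader(sample)),
                      rec -> System.out.println(rec.size() + " line(s), first: " + rec.get(0)));
    }
}
```

For a real multi-gigabyte file, the StringReader would be replaced by a reader over the file, so only one record needs to be held in memory at a time.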
>
> Thank you all for your help!