[antlr-interest] Followed by (PEG style predicates)

Petteri Räty betelgeuse at gentoo.org
Sat Apr 25 15:30:31 PDT 2009


For Gentoo package dependencies I need to parse the following:
"A package name may contain any of the characters [A-Za-z0-9+_-]. It
must not begin with a hyphen, and must not end in a hyphen followed by
one or more digits."
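
Outside the grammar, the quoted rule can be sketched in plain Java as a sanity check. This is only an illustration: the class and method names are made up, and the second pattern is assumed to play the same role as the `pn_end` pattern used in the semantic predicates below.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Depend.g.
final class PackageNameCheck {
    // Allowed characters only, and the first one must not be a hyphen.
    private static final Pattern SHAPE =
        Pattern.compile("[A-Za-z0-9+_][A-Za-z0-9+_-]*");

    // A name ending in a hyphen followed by one or more digits is forbidden.
    private static final Pattern BAD_TAIL = Pattern.compile(".*-[0-9]+");

    static boolean isValidName(String s) {
        return SHAPE.matcher(s).matches() && !BAD_TAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Print the verdict for a few sample names.
        for (String s : new String[] {"gtk+", "foo-bar", "-foo", "foo-1", "foo-1a"}) {
            System.out.println(s + " -> " + isValidName(s));
        }
    }
}
```

Note that a name such as "foo-1a" is accepted, because the trailing part after the hyphen is not made of digits only.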

The package name is followed by the version specification, so unless we
first match the whole atom and then parse it from end to start, we need
to parse the package name up to the point where the version starts. Here
are my two approaches, but neither passes all of my tests:

pn	:	pn_start
	|	pn_start pn_middle* pn_part {!pn_end.matcher($pn.text).matches()}?;

// https://wincent.com/wiki/PEG-style_predicates_in_ANTLR
pn_middle:
	(pn_part ((pn_follows)=>{false}?| ) )=> pn_part;
pn_start : name_part|'+'|'_';
pn_part:  name_part|'+'|'_'|'-';

pn_follows
:
	 {$versioned_dep.size() > 0}?=> version_spec (WS|EOF)
	 | {$versioned_dep.size() == 0}?=> (WS|EOF);

Here ANTLR doesn't seem to generate any code for the syntactic predicate:

    public final void pn_middle() throws RecognitionException {
        try {
            // Depend.g:80:10: ( ( pn_part ( ( pn_follows )=>{...}? | ) )=> pn_part )
            // Depend.g:81:2: ( pn_part ( ( pn_follows )=>{...}? | ) )=> pn_part
            {
            pushFollow(FOLLOW_pn_part_in_pn_middle364);
            pn_part();

            state._fsp--;
            if (state.failed) return ;

            }

        }
        catch (RecognitionException re) {
            reportError(re);
            recover(input,re);
        }
        finally {
        }
        return ;
    }

My other approach tries to use the greedy option:

pn	:	pn_start pn_end? {!pn_end.matcher($pn.text).matches()}?;

pn_end	:	(options { greedy=false;} : pn_part)* (pn_part pn_follows)=>
pn_part;

But this does not pass the test suite either. The greedy version of the
grammar and the test suite are attached. What approach do you recommend
for parsing this?
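
For comparison, the end-to-start alternative mentioned earlier might look roughly like this outside ANTLR. It is a sketch under stated assumptions: the class name is hypothetical, and the `VERSION` pattern is a deliberately simplified stand-in (digits separated by dots), not the real Gentoo version specification that `version_spec` has to handle.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Depend.g.
final class AtomSplitter {
    private static final Pattern NAME_SHAPE =
        Pattern.compile("[A-Za-z0-9+_][A-Za-z0-9+_-]*");
    private static final Pattern BAD_TAIL = Pattern.compile(".*-[0-9]+");

    // ASSUMPTION: simplified version syntax, e.g. "1", "2.16", "1.2.3".
    private static final Pattern VERSION =
        Pattern.compile("[0-9]+(\\.[0-9]+)*");

    static boolean isValidName(String s) {
        return NAME_SHAPE.matcher(s).matches() && !BAD_TAIL.matcher(s).matches();
    }

    /** Returns {name, version}, or null if no hyphen gives a valid split. */
    static String[] split(String atom) {
        // Try each hyphen as the separator, starting from the rightmost one.
        for (int i = atom.lastIndexOf('-'); i > 0; i = atom.lastIndexOf('-', i - 1)) {
            String name = atom.substring(0, i);
            String version = atom.substring(i + 1);
            if (isValidName(name) && VERSION.matcher(version).matches()) {
                return new String[] {name, version};
            }
        }
        return null;
    }
}
```

With these assumptions, `AtomSplitter.split("foo-bar-1.2.3")` yields the name "foo-bar" and the version "1.2.3", while an unversioned input such as "foo" yields null and would have to be handled as a plain package name.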

Regards,
Petteri
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Depend.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20090426/e18d45c4/attachment.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Depend.testsuite
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20090426/e18d45c4/attachment-0001.pl 