[antlr-interest] Recognition of dynamic ID-definitions

Sun Jan 30 21:52:36 PST 2011

Hello Christian,

I've been waiting to see if anyone else would answer this question
before venturing a response.

I'd first create a pre-processor that runs at parser execution time,
feeding your 'real' parser with source transformed according to the
current list of characters recognized as operators.  Below is a toy
grammar for such a pre-processor; its start rule takes the current
operators as a List<String> argument.

Given the input "a+b;" (each statement must end with ';') and a List
of operators that includes "+" it will produce output
var<a> op<+> var<b>.  If the List excludes "+" the output will be
var<a+b>.

It scores low on efficiency and elegance but might get you started.

Michael

grammar Dynamic;

@header {
package dynamic;
import java.util.ArrayList;
}

@lexer::header {
package dynamic;
}

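@members {
// Operators in force for this run; set from the argument to prog.
List<String> operators;

// Accumulates the transformed output for the whole input.
StringBuilder topSB = new StringBuilder();

// Emit a variable, skipping the empty string left between operators.
void addVar(String var) {
    if (var.length() > 0) {
        topSB.append("var<").append(var).append("> ");
    }
}

// Emit an operator.
void addOp(String op) {
    topSB.append("op<").append(op).append("> ");
}

}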

// Parser rules
prog[List<String> operators]
@init {
    this.operators = operators == null ? new ArrayList<String>() : operators;
}
@after {
    System.out.println( topSB.toString() );
}
            : statement+
            ;

statement
@init {
    StringBuilder sb = new StringBuilder();
}
@after {
    addVar(sb.toString());
}
            : (element {
                if ($element.isOp) {
                    addVar(sb.toString());
                    addOp($element.src);
                    sb = new StringBuilder();
                } else {
                    sb.append($element.src);
                }
              })+ DELIM
            ;

element returns [String src, boolean isOp]
            : WORD {$src = $WORD.text; $isOp = false; }
            | OP {$src = $OP.text; $isOp = operators.contains($OP.text);}
            ;

// Lexer rules
WORD        : LETTER+
            ;

// All potential operator chars
OP          : ('+' | '-')
            ;

DELIM       : ';'
            ;

fragment
LETTER      : ('a'..'z' | 'A'..'Z')
            ;

WS          :   (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;}
            ;
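
For anyone who wants to check the expected output without generating
the parser, here is a rough plain-Java sketch of what the actions above
do. It is not produced by ANTLR; the class name DynamicSketch and the
method transform are made up for illustration. It is also a little more
lenient than the grammar (it accepts a final statement without ';') and
it trims the trailing space from the result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the Dynamic grammar; not generated by ANTLR.
public class DynamicSketch {

    // Walks the input character by character, mimicking the lexer rules:
    // letters build words, '+' and '-' are potential operators,
    // ';' ends a statement and whitespace is skipped.
    public static String transform(String input, List<String> operators) {
        List<String> ops = operators == null
                ? Collections.<String>emptyList() : operators;
        StringBuilder out = new StringBuilder();  // plays the role of topSB
        StringBuilder var = new StringBuilder();  // plays the role of sb

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String s = String.valueOf(c);
            boolean potentialOp = (c == '+' || c == '-');
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

            if (c == ' ' || c == '\r' || c == '\t' || c == '\n') {
                continue;                          // WS goes to the hidden channel
            } else if (c == ';') {
                flushVar(out, var);                // DELIM closes the statement
            } else if (potentialOp && ops.contains(s)) {
                flushVar(out, var);                // a live operator splits the variable
                out.append("op<").append(s).append("> ");
            } else if (potentialOp || letter) {
                var.append(c);                     // inactive operators stay in the name
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        flushVar(out, var);  // lenient: the grammar would insist on a final ';'
        return out.toString().trim();
    }

    // Same job as addVar in the grammar: skip empty variables.
    private static void flushVar(StringBuilder out, StringBuilder var) {
        if (var.length() > 0) {
            out.append("var<").append(var).append("> ");
            var.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("a+b;", Arrays.asList("+")));
        System.out.println(transform("a+b;", Collections.<String>emptyList()));
    }
}
```

Running main should print var<a> op<+> var<b> for the first call and
var<a+b> for the second, matching the two cases described above.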

On 26 January 2011 09:21, Christian Mrugalla <christian at mrugalla.info> wrote:
> Dear all,
>
> is it possible to recognize a language which is (so to say)
> 'parameterized'  by a finite set of arbitrary named identifiers, using
> ANTLR?
>
> Trivial Example:
>
> expr : ID ( '+' ID)* ;
>
> ID is not defined at parser-generation-time, it is only known that at
> parser-execution-time there exists a finite set S of arbitrary Strings
> which contains the allowed values for ID.
>
> That is in particular: It depends on S, if "a+b" is:
> - built from '+'-connected 'a'- and 'b'-IDs
> - an ID named 'a+b'
> - invalid, because S contains the IDs "a+" and "b"
>
> I did not find any hint concerning this kind of
> language-parameterization in "The Definitive ANTLR Reference".
>
> Thank you in advance for your help!
>
> Kind Regards,
> Christian Mrugalla