[antlr-interest] Recognition of dynamic ID-definitions
Michael Bedward
michael.bedward at gmail.com
Sun Jan 30 21:52:36 PST 2011
Hello Christian,

I've been waiting to see if anyone else would answer this question
before venturing a response.

I'd first create a pre-processor that runs at parser execution time,
feeding your 'real' parser with source transformed according to a
current list of characters recognized as operators. Below is a toy
grammar for such a pre-processor, where the start rule takes a
List<String> of current operators as an argument.

Given the input "a+b;" (each statement must end with the ';'
delimiter) and a List of operators that includes "+", it will print
var<a> op<+> var<b>. If the List excludes "+", the output will be
var<a+b>.

It scores low on efficiency and elegance but might get you started.

Michael

grammar Dynamic;

@header {
package dynamic;

import java.util.ArrayList;
import java.util.List;
}

@lexer::header {
package dynamic;
}

@members {
    // Characters currently treated as operators; set by the start rule.
    List<String> operators;
    // Accumulates the transformed output for the whole input.
    StringBuilder topSB = new StringBuilder();

    void addVar(String var) {
        if (var.length() > 0) {
            topSB.append("var<").append(var).append("> ");
        }
    }

    void addOp(String op) {
        topSB.append("op<").append(op).append("> ");
    }
}

// Parser rules

prog[List<String> operators]
@init {
    this.operators = operators == null ? new ArrayList<String>() : operators;
}
@after {
    System.out.println( topSB.toString() );
}
    : statement+
    ;

statement
@init {
    StringBuilder sb = new StringBuilder();
}
@after {
    addVar(sb.toString());
}
    : (element {
          if ($element.isOp) {
              // A live operator ends the variable collected so far.
              addVar(sb.toString());
              addOp($element.src);
              sb = new StringBuilder();
          } else {
              // Anything else, including an inactive OP char, joins the name.
              sb.append($element.src);
          }
      })+ DELIM
    ;

element returns [String src, boolean isOp]
    : WORD {$src = $WORD.text; $isOp = false; }
    | OP   {$src = $OP.text; $isOp = operators.contains($OP.text);}
    ;

// Lexer rules

WORD : LETTER+
     ;

// All potential operator chars
OP : ('+' | '-')
   ;

DELIM : ';'
      ;

fragment
LETTER : ('a'..'z' | 'A'..'Z')
       ;

WS : (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;}
   ;
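
For anyone who wants to check the intended behaviour without generating
a parser, here is a minimal plain-Java sketch of the same transformation.
It is not the ANTLR grammar itself, only an approximation of its logic:
the class name OperatorPreprocessor and the methods transform and
flushVar are hypothetical names chosen for this illustration, and unlike
the grammar it does not reject characters outside letters, '+' and '-'.

```java
import java.util.Arrays;
import java.util.List;

public class OperatorPreprocessor {

    // Mirrors the lexer's OP rule: every character that could ever be an operator.
    private static final String POTENTIAL_OPS = "+-";

    /** Rewrites one ';'-terminated statement into var<...> / op<...> form. */
    static String transform(String statement, List<String> operators) {
        StringBuilder out = new StringBuilder();
        StringBuilder var = new StringBuilder();
        for (char c : statement.toCharArray()) {
            if (c == ';') {
                break;                      // DELIM ends the statement
            }
            if (Character.isWhitespace(c)) {
                continue;                   // whitespace is hidden, as in the WS rule
            }
            String s = String.valueOf(c);
            if (POTENTIAL_OPS.contains(s) && operators.contains(s)) {
                // A currently active operator closes the variable collected so far.
                flushVar(out, var);
                out.append("op<").append(s).append("> ");
            } else {
                // Letters and inactive operator characters become part of the name.
                var.append(c);
            }
        }
        flushVar(out, var);
        return out.toString().trim();
    }

    // Emits the pending variable name, if any, and resets the buffer.
    private static void flushVar(StringBuilder out, StringBuilder var) {
        if (var.length() > 0) {
            out.append("var<").append(var).append("> ");
            var.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("a+b;", Arrays.asList("+"))); // var<a> op<+> var<b>
        System.out.println(transform("a+b;", Arrays.asList("-"))); // var<a+b>
    }
}
```

The two calls in main correspond to the two cases described above: with
"+" in the operator list the input splits into two variables, and
without it the whole of "a+b" is treated as one name.
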
On 26 January 2011 09:21, Christian Mrugalla <christian at mrugalla.info> wrote:
> Dear all,
>
> is it possible to recognize a language which is (so to say)
> 'parameterized' by a finite set of arbitrary named identifiers, using
> ANTLR?
>
> Trivial Example:
>
> expr : ID ( '+' ID)* ;
>
> ID is not defined at parser-generation-time, it is only known that at
> parser-execution-time there exists a finite set S of arbitrary Strings
> which contains the allowed values for ID.
>
> That is in particular: It depends on S, if "a+b" is:
> - built by '+'-connected 'a'- and 'b'-IDs
> - an ID named 'a+b'
> - invalid, because S contains the IDs "a+" and "b"
>
> I did not find any hint concerning this kind of
> language parameterization in "The Definitive ANTLR Reference".
>
> Thank you in advance for your help!
>
> Kind Regards,
> Christian Mrugalla