[antlr-interest] Recognition of dynamic ID-definitions
Christian Mrugalla
christian at mrugalla.info
Mon Jan 31 03:53:31 PST 2011
Correcting a typo in my recent post:
Replace 'ID: (.)*;' with:
ID: (.)+;
Kind regards,
Christian Mrugalla
Christian Mrugalla wrote:
> Hello Michael,
>
> I already had some preprocessing in mind as a fallback in case
> ANTLR is not powerful enough to express such dynamics. Thank you for
> your suggestion.
>
> I received two answers directly by e-mail, both outlining this solution:
>
> expr : t=ID {check.isValidRuntimeID(t.getText())}? ( '+' ID )* ;
>
> I have now had time to check whether this elegant solution works. The
> remaining problem is how to define ID!
>
> I concretely tried:
>
> grammar simple_example;
> @header {import RT.RuntimeIDs;}
> @lexer::header{import RT.RuntimeIDs;}
> expr : t=ID {RuntimeIDs.isElem(t.getText())}? ('+' ID)*;
> ID: (.)*;
>
> This yields the error message "The following alternatives can never be
> matched", pointing to the line "ID: (.)*;".
>
> After replacing this line by "ID: (options {greedy=true;} : .)*;" the
> parser could be compiled, but this does not work at runtime (assuming
> RuntimeIDs.isElem returns true iff its argument is "a" or "b", and the
> input-stream to be parsed is "a+b"):
>
> I got a "rule expr failed predicate" error.
>
> Using a conventional ID definition such as
> ID: 'a'..'b';
> works, however.
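
A likely reading of this failure, sketched in plain Java: the lexer tokenizes the input without consulting the parser's predicate, so a greedy match-everything ID rule hands the parser a single token containing the whole input "a+b", and the predicate then rejects it. With ID: 'a'..'b'; the lexer instead emits "a", "+" and "b" separately, so the predicate sees "a". The class below is a hypothetical stand-in for RT.RuntimeIDs: only the name isElem comes from the post, while the fixed set {"a", "b"} and the helper greedyIdToken are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for RT.RuntimeIDs; only the name isElem is from the post.
final class RuntimeIDsSketch {
    // Assumed runtime set S = {"a", "b"}, as in the example above.
    private static final Set<String> IDS = new HashSet<>(Arrays.asList("a", "b"));

    // True iff the given text is one of the IDs known at runtime.
    static boolean isElem(String text) {
        return IDS.contains(text);
    }

    // Models a greedy "match any characters" lexer rule:
    // the whole input is consumed as one token.
    static String greedyIdToken(String input) {
        return input;
    }

    public static void main(String[] args) {
        String token = greedyIdToken("a+b");
        // The predicate is evaluated on the whole-input token, not on "a".
        System.out.println("token '" + token + "' accepted: " + isElem(token));
        System.out.println("token 'a' accepted: " + isElem("a"));
    }
}
```
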
>
> Any other ideas, except a handwritten preprocessing, to write
> ANTLR-grammars with IDs defined at runtime?
>
> Kind regards,
> Christian Mrugalla
>
>
> Michael Bedward wrote:
>> Hello Christian,
>
>> I've been waiting to see if anyone else would answer this question
>> before venturing a response.
>
>> I'd first create a pre-processor that runs at parser execution time,
>> feeding your 'real' parser with source transformed according to a
>> current list of characters recognized as operators. Below is a
>> toy grammar for such a pre-processor, where the start rule takes as an
>> argument a List<String> of current operators.
>
>> Given the input "a+b" and a List of operators that includes "+" it
>> will produce output var<a> op<+> var<b>. If the List excludes "+" the
>> output will be var<a+b>.
>
>> It scores low on efficiency and elegance but might get you started.
>
>> Michael
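
The transformation described above can also be sketched in plain Java, without ANTLR, to make the intended output concrete. This is a simplified illustration, not the grammar itself: the class name OperatorPreprocessor and the method preprocess are hypothetical, whitespace is skipped, ';' ends a statement, and characters other than letters, '+', '-' and ';' are simply kept as part of the current variable, whereas a generated lexer would report them as errors.

```java
import java.util.List;

// Hypothetical plain-Java sketch of the pre-processing idea.
final class OperatorPreprocessor {

    // Emits var<...> and op<...> markers; '+' and '-' count as operators
    // only if they appear in the runtime list of operators.
    static String preprocess(String source, List<String> operators) {
        StringBuilder out = new StringBuilder();
        StringBuilder var = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            String s = String.valueOf(c);
            if (Character.isWhitespace(c)) {
                continue;                       // whitespace is ignored
            } else if (c == ';') {
                flushVar(out, var);             // end of statement
            } else if ((c == '+' || c == '-') && operators.contains(s)) {
                flushVar(out, var);
                out.append("op<").append(s).append("> ");
            } else {
                var.append(c);                  // part of a variable name
            }
        }
        flushVar(out, var);
        return out.toString().trim();
    }

    // Writes the pending variable name, if any, and resets the buffer.
    private static void flushVar(StringBuilder out, StringBuilder var) {
        if (var.length() > 0) {
            out.append("var<").append(var).append("> ");
            var.setLength(0);
        }
    }
}
```

Calling preprocess("a+b;", ...) with a list containing "+" gives var<a> op<+> var<b>; with an empty list it gives var<a+b>.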
>
>
>> grammar Dynamic;
>
>> @header {
>>     package dynamic;
>>     import java.util.ArrayList;
>> }
>
>> @lexer::header {
>>     package dynamic;
>> }
>
>> @members {
>>     List<String> operators;
>
>>     StringBuilder topSB = new StringBuilder();
>
>>     void addVar(String var) {
>>         if (var.length() > 0) {
>>             topSB.append("var<").append(var).append("> ");
>>         }
>>     }
>
>>     void addOp(String op) {
>>         topSB.append("op<").append(op).append("> ");
>>     }
>
>> }
>
>> // Parser rules
>> prog[List<String> operators]
>> @init {
>>     this.operators = operators == null ? new ArrayList<String>() : operators;
>> }
>> @after {
>>     System.out.println( topSB.toString() );
>> }
>>     : statement+
>>     ;
>
>> statement
>> @init {
>>     StringBuilder sb = new StringBuilder();
>> }
>> @after {
>>     addVar(sb.toString());
>> }
>>     : (element {
>>           if ($element.isOp) {
>>               addVar(sb.toString());
>>               addOp($element.src);
>>               sb = new StringBuilder();
>>           } else {
>>               sb.append($element.src);
>>           }
>>       })+ DELIM
>>     ;
>
>> element returns [String src, boolean isOp]
>>     : WORD {$src = $WORD.text; $isOp = false; }
>>     | OP {$src = $OP.text; $isOp = operators.contains($OP.text);}
>>     ;
>
>> // Lexer rules
>> WORD : LETTER+
>>      ;
>
>> // All potential operator chars
>> OP : ('+' | '-')
>>    ;
>
>> DELIM : ';'
>>       ;
>
>> fragment
>> LETTER : ('a'..'z' | 'A'..'Z')
>>        ;
>
>> WS : (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;}
>>    ;
>
>
>
>> On 26 January 2011 09:21, Christian Mrugalla <christian at mrugalla.info> wrote:
>> Dear all,
>
>> is it possible to recognize a language which is (so to speak)
>> 'parameterized' by a finite set of arbitrarily named identifiers, using
>> ANTLR?
>
>> Trivial Example:
>
>> expr : ID ( '+' ID)* ;
>
>> ID is not defined at parser-generation-time, it is only known that at
>> parser-execution-time there exists a finite set S of arbitrary Strings
>> which contains the allowed values for ID.
>
>> In particular, it depends on S whether "a+b" is:
>> - built from the IDs 'a' and 'b', connected by '+'
>> - an ID named 'a+b'
>> - invalid, because S contains the IDs "a+" and "b"
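
One way to read these three cases is as longest-match tokenization against S, followed by the rule expr : ID ('+' ID)*. The plain-Java sketch below illustrates that reading only. It is not an ANTLR feature, the names RuntimeIdTokenizer, tokenize and isExpr are hypothetical, and preferring the longest matching ID at each position is an assumption about the intended semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: tokenize against a runtime set of IDs, then check
// the token sequence against expr : ID ('+' ID)* .
final class RuntimeIdTokenizer {

    // Returns tokens such as "ID<a>" and "PLUS", always preferring the
    // longest ID from ids that matches at the current position.
    // Returns null if some character starts neither an ID nor '+'.
    static List<String> tokenize(String input, Set<String> ids) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            for (String id : ids) {
                if (!id.isEmpty() && input.startsWith(id, pos)
                        && (best == null || id.length() > best.length())) {
                    best = id;
                }
            }
            if (best != null) {
                tokens.add("ID<" + best + ">");
                pos += best.length();
            } else if (input.charAt(pos) == '+') {
                tokens.add("PLUS");
                pos++;
            } else {
                return null;
            }
        }
        return tokens;
    }

    // True iff the tokens alternate ID, PLUS, ID, ... and start and end with an ID.
    static boolean isExpr(List<String> tokens) {
        if (tokens == null || tokens.size() % 2 == 0) {
            return false;
        }
        for (int i = 0; i < tokens.size(); i++) {
            boolean expectId = (i % 2 == 0);
            boolean isId = tokens.get(i).startsWith("ID<");
            if (expectId != isId) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, S = {"a", "b"} tokenizes "a+b" as ID, PLUS, ID; S = {"a+b"} yields a single ID; and S = {"a+", "b"} yields two adjacent IDs, which isExpr rejects.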
>
>> I did not find any hint concerning this kind of
>> language parameterization in "The Definitive ANTLR Reference".
>
>> Thank you in advance for your help!
>
>> Kind Regards,
>> Christian Mrugalla
>