[antlr-interest] Recognition of dynamic ID-definitions

Christian Mrugalla christian at mrugalla.info
Mon Jan 31 03:53:31 PST 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Correcting a typo in my recent post:

Replace 'ID: (.)*;' by:

ID: (.)+;

Kind regards,
Christian Mrugalla

Christian Mrugalla wrote:
> Hello Michael,
> 
> I already had some preprocessing in mind as a fallback if
> ANTLR is not powerful enough to express such dynamics. Thank you for
> your suggestion.
> 
> I got two answers directly by e-mail, both outlining this solution:
> 
> expr : t=ID {check.isValidRuntimeID(t.getText())}? ( '+' ID )* ;
> 
> I have now had time to check whether this elegant solution works. The
> remaining problem is how to define ID!
> 
> I concretely tried:
> 
> grammar simple_example;
> @header {import RT.RuntimeIDs;}
> @lexer::header{import RT.RuntimeIDs;}
> expr : t=ID {RuntimeIDs.isElem(t.getText())}? ('+' ID)*;
> ID: (.)*;
> 
> This yields the error message "The following alternatives can never be
> matched", pointing to the line "ID: (.)*;".
> 
> After replacing this line with "ID: (options {greedy=true;} : .)*;" the
> parser compiles, but it does not work at runtime. Assuming
> RuntimeIDs.isElem returns true iff its argument is "a" or "b", and the
> input stream to be parsed is "a+b",
> I get a "rule expr failed predicate" error.
> 
> Using a *usual* ID definition like
> ID: 'a'..'b';
> works instead.
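
One plausible reading of the failed predicate, sketched in plain Java:
with the greedy catch-all rule the lexer presumably hands the whole
input "a+b" to the parser as a single ID token, and the predicate then
rejects that text. The class below is only a hypothetical stand-in for
RT.RuntimeIDs. The thread names just isElem; the set contents and
everything else here are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for RT.RuntimeIDs; only isElem is named in the thread.
final class RuntimeIDs {
    // Assumption: the runtime set S is {"a", "b"}, as in the example above.
    private static final Set<String> IDS =
            new HashSet<String>(Arrays.asList("a", "b"));

    static boolean isElem(String text) {
        return IDS.contains(text);
    }

    public static void main(String[] args) {
        // Token text when ID is defined as 'a'..'b': one character per token.
        System.out.println(isElem("a"));
        // Token text if a single greedy ID token covers the whole input.
        System.out.println(isElem("a+b"));
    }
}
```

If that reading is right, the predicate itself behaves as intended and
the open problem is purely where the lexer draws the token boundaries.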
> 
> Are there any other ideas, apart from handwritten preprocessing, for
> writing ANTLR grammars with IDs defined at runtime?
> 
> Kind regards,
> Christian Mrugalla
> 
> 
> Michael Bedward wrote:
>> Hello Christian,
> 
>> I've been waiting to see if anyone else would answer this question
>> before venturing a response.
> 
>> I'd first create a pre-processor that runs at parser execution time,
>> feeding your 'real' parser with source transformed according to a
>> current list of characters recognized as operators.  Below is a
>> toy grammar for such a pre-processor where the start rule takes as an
>> argument a List<String> of current operators.
> 
>> Given the input "a+b" and a List of operators that includes "+" it
>> will produce output var<a> op<+> var<b>.  If the List excludes "+" the
>> output will be var<a+b>.
> 
>> It scores low on efficiency and elegance but might get you started.
> 
>> Michael
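
The input/output behaviour described above can be mimicked in a few
lines of plain Java, which may help when checking what the grammar is
expected to print. This is a rough sketch, not the grammar itself: the
class and method names are hypothetical, it does no input validation,
and it trims the trailing space that the grammar's output would carry.

```java
import java.util.Arrays;
import java.util.List;

final class OperatorPreprocessor {
    // Assumptions taken from the lexer rules: '+' and '-' are the only
    // candidate operator characters (OP), ';' ends a statement (DELIM),
    // and whitespace is skipped (WS).
    static String preprocess(String source, List<String> operators) {
        StringBuilder out = new StringBuilder();
        StringBuilder var = new StringBuilder();
        for (char c : source.toCharArray()) {
            String s = String.valueOf(c);
            if (Character.isWhitespace(c)) {
                continue;
            } else if (c == ';') {
                flushVar(var, out);
            } else if ((c == '+' || c == '-') && operators.contains(s)) {
                // A currently active operator closes the pending variable.
                flushVar(var, out);
                out.append("op<").append(s).append("> ");
            } else {
                // Anything else, including inactive operators, joins the variable.
                var.append(c);
            }
        }
        flushVar(var, out); // tolerate a missing final ';'
        return out.toString().trim();
    }

    private static void flushVar(StringBuilder var, StringBuilder out) {
        if (var.length() > 0) {
            out.append("var<").append(var).append("> ");
            var.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(preprocess("a+b;", Arrays.asList("+")));
        System.out.println(preprocess("a+b;", Arrays.asList("-")));
    }
}
```

With "+" in the list the first call yields the var/op/var form; with
"+" absent the second call yields a single var<a+b>.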
> 
> 
>> grammar Dynamic;
> 
>> @header {
>> package dynamic;
>> import java.util.ArrayList;
>> }
> 
>> @lexer::header {
>> package dynamic;
>> }
> 
>> @members {
>> List<String> operators;
> 
>> StringBuilder topSB = new StringBuilder();
> 
>> void addVar(String var) {
>>     if (var.length() > 0) {
>>         topSB.append("var<").append(var).append("> ");
>>     }
>> }
> 
>> void addOp(String op) {
>>     topSB.append("op<").append(op).append("> ");
>> }
> 
>> }
> 
>> // Parser rules
>> prog[List<String> operators]
>> @init {
>>     this.operators = operators == null ? new ArrayList<String>() : operators;
>> }
>> @after {
>>     System.out.println( topSB.toString() );
>> }
>>             : statement+
>>             ;
> 
>> statement
>> @init {
>>     StringBuilder sb = new StringBuilder();
>> }
>> @after {
>>     addVar(sb.toString());
>> }
>>             : (element {
>>                 if ($element.isOp) {
>>                     addVar(sb.toString());
>>                     addOp($element.src);
>>                     sb = new StringBuilder();
>>                 } else {
>>                     sb.append($element.src);
>>                 }
>>               })+ DELIM
>>             ;
> 
>> element returns [String src, boolean isOp]
>>             : WORD {$src = $WORD.text; $isOp = false; }
>>             | OP {$src = $OP.text; $isOp = operators.contains($OP.text);}
>>             ;
> 
>> // Lexer rules
>> WORD        : LETTER+
>>             ;
> 
>> // All potential operator chars
>> OP          : ('+' | '-')
>>             ;
> 
>> DELIM       : ';'
>>             ;
> 
>> fragment
>> LETTER      : ('a'..'z' | 'A'..'Z')
>>             ;
> 
>> WS          :   (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;}
>>             ;
> 
> 
> 
>> On 26 January 2011 09:21, Christian Mrugalla <christian at mrugalla.info> wrote:
>> Dear all,
> 
>> is it possible to recognize a language which is (so to say)
>> 'parameterized'  by a finite set of arbitrary named identifiers, using
>> ANTLR?
> 
>> Trivial Example:
> 
>> expr : ID ( '+' ID)* ;
> 
>> ID is not defined at parser-generation-time, it is only known that at
>> parser-execution-time there exists a finite set S of arbitrary Strings
>> which contains the allowed values for ID.
> 
>> In particular, it depends on S whether "a+b" is:
>> - built from '+'-connected 'a' and 'b' IDs
>> - an ID named 'a+b'
>> - invalid, because S contains the IDs "a+" and "b"
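
The three cases can be reproduced with a small hand-written matcher.
This is only a sketch under stated assumptions: at each position the
longest ID from S wins, the literal "+" is tried only when no ID
matches, and an ID spelled exactly "+" is not distinguished from the
operator. All names are hypothetical, and this is not an ANTLR
mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RuntimeIdMatcher {
    // Splits the input into tokens, preferring the longest ID at each
    // position and falling back to "+". Returns null if nothing matches.
    static List<String> tokenize(String input, Set<String> ids) {
        List<String> tokens = new ArrayList<String>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            for (String id : ids) {
                if (!id.isEmpty() && input.startsWith(id, pos)
                        && (best == null || id.length() > best.length())) {
                    best = id;
                }
            }
            if (best == null && input.charAt(pos) == '+') {
                best = "+";
            }
            if (best == null) {
                return null;
            }
            tokens.add(best);
            pos += best.length();
        }
        return tokens;
    }

    // Checks the token list against expr : ID ('+' ID)* ;
    static boolean isExpr(List<String> tokens, Set<String> ids) {
        if (tokens == null || tokens.size() % 2 == 0) {
            return false;
        }
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean ok = (i % 2 == 0) ? ids.contains(t) : t.equals("+");
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    static Set<String> set(String... ids) {
        return new HashSet<String>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        Set<String> s1 = set("a", "b");   // '+'-connected IDs
        Set<String> s2 = set("a+b");      // one ID named "a+b"
        Set<String> s3 = set("a+", "b");  // two adjacent IDs, no '+' between
        System.out.println(isExpr(tokenize("a+b", s1), s1));
        System.out.println(isExpr(tokenize("a+b", s2), s2));
        System.out.println(isExpr(tokenize("a+b", s3), s3));
    }
}
```

Under the longest-match assumption the first two sets accept "a+b" and
the third rejects it, matching the three bullets above.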
> 
>> I did not find any hint concerning this kind of
>> language parameterization in "The Definitive ANTLR Reference".
> 
>> Thank you in advance for your help!
> 
>> Kind Regards,
>> Christian Mrugalla
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAk1GorsACgkQz2D7mOZ/GFzMXgCeOSrg5J8q9cfr+SXyrNPei/pk
iXwAoMRNC0w3WBKRLePSDDRgTBdSAm6e
=W7Ow
-----END PGP SIGNATURE-----

