[antlr-interest] Recognition of dynamic ID-definitions

Mon Jan 31 03:07:12 PST 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Michael,

I already had some preprocessing in mind as a fallback, in case
ANTLR is not powerful enough to express such dynamics. Thank you for
your suggestion.

I received two answers directly by e-mail, both outlining this solution:

expr : t=ID {check.isValidRuntimeID(t.getText())}? ( '+' ID )* ;

I have now had time to check whether this elegant solution works. The
remaining problem is how to define ID!

Concretely, I tried:

grammar simple_example;
@header {import RT.RuntimeIDs;}
@lexer::header{import RT.RuntimeIDs;}
expr : t=ID {RuntimeIDs.isElem(t.getText())}? ('+' ID)*;
ID: (.)*;

This yields the error message "The following alternatives can never be
matched", pointing to the line "ID: (.)*;".

After replacing this line with "ID: (options {greedy=true;} : .)*;" the
parser could be compiled, but parsing fails at runtime. Assuming
RuntimeIDs.isElem returns true iff its argument is "a" or "b", and the
input stream to be parsed is "a+b",

I got a "rule expr failed predicate" error.
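
One plausible reading of this failure, sketched in plain Java rather than
ANTLR: a lexer rule that loops greedily over the wildcard has no reason to
stop before the end of the input, so the single ID token would carry the
text "a+b", and the predicate would then be asked about "a+b" instead of
"a". The class and method names below are hypothetical, and isElem is only
a stand-in for RT.RuntimeIDs.isElem under the assumption stated above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class GreedyWildcardDemo {
    // Stand-in for RT.RuntimeIDs.isElem (assumption: only "a" and "b" are valid IDs).
    static final Set<String> IDS = new HashSet<String>(Arrays.asList("a", "b"));

    static boolean isElem(String text) {
        return IDS.contains(text);
    }

    // Mimics a greedy loop over the wildcard: every character matches,
    // so the loop only ends at the end of the input.
    static String greedyWildcardToken(String input, int start) {
        int pos = start;
        while (pos < input.length()) {
            pos++; // '.' accepts any character, including '+'
        }
        return input.substring(start, pos);
    }
}
```

Under this reading, greedyWildcardToken("a+b", 0) yields the whole input as
one token, and isElem rejects it, which would match the reported error.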

Using a conventional ID definition such as
ID: 'a'..'b';
works, however.

Any other ideas, apart from hand-written preprocessing, for writing
ANTLR grammars with IDs defined at runtime?
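
To make the requirement concrete, here is a minimal sketch in plain Java
(not ANTLR) of how the runtime set S decides the tokenization of "a+b". It
assumes a longest-match policy at each input position, which is only one
possible choice; the class and method names are hypothetical. Whether such
a scan could be plugged into a generated parser as a custom token source is
a further assumption that is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RuntimeIdScanner {
    // Splits the input into ID and PLUS tokens, preferring the longest
    // member of s that starts at the current position.
    static List<String> tokenize(String input, Set<String> s) {
        List<String> tokens = new ArrayList<String>();
        int pos = 0;
        while (pos < input.length()) {
            int best = 0; // length of the longest ID found at pos
            for (String id : s) {
                if (id.length() > best && input.startsWith(id, pos)) {
                    best = id.length();
                }
            }
            if (best > 0) {
                tokens.add("ID(" + input.substring(pos, pos + best) + ")");
                pos += best;
            } else if (input.charAt(pos) == '+') {
                tokens.add("PLUS");
                pos++;
            } else {
                throw new IllegalArgumentException("no ID or '+' at offset " + pos);
            }
        }
        return tokens;
    }

    // Checks a token list against: expr : ID ('+' ID)* ;
    static boolean matchesExpr(List<String> tokens) {
        if (tokens.size() % 2 == 0) {
            return false; // empty, or ends with '+', or has adjacent IDs
        }
        for (int i = 0; i < tokens.size(); i++) {
            boolean isId = tokens.get(i).startsWith("ID(");
            if ((i % 2 == 0) != isId) {
                return false; // even slots must be IDs, odd slots must be '+'
            }
        }
        return true;
    }
}
```

With S = {"a", "b"} the input is two IDs joined by '+'; with S = {"a+b"} it
is a single ID; with S = {"a+", "b"} the scan finds two adjacent IDs, which
expr rejects.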

Kind regards,
Christian Mrugalla

Michael Bedward wrote:
> Hello Christian,
> 
> I've been waiting to see if anyone else would answer this question
> before venturing a response.
> 
> I'd first create a pre-processor that runs at parser execution time,
> feeding your 'real' parser with source transformed according to a
> current list of characters recognized as operators.  Below is a
> toy grammar for such a pre-processor, where the start rule takes as an
> argument a List<String> of current operators.
> 
> Given the input "a+b" and a List of operators that includes "+" it
> will produce output var<a> op<+> var<b>.  If the List excludes "+" the
> output will be var<a+b>.
> 
> It scores low on efficiency and elegance but might get you started.
> 
> Michael
> 
> 
> grammar Dynamic;
> 
> @header {
> package dynamic;
> import java.util.ArrayList;
> }
> 
> @lexer::header {
> package dynamic;
> }
> 
> @members {
> List<String> operators;
> 
> StringBuilder topSB = new StringBuilder();
> 
> void addVar(String var) {
>     if (var.length() > 0) {
>         topSB.append("var<").append(var).append("> ");
>     }
> }
> 
> void addOp(String op) {
>     topSB.append("op<").append(op).append("> ");
> }
> 
> }
> 
> // Parser rules
> prog[List<String> operators]
> @init {
>     this.operators = operators == null ? new ArrayList<String>() : operators;
> }
> @after {
>     System.out.println( topSB.toString() );
> }
>             : statement+
>             ;
> 
> statement
> @init {
>     StringBuilder sb = new StringBuilder();
> }
> @after {
>     addVar(sb.toString());
> }
>             : (element {
>                 if ($element.isOp) {
>                     addVar(sb.toString());
>                     addOp($element.src);
>                     sb = new StringBuilder();
>                 } else {
>                     sb.append($element.src);
>                 }
>               })+ DELIM
>             ;
> 
> element returns [String src, boolean isOp]
>             : WORD {$src = $WORD.text; $isOp = false; }
>             | OP {$src = $OP.text; $isOp = operators.contains($OP.text);}
>             ;
> 
> // Lexer rules
> WORD        : LETTER+
>             ;
> 
> // All potential operator chars
> OP          : ('+' | '-')
>             ;
> 
> DELIM       : ';'
>             ;
> 
> fragment
> LETTER      : ('a'..'z' | 'A'..'Z')
>             ;
> 
> WS          :   (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;}
>             ;
> 
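
For readers without an ANTLR toolchain at hand, the transformation that the
toy grammar above describes can be approximated in plain Java. This is a
simplified sketch and not the generated parser: the class and method names
are hypothetical, it scans characters directly instead of using lexer
tokens, and it does not enforce that every statement contains at least one
element. It does require each statement to end with ';', as the grammar's
DELIM does.

```java
import java.util.List;

class DynamicPreprocessorSketch {
    // Rewrites the input into var<...> and op<...> markers, treating '+' or
    // '-' as an operator only if it appears in the operators list.
    static String preprocess(String input, List<String> operators) {
        StringBuilder out = new StringBuilder();     // plays the role of topSB
        StringBuilder pending = new StringBuilder(); // text of the current variable
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean opChar = (c == '+' || c == '-');
            if (c == ' ' || c == '\r' || c == '\t' || c == '\n') {
                continue; // whitespace is hidden, as in the WS rule
            } else if (c == ';') {
                flushVar(out, pending); // end of statement
            } else if (opChar && operators.contains(String.valueOf(c))) {
                flushVar(out, pending);
                out.append("op<").append(c).append("> ");
            } else if (letter || opChar) {
                pending.append(c); // non-operator chars become part of the variable
            } else {
                throw new IllegalArgumentException("unexpected character at offset " + i);
            }
        }
        if (pending.length() > 0) {
            throw new IllegalArgumentException("statement is missing its ';'");
        }
        return out.toString();
    }

    // Emits the pending variable, if any, and clears the buffer.
    private static void flushVar(StringBuilder out, StringBuilder pending) {
        if (pending.length() > 0) {
            out.append("var<").append(pending.toString()).append("> ");
            pending.setLength(0);
        }
    }
}
```

Calling preprocess("a+b;", ...) with a list that contains "+" should give
the var/op/var form described above, and with a list that lacks "+" the
whole of a+b should come out as a single var.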
> 
> 
> On 26 January 2011 09:21, Christian Mrugalla <christian at mrugalla.info> wrote:
> Dear all,
> 
> is it possible to recognize a language which is (so to speak)
> 'parameterized' by a finite set of arbitrarily named identifiers, using
> ANTLR?
> 
> Trivial Example:
> 
> expr : ID ( '+' ID)* ;
> 
> ID is not defined at parser-generation-time, it is only known that at
> parser-execution-time there exists a finite set S of arbitrary Strings
> which contains the allowed values for ID.
> 
> That is, in particular: it depends on S whether "a+b" is:
> - built from '+'-connected 'a' and 'b' IDs
> - an ID named 'a+b'
> - invalid, because S contains the IDs "a+" and "b"
> 
> I did not find any hint concerning this kind of
> language parameterization in "The Definitive ANTLR Reference".
> 
> Thank you in advance for your help!
> 
> Kind Regards,
> Christian Mrugalla
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAk1Gl+AACgkQz2D7mOZ/GFzUYQCeJWh23D6IAY4x9m9+0LmUUDyN
xvoAoI9cxOddv6OxHiFOx/OWEpKIyiJ1
=GqKl
-----END PGP SIGNATURE-----