[antlr-interest] Parse 1 - N repeats
Bart Kiers
bkiers at gmail.com
Mon Feb 8 06:06:25 PST 2010
Hi Adam,
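You could handle it with plain programming logic in the actions embedded in
your grammar: count the fields as they are matched, and return from the rule
as soon as the group is full.
Here's a little demo: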
grammar Test;

@parser::members {
  // Small driver: parses the sample input from the question.
  public static void main(String[] args) throws Exception {
    String text =
        "FIELD1\n"+
        "REPEATING_GROUP <fields=2> <min=0, max=20>\n"+
        "FIELD2\n"+
        "FIELD3\n"+
        "FIELD4";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    new TestParser(tokens).parse();
  }

  // Holds one repeating group and the fields collected for it so far.
  class Repeat {
    final List<String> fieldList;
    final int fields;
    final int min;
    final int max;

    Repeat(int fields, int min, int max) {
      this.fieldList = new ArrayList<String>(fields);
      this.fields = fields;
      this.min = min;
      this.max = max;
    }

    // True once the declared number of fields has been collected.
    boolean done() {
      return fieldList.size() == fields;
    }

    public String toString() {
      return String.format("fields=\%s, min=\%d, max=\%d", fieldList, min, max);
    }
  }
}

parse
  : ( rp=repeat     {System.out.println("repeat :: "+$rp.r);}
    | id=Identifier {System.out.println("field :: "+$id.text);}
    )*
    EOF
  ;

repeat returns [Repeat r]
  : Identifier '<' 'fields' '=' fields=Identifier '>'
    '<' 'min' '=' min=Identifier ',' 'max' '=' max=Identifier '>'
    {$r = new Repeat(Integer.valueOf($fields.text),
                     Integer.valueOf($min.text), Integer.valueOf($max.text));}
    // Leave the rule as soon as the group is full, so the next Identifier
    // is left for the parse rule.
    ( id=Identifier {$r.fieldList.add($id.text); if($r.done()) return $r;}
    )*
  ;

Identifier
  : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' )+
  ;

WhiteSpace
  : ( ' ' | '\t' | '\r' | '\n' ) {skip();}
  ;
As you can see, as soon as the size of fieldList reaches the declared number
of fields, $r is returned (and no further id=Identifier tokens are "eaten").
When you compile and run the TestParser class, the following is printed:
field :: FIELD1
repeat :: fields=[FIELD2, FIELD3], min=0, max=20
field :: FIELD4
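The same counting idea can be sketched in plain Java, without ANTLR, to show
why the group stops at the right place. This is only an illustration: the
class name RepeatSketch, the parse helper, and the simplified "fields=N"
header token are assumptions made for the example, not part of the grammar
above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepeatSketch {

    // Hypothetical helper: walks a flat token list. A token of the form
    // "fields=N" (a simplified stand-in for the REPEATING_GROUP header)
    // claims the next N tokens; everything else is a plain field.
    static List<String> parse(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < tokens.size()) {
            String tok = tokens.get(i++);
            if (tok.startsWith("fields=")) {
                int fields = Integer.parseInt(tok.substring("fields=".length()));
                List<String> group = new ArrayList<String>(fields);
                // Same role as done() in the grammar: stop once the group is full.
                while (group.size() < fields && i < tokens.size()) {
                    group.add(tokens.get(i++));
                }
                out.add("repeat :: " + group);
            } else {
                out.add("field :: " + tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("FIELD1", "fields=2", "FIELD2", "FIELD3", "FIELD4");
        for (String line : parse(tokens)) {
            System.out.println(line);
        }
    }
}
```

Because the inner loop is bounded by the declared count, FIELD4 is left for
the outer loop, just as the early return in the repeat rule leaves it for the
parse rule.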
Regards,
Bart.
On Mon, Feb 8, 2010 at 1:56 PM, Adam Connelly <adam.rpconnelly at googlemail.com> wrote:
> Hi,
>
> Sorry if this is answered elsewhere, but I'm not really sure what to search
> for.
>
> I'm trying to parse a language that includes repeating groups. The problem
> is that they don't include terminators, so you can't tell the difference
> between the last item in the group, and the next section. Here's an
> example:
>
> FIELD1
> REPEATING_GROUP <fields=2> <min=0, max=20>
> FIELD2
> FIELD3
> FIELD4
> ...
>
> "fields" specifies the number of fields contained in the group. At the
> moment I've got the following rules, but the problem is that it means that
> the repeating group rule doesn't get its fields associated with it:
>
> recordDefinition
> : RECORD (IDENTIFIER | repeatingGroup)+
> ;
>
> repeatingGroup
> : IDENTIFIER
> '<' NUMBER_OF_FIELDS '=' fieldCount=NUMBER '>'
> '<' NUMBER_OF_REPEATS '=' min=NUMBER ',' max=NUMBER '>'
> ;
>
> Ideally I could do something like:
>
> repeatingGroup
> : IDENTIFIER
> '<' NUMBER_OF_FIELDS '=' fieldCount=NUMBER '>'
> '<' NUMBER_OF_REPEATS '=' min=NUMBER ',' max=NUMBER '>'
> IDENTIFIER{1, $fieldCount}
> ;
>
> But I know you can't do that. What would be the best way to go about parsing
> this? Can I build an AST and then modify it to put the identifiers for the
> repeating group in the right place?
>
> Cheers,
> Adam