[antlr-interest] Help needed with LL(*)-type grammar
Florian von Walter
fvwalter at web.de
Tue Feb 9 05:41:01 PST 2010
Hi,
I'm new to working with ANTLR and ANTLRWorks.
I really appreciate what ANTLR and ANTLRWorks offer and purchased "The
Definitive ANTLR Reference" and "Language Implementation Patterns" to
get a better understanding of how to use ANTLR.
I have some background with lexers and parsers and EBNF.
I'd like to write a lexer/parser which is able to recognize a
character-delimited format with nested field groups and transform this
data into XML-style data.
Here are some examples of what the data looks like:
Example 1: Obj1^Verb1^field1^field2^
Example 2: Obj1^Verb1^field1^field2^1^Obj2^Verb2^field3^field4^
Example 3:
Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^Obj2^Verb2^field3_2^field4_2^
Example 4:
Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^Obj2^Verb2^field3_2^field4_2^1^Obj3^Verb3^field5^
The core grammar behind this looks like this:
object SEP verb SEP (fieldContents SEP)+ (recordCount SEP (object SEP
verb SEP (fieldContents SEP)+)+)*
where SEP is the delimiter ('^' in this case) and recordCount is an
integer which indicates how many (sub)records come after it.
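To make the intended structure concrete, here is a minimal hand-written sketch in Python (not ANTLR) of the core rule above. It is only an illustration under assumptions: the object names are the three used in the examples, a record count is recognized by looking ahead for "integer followed by an object name", and the helper names (tokenize, starts_subgroup, parse_object_group, parse_data) are made up for this sketch. Like the rule, it accepts one or more records after a count and does not enforce that their number matches the count.

```python
# Hypothetical sketch of the core rule; all names here are illustrative.
OBJECT_NAMES = {"Obj1", "Obj2", "Obj3"}  # assumption: names taken from the examples


def tokenize(text):
    """Split on '^'. Every item ends with '^', so the last split piece is dropped."""
    if not text.endswith("^"):
        raise ValueError("input must end with the '^' delimiter")
    return text.split("^")[:-1]


def starts_subgroup(tokens, pos):
    """Two-token lookahead: a digit string followed by an object name."""
    return (pos + 1 < len(tokens)
            and tokens[pos].isdigit()
            and tokens[pos + 1] in OBJECT_NAMES)


def parse_object_group(tokens, pos):
    """object SEP verb SEP (fieldContents SEP)+ ; fields may be empty strings."""
    if pos + 2 >= len(tokens) or tokens[pos] not in OBJECT_NAMES:
        raise ValueError(f"expected object, verb and a field at token {pos}")
    group = {"object": tokens[pos], "verb": tokens[pos + 1],
             "fields": [tokens[pos + 2]]}
    pos += 3
    # Keep consuming fields until the next record or the next record count.
    while (pos < len(tokens)
           and tokens[pos] not in OBJECT_NAMES
           and not starts_subgroup(tokens, pos)):
        group["fields"].append(tokens[pos])
        pos += 1
    return group, pos


def parse_data(text):
    """objectGroup (recordCount SEP objectGroup+)*"""
    tokens = tokenize(text)
    root, pos = parse_object_group(tokens, 0)
    subgroups = []
    while pos < len(tokens):
        if not starts_subgroup(tokens, pos):
            raise ValueError(f"expected a record count at token {pos}")
        count = int(tokens[pos])
        pos += 1
        records = []
        while pos < len(tokens) and tokens[pos] in OBJECT_NAMES:
            record, pos = parse_object_group(tokens, pos)
            records.append(record)
        subgroups.append({"count": count, "records": records})
    return {"root": root, "subgroups": subgroups}
```

With this sketch, the empty item between the two Obj2 records in example 3 ends up as a trailing empty field of the first Obj2 record. A field whose content happens to equal an object name would be misread as the start of a new record, so this is not a general solution.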
From my understanding this grammar is of type LL(*) because the
"recordCount" can occur after an arbitrary number of fields due to this
part of the rule: (fieldContents SEP)+.
I managed to write a grammar which can parse example 1 but fails for all
other examples:
grammar DLM;
data : objectGroup subObjectGroup* ;
objectGroup : objectName SEP verbName SEP (fieldData SEP)+;
subObjectGroup : recordCount SEP objectGroup+;
objectName : 'Obj1' | 'Obj2' | 'Obj3' ;
verbName : 'Verb1' | 'Verb2' | 'Verb3' ;
fieldData : NONSEP* ; // field can be empty;
recordCount : INT ;
NONSEP : ~('^')+ ;
SEP : '^';
fragment INT : '0'..'9'+;
This grammar just stops when it reaches token "Obj2".
I rewrote rule "data" like this:
data : objectGroup subObjectGroup+ | objectGroup;
This time it failed at token "Obj2" with a NoViableAltException.
I tried to use options {backtrack=true; memoize=true;} both for the whole
grammar and for rule "data" only, but this didn't help.
I also tried to use predicates like this:
subObjectGroup : (INT SEP objectName) => recordCount SEP objectGroup+;
but this didn't help either.
So I'd really appreciate some hints on how to make the other examples parse.
Thanks.
Best regards,
Florian