[antlr-interest] Help needed with LL(*)-type grammar

Florian von Walter fvwalter at web.de
Tue Feb 9 05:41:01 PST 2010


Hi,

I'm new to working with ANTLR and ANTLRWorks.
I really appreciate what ANTLR and ANTLRWorks offer and purchased "The
Definitive ANTLR Reference" and "Language Implementation Patterns" to
get a better understanding of how to use ANTLR.
I have some background with lexers and parsers and EBNF.

I'd like to write a lexer/parser that recognizes a character-delimited
format with nested field groups and transforms the data into XML-style
data.

Here are some examples of what the data looks like:

Example 1: Obj1^Verb1^field1^field2^
Example 2: Obj1^Verb1^field1^field2^1^Obj2^Verb2^field3^field4^
Example 3:
Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^Obj2^Verb2^field3_2^field4_2^
Example 4:
Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^Obj2^Verb2^field3_2^field4_2^1^Obj3^Verb3^field5^

The core grammar behind this looks like this:

object SEP verb SEP (fieldContents SEP)+ (recordCount SEP (object SEP
verb SEP (fieldContents SEP)+)+)*

where SEP is the delimiter ('^' in this case) and recordCount is an
integer which indicates how many (sub)records come after it.

From my understanding this grammar is of type LL(*) because the
"recordCount" can occur after an arbitrary number of fields due to this
part of the rule: (fieldContents SEP)+.
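
To make the lookahead problem concrete, here is a minimal Python sketch
(not ANTLR, and not a fix for the grammar). It labels each '^'-delimited
token under one explicit assumption: a token counts as a recordCount
only if it is all digits and is immediately followed by a known object
name. The names OBJECT_NAMES, split_fields, is_record_count and classify
are hypothetical helpers invented for this illustration.

```python
# Assumption: the set of object names is known in advance.
OBJECT_NAMES = {"Obj1", "Obj2", "Obj3"}


def split_fields(line):
    """Split on '^' and drop the empty string left by the trailing delimiter."""
    parts = line.split("^")
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def is_record_count(tokens, i):
    """Assumed rule: all digits AND the next token is an object name."""
    return (
        i + 1 < len(tokens)
        and tokens[i].isdigit()
        and tokens[i + 1] in OBJECT_NAMES
    )


def classify(line):
    """Label every token as object, verb, recordCount or field."""
    tokens = split_fields(line)
    labelled = []
    expect_verb = False  # the token right after an object name is its verb
    for i, tok in enumerate(tokens):
        if expect_verb:
            labelled.append(("verb", tok))
            expect_verb = False
        elif tok in OBJECT_NAMES:
            labelled.append(("object", tok))
            expect_verb = True
        elif is_record_count(tokens, i):
            labelled.append(("recordCount", tok))
        else:
            labelled.append(("field", tok))  # may be the empty string
    return labelled


if __name__ == "__main__":
    for label, text in classify("Obj1^Verb1^field1^field2^1^Obj2^Verb2^field3^field4^"):
        print(label, repr(text))
```

The sketch needs one token of lookahead past each numeric token to
decide between "field" and "recordCount". It is deliberately naive: a
field whose content happens to be an object name, or a numeric field
that happens to precede one, would be mislabelled.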

I managed to write a grammar which can parse example 1 but fails for all
other examples:

grammar DLM;
data        :    objectGroup subObjectGroup* ;
objectGroup    :    objectName SEP verbName SEP (fieldData SEP)+;
subObjectGroup    :    recordCount SEP objectGroup+;
objectName    :    'Obj1' | 'Obj2' | 'Obj3' ;
verbName    :    'Verb1' | 'Verb2' | 'Verb3' ;
fieldData    :    NONSEP* ; // field can be empty;
recordCount    :    INT ;
NONSEP        :    ~('^')+ ;
SEP        :    '^';
fragment INT    :    '0'..'9'+;

This grammar just stops when it reaches token "Obj2".

I rewrote rule "data" like this:

data : objectGroup subObjectGroup+ | objectGroup;

This time it failed at token "Obj2" with a NoViableAltException.

I tried to use options {backtrack=true; memoize=true;} both for the
whole grammar and for rule "data" only, but this didn't help.

I also tried to use predicates like this:

subObjectGroup : (INT SEP objectName) => recordCount SEP objectGroup+;

but this didn't help either.

So I'd really appreciate some hints on how to make the other examples parse.

Thanks.

Best regards,
Florian

