[antlr-interest] newbie needs help
John B. Brodie
jbb at acm.org
Thu Jan 21 13:25:24 PST 2010
Greetings!
On Thu, 2010-01-21 at 20:20 +0100, Hugo wrote:
> I started using ANTLR to parse a specific file format.
> The problem is that I don't know how to write my grammar correctly.
>
> The file has the following format.
> It contains multiple lines, and each line can take one of the following forms:
>
> One or more hexadecimal byte values, with or without spaces between them
> ex: A0 A4 B5 77
> or: A0
>
> A single variable identifier of the form VAR_XXX
> ex: VAR_MY_VARIABLE
>
> Or a combination of the two previous formats
> ex:
> A0 A4B5 VAR_MY_VARIABLE 77 98 VAR_MY_VARIABLE2
> or
> VAR_MY_VARIABLE AA BB
> or
> AA BB VAR_MY_VARIABLE
>
>
> What I want to do is build an AST.
Attached please find a grammar file that does *almost* what I think you
are trying to do.
It does not have a MULTIPLE_BYTES_DEF node because the grouping of a
collection of single_byte instances into a multibyte is ambiguous.
Consider
11 22 33 44 55 66 77 88
Is this 8 single bytes? 1 single byte and a 7-long multi? Is it 4 multi
pairs? A triple, a single and a quad?
I kind of expect you want it to be a single 8-long multi, i.e. any run of
single bytes becomes a multi. But that is a matter of your language's
semantics, and getting a parser to do semantics isn't always possible.
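To make the ambiguity concrete: a run of n bytes can be cut into contiguous
groups in 2^(n-1) ways, so the 8-byte line above has 128 possible readings.
Here is a minimal sketch in plain Java (no ANTLR involved; the class and
method names are made up for illustration) that counts the readings and
applies the "any run of bytes is one group" rule I am guessing at:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ByteGrouping {
    // Ways to cut a run of n bytes into contiguous non-empty groups:
    // each of the n-1 gaps is either a cut or not, hence 2^(n-1).
    public static long countGroupings(int n) {
        return n <= 0 ? 0 : 1L << (n - 1);
    }

    // Assumed rule: consecutive two-digit hex tokens form one group,
    // and any other token (e.g. a VAR_ identifier) ends the current group.
    public static List<List<String>> maximalRuns(List<String> tokens) {
        List<List<String>> runs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String t : tokens) {
            if (t.matches("[0-9A-Fa-f]{2}")) {
                current.add(t);
            } else if (!current.isEmpty()) {
                runs.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            runs.add(current);
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(countGroupings(8)); // prints 128
        // prints [[A0, A4, B5], [77, 98]]
        System.out.println(maximalRuns(Arrays.asList(
            "A0", "A4", "B5", "VAR_MY_VARIABLE", "77", "98", "VAR_MY_VARIABLE2")));
    }
}
```

Picking one of those 128 readings is a decision about the language, which is
why it is easier to make after parsing than inside the grammar.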
If you really need the MULTIPLE_BYTES_DEF node, you might be best served
by parsing with something like my grammar (i.e. the parser produces only
BYTES_DEF nodes) and then writing a tree-walker that transforms the AST
resulting from the parse into a new AST that contains the requisite
MULTIPLE_BYTES_DEF nodes, i.e. scan for and collapse sequences of
consecutive EXPR_DEF nodes that have BYTES_DEF children into a single
EXPR_DEF node containing a single MULTIPLE_BYTES_DEF child.
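A rough sketch of that collapsing pass, in plain Java. The Node class is a
hypothetical stand-in for ANTLR's CommonTree so the example runs on its own,
and it assumes the right-nested shape ^(EXPR_DEF item rest) ending in NEWLINE
that the attached grammar builds for one line. It also assumes a lone byte
stays a BYTES_DEF and only runs of two or more get merged:

```java
import java.util.ArrayList;
import java.util.List;

class CollapseBytes {
    // Hypothetical stand-in for CommonTree: a label plus ordered children.
    static class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) children.add(k);
        }
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    // Merges each run of two or more consecutive BYTES_DEF items into one
    // MULTIPLE_BYTES_DEF node and rebuilds the EXPR_DEF chain.
    public static Node collapse(Node chain) {
        // 1. Flatten the chain; 'cur' ends up as the terminator (NEWLINE).
        List<Node> items = new ArrayList<>();
        Node cur = chain;
        while (cur.text.equals("EXPR_DEF")) {
            items.add(cur.children.get(0));
            cur = cur.children.get(1);
        }
        // 2. Group consecutive BYTES_DEF items.
        List<Node> grouped = new ArrayList<>();
        int i = 0;
        while (i < items.size()) {
            int j = i;
            while (j < items.size() && items.get(j).text.equals("BYTES_DEF")) j++;
            if (j - i >= 2) {
                Node multi = new Node("MULTIPLE_BYTES_DEF");
                multi.children.addAll(items.subList(i, j));
                grouped.add(multi);
                i = j;
            } else {
                grouped.add(items.get(i)); // lone byte or a variable: copy as-is
                i++;
            }
        }
        // 3. Rebuild the right-nested chain, innermost node first.
        Node result = cur;
        for (int k = grouped.size() - 1; k >= 0; k--) {
            result = new Node("EXPR_DEF", grouped.get(k), result);
        }
        return result;
    }

    static Node byteDef(String hex) { return new Node("BYTES_DEF", new Node(hex)); }

    public static void main(String[] args) {
        // Tree for the input line "A0 A4 VAR_X 77".
        Node line = new Node("EXPR_DEF", byteDef("A0"),
            new Node("EXPR_DEF", byteDef("A4"),
                new Node("EXPR_DEF", new Node("DEFINE_VAR_DEF", new Node("VAR_X")),
                    new Node("EXPR_DEF", byteDef("77"), new Node("NEWLINE")))));
        System.out.println(collapse(line).toStringTree());
    }
}
```

With the real runtime you would do the same walk over CommonTree children, or
write a tree grammar instead; a multi-line input also has several such chains
under one root, which this sketch does not handle.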
>
> The problem is that I don't know how to do this with ANTLR. The tool
> always tells me that multiple rules can be applied with my grammar.
>
> Please help me solve my problem.
>
> Here is my grammar:
>
> stmts : bytes+ ;
>
>
> bytes : multiple_byte bytes? -> ^(EXPR_DEF multiple_byte bytes? )
>
> | define_expression bytes? -> ^(EXPR_DEF define_expression bytes? )
>
> | NEWLINE ;
>
> define_expression : define_var -> ^(DEFINE_VAR_DEF define_var) ;
>
> define_var : DEFINE_VARIABLE ;
> multiple_byte : single_byte (single_byte)+ -> ^(MULTIPLE_BYTES_DEF
> single_byte single_byte+) ;
>
>
> single_byte : byte_digit -> ^(BYTES_DEF byte_digit) ;
>
> byte_digit : BYTE_DIGIT ;
>
> DEFINE_VARIABLE :
> 'VAR_'('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>
> BYTE_DIGIT :('0'..'9'| 'A'..'F'|'a'..'f')('0'..'9'| 'A'..'F'|'a'..'f') ;
>
> // Ignore whitespace, tab and escape sequence
> WS : (' '|'\t'|'\\\r\n')+ {$channel = HIDDEN;} ;
>
> // a new line
> NEWLINE : '\r'? '\n' ;
>
> thanks a lot
hope this helps...
-jbb
-------------- next part --------------
grammar Test;

options {
    output = AST;
    ASTLabelType = CommonTree;
}

tokens {
    EXPR_DEF;
    DEFINE_VAR_DEF;
    BYTES_DEF;
}

@members {
    // Sample inputs taken from the original question, one line of the file each.
    private static final String [] x = new String[]{
        "A0\n",
        "A0 A4 B5 77\n",
        "VAR_MY_VARIABLE\n",
        "A0 A4B5 VAR_MY_VARIABLE 77 98 VAR_MY_VARIABLE2\n",
        "VAR_MY_VARIABLE AA BB\n",
        "AA BB VAR_MY_VARIABLE\n"
    };

    // Test driver: parse each sample and print the resulting AST.
    public static void main(String [] args) {
        for( int i = 0; i < x.length; ++i ) {
            try {
                System.out.println("about to parse:`"+x[i]+"`");
                TestLexer lexer = new TestLexer(new ANTLRStringStream(x[i]));
                CommonTokenStream tokens = new CommonTokenStream(lexer);
                TestParser parser = new TestParser(tokens);
                TestParser.stmts_return p_result = parser.stmts();
                CommonTree ast = p_result.tree;
                if( ast == null ) {
                    System.out.println("resultant tree: is NULL");
                } else {
                    System.out.println("resultant tree: " + ast.toStringTree());
                }
                System.out.println();
            } catch(Exception e) {
                e.printStackTrace();
            }
        }
    }
}

stmts : bytes+ EOF!;

bytes
    : ( b=BYTE_DIGIT t=bytes -> ^(EXPR_DEF ^(BYTES_DEF $b) $t) )
    | ( d=DEFINE_VARIABLE t=bytes -> ^(EXPR_DEF ^(DEFINE_VAR_DEF $d) $t) )
    | NEWLINE
    ;

fragment LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
fragment DIGIT : '0'.. '9' ;
DEFINE_VARIABLE : 'VAR_' (LETTER|'_') (LETTER | DIGIT | '_')*;

fragment HEXIT : '0'..'9' | 'A'..'F' | 'a'..'f' ;
BYTE_DIGIT : HEXIT HEXIT ;

// Ignore whitespace, tab and escape sequence
WS : (' '|'\t'|'\\\r\n')+ {$channel = HIDDEN;} ;

// a new line
NEWLINE : '\r'? '\n' ;