[antlr-interest] guidance on extending the java bytecode example to include strings with spaces...

Morgan Jones mjones at pobox.com
Mon Nov 7 10:12:21 PST 2011


Hi all,

I've been digging through the forums and reading both of TP's books trying to figure out a problem I'm having with my first 2-pass grammar... Hopefully someone will be able to point me in the correct direction.

I'm working on a grammar to parse Blaise files into a new XML-based format. I worked through several of the examples in TP's definitive book, up to and including the Java bytecode example. It seemed more or less straightforward... I wrote an initial 1-pass grammar that correctly tokenized the input, and I was able to emit XML directly from the first pass... But from what I've read, it seems the best practice is to parse into an intermediate format (an AST) and then walk that tree...

Unfortunately, Blaise is a very English-looking format; here's a brief example:

DATAMODEL IdString "Title String, first example"
ENDMODEL

I want to parse the above into something like:

<survey title="Title String, first example" id="IdString"></survey>

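The title can only keep its spaces if some stage treats the whole quoted run as a single unit. As an illustration only, here is a minimal hand-written Java tokenizer (not ANTLR-generated code; `BlaiseLexerSketch`, `Kind`, `Token` and `tokenize` are hypothetical names) that does this for the example above. It assumes the ANTLR counterpart would be a lexer-level string rule rather than the parser-level `displaystring` rule in blaise.g; treat that mapping as an assumption. Unlike blaise.g, this sketch also skips newlines instead of emitting a NEWLINE token.

```java
import java.util.ArrayList;
import java.util.List;

class BlaiseLexerSketch {
    // Hypothetical token kinds: WORD stands in for LETTERS/NUMBERS, STRING is a whole quoted run
    enum Kind { WORD, STRING, PUNCT }

    record Token(Kind kind, String text) {}

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace (including newlines here) only separates tokens
            } else if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at offset " + i);
                }
                // Everything between the quotes, spaces and commas included, becomes ONE token
                tokens.add(new Token(Kind.STRING, input.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.WORD, input.substring(start, i)));
            } else {
                tokens.add(new Token(Kind.PUNCT, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "DATAMODEL IdString \"Title String, first example\"\nENDMODEL";
        for (Token t : tokenize(src)) {
            System.out.println(t.kind() + " [" + t.text() + "]");
        }
    }
}
```

Running `main` prints four tokens, and the third is `STRING [Title String, first example]` with its spaces intact.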

My initial attempt at the 2-pass solution just threw errors... This forced me to figure out the debugger, which pointed to the fact that my intermediate form was such that the 2nd parser couldn't determine the break between tokens... I've been reading a bunch to try and figure this out, but without a lot of headway... I'll include both of my current grammars. Any suggestions would be welcome.

Thanks,
Morgan
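The 2-pass idea described in the post (parse into an AST first, then walk that tree to produce output) can be sketched in plain Java. This is a hypothetical stand-in, not ANTLR's generated parser or tree walker: `TwoPassSketch`, `Node`, `parse`, `emit` and `escape` are illustrative names, and pass 1 assumes the title already arrives as a single token. The tree shape mirrors the `^(SURVEY ^(MODELNAME ...) ^(DISPLAYNAME ...))` rewrite in blaise.g.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPassSketch {
    // Minimal AST node: a label (SURVEY, MODELNAME, ...) and optional leaf text
    static final class Node {
        final String label;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String label, String text) {
            this.label = label;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Pass 1: check the token sequence and build ^(SURVEY ^(MODELNAME id) ^(DISPLAYNAME title))
    static Node parse(List<String> tokens) {
        if (tokens.size() != 4 || !tokens.get(0).equals("DATAMODEL")
                || !tokens.get(3).equals("ENDMODEL")) {
            throw new IllegalArgumentException("expected: DATAMODEL <id> <title> ENDMODEL");
        }
        return new Node("SURVEY", null)
                .add(new Node("MODELNAME", null).add(new Node("ID", tokens.get(1))))
                .add(new Node("DISPLAYNAME", null).add(new Node("STRING", tokens.get(2))));
    }

    // Pass 2: walk the tree and emit XML; this pass never looks at the raw input
    static String emit(Node survey) {
        if (!survey.label.equals("SURVEY")) {
            throw new IllegalArgumentException("expected SURVEY root, got " + survey.label);
        }
        String id = null;
        String title = null;
        for (Node child : survey.children) {
            String value = child.children.get(0).text;
            switch (child.label) {
                case "MODELNAME" -> id = value;
                case "DISPLAYNAME" -> title = value;
                default -> throw new IllegalArgumentException("unexpected node " + child.label);
            }
        }
        if (id == null || title == null) {
            throw new IllegalArgumentException("SURVEY needs MODELNAME and DISPLAYNAME");
        }
        return "<survey title=\"" + escape(title) + "\" id=\"" + escape(id) + "\"></survey>";
    }

    // Escape the characters that would break a double-quoted XML attribute
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Node tree = parse(List.of("DATAMODEL", "IdString", "Title String, first example", "ENDMODEL"));
        System.out.println(emit(tree));
    }
}
```

`main` prints `<survey title="Title String, first example" id="IdString"></survey>`. Keeping the tree walk separate from parsing means the emitter only depends on the tree shape, which is the same contract a tree grammar such as blaiseGen.g relies on.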

blaise.g


grammar blaise;
options {
output=AST;
}

tokens {
  SURVEY;   // imaginary root node for the survey
  MODELNAME;
  DISPLAYNAME;
}


datamodel
	:	modeldef (fields+)? (rules+)? (NEWLINE+)? endmodel;
	
fields	:	fieldhdr fielddef*;
rules	:	rulehdr;

modeldef:	startmodel modelname displaystring
	-> ^(SURVEY ^(MODELNAME modelname) ^(DISPLAYNAME displaystring));
	
displaystring
	:	'\"'! .* '\"'!
	;
	
questionstring
	:	'\"' (LETTERS|NUMBERS)+ '?' '\"'
	;

modelname
	:	(LETTERS|NUMBERS)+;

fielddef:	LETTERS questionstring ':' (fieldstr|fieldnum|fieldselect) NEWLINE;

fieldstr:	'STRING[' NUMBERS+ ']';
fieldnum:	NUMBERS '..' NUMBERS;
fieldselect
	:	'(' selectdef+ ')';
selectdef
	:	LETTERS displaystring ','?;

fieldhdr:	'FIELDS';
rulehdr	:	'RULES';
startmodel: 'DATAMODEL'!;
endmodel:	'ENDMODEL'!;


PUNCT	:	',' | '.';

LETTERS	:	('a'..'z'|'A'..'Z')+;

NUMBERS	:	'0'..'9'+;

NEWLINE :	'\r'? '\n' ;
WS	:	(' '|'\t')+ {$channel = HIDDEN;};
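The WS rule above sends spaces and tabs to the hidden channel. In ANTLR 3 such tokens stay in the token buffer but are skipped by the parser, so the nodes that `displaystring` contributes to the AST are separate words and punctuation with no whitespace between them. The Java simulation below (not the ANTLR runtime; `HiddenChannelSketch`, `Tok`, `joinVisible` and `joinAll` are hypothetical) shows what that means for the example title.

```java
import java.util.List;
import java.util.stream.Collectors;

class HiddenChannelSketch {
    // One lexed token; hidden == true models {$channel = HIDDEN;} in the WS rule
    record Tok(String text, boolean hidden) {}

    // Roughly what the parser (and so the AST) gets: only the on-channel tokens
    static String joinVisible(List<Tok> toks) {
        return toks.stream()
                .filter(t -> !t.hidden())
                .map(Tok::text)
                .collect(Collectors.joining());
    }

    // The token buffer itself still holds the hidden whitespace tokens
    static String joinAll(List<Tok> toks) {
        return toks.stream().map(Tok::text).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The text between the quotes, split the way the LETTERS/PUNCT/WS rules would split it
        List<Tok> toks = List.of(
                new Tok("Title", false), new Tok(" ", true), new Tok("String", false),
                new Tok(",", false), new Tok(" ", true), new Tok("first", false),
                new Tok(" ", true), new Tok("example", false));
        System.out.println(joinVisible(toks)); // TitleString,firstexample
        System.out.println(joinAll(toks));     // Title String, first example
    }
}
```

Whether the spaces survive into the second pass therefore depends on whether the title reaches the parser as one token or as several.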


blaiseGen.g


tree grammar blaiseGen;
options {
    tokenVocab=blaise;
    ASTLabelType=CommonTree;
    output=template;
}

datamodel
	:	modeldef (fields+)? (rules+)? (NEWLINE+)? endmodel; // -> beginSurvey();
	
fields	:	fieldhdr fielddef*;
rules	:	rulehdr;

modeldef :  ^(SURVEY ^(MODELNAME modelname) ^(DISPLAYNAME displaystring))
	-> startSurvey(title={$displaystring.text},id={$modelname.text});
	
displaystring
	:	'\"' .* '\"'
	;
	
questionstring
	:	'\"' (LETTERS|NUMBERS)+ '?' '\"'
	;

modelname
	:	(LETTERS|NUMBERS)+;

fielddef:	LETTERS questionstring ':' (fieldstr|fieldnum|fieldselect) NEWLINE;

fieldstr:	'STRING[' NUMBERS+ ']';
fieldnum:	NUMBERS '..' NUMBERS;
fieldselect
	:	'(' selectdef+ ')';
selectdef
	:	LETTERS displaystring ','?;

fieldhdr:	'FIELDS';
rulehdr	:	'RULES';
startmodel: 'DATAMODEL';
endmodel:	'ENDMODEL';

