[antlr-interest] Call to interest Object Notation

向秦贤 fyaoxy at gmail.com
Tue Sep 11 20:33:20 PDT 2007


Dear friends,
At first I must sorry for my not good English speaking.
I posted a grammar ON, which stands for Object Notation.
IMHO, current programming languages, there must be a shift from
complicate to simple.
Inspired by JSON, I extract a ON. now It's very simple, and just a notation.
I focus on follow points:
No quote, or quote free
No more symbols, current in ON grammar there are only {} , ; :
Context sensitive. through I know little about compiler and grammar,
but from practice view, we need this kind of language.
And I notice there are some same like stuffs:
GNU libconfuse which I forgot it, a friend hint me that we ever check out it.
Real Simple Data from Dan Yoder
YAML which I forgot it too, Dan Yolder hint me, and yes I checked it out ever.
JSON, of cause, I got a source inspiration from it.
I hope in the community someone maybe do it better.
So I make a Call to all:)

for easy, I paste ON grammar and simple document on here, If someone
intrest, ON project site is https://on.dev.java.net and my jroller log
jroller.com/qinxianscript.
And I'm struggle with antlr tree grammar with a imaginary node.

grammar On;

options{
	output=AST;
	ASTLabelType=CommonTree;
	backtrack = true;
	memoize = true;
}

tokens{
	ARRAY;
	STRING;
	PAIR;
	NAME;
	VALUE;
	OBJECT;
	STR;//imaginary node
}

@header{
package org.peony.object.notation;
}

@lexer::header{
package org.peony.object.notation;
}

/**this is document root object, contains 3 kinds of array object.
*	string, pair, object
*	TODO: add process object-string combination array;
*/
document	
	:	pairArray
	|	stringArray
	|	objectArray
	|
	;

objectArray	:	object (COMMA? object)* COMMA?	->^(ARRAY object+);

object	:	LCURLY! pairArray RCURLY!;

pairArray	:	pair+	->^(OBJECT pair+);

pair	
	:	name=stringArray COLON svalue=stringArray
	(SEMI | LINE | {input.LA(1)==RCURLY ||	input.LA(1)==EOF}?)
		->^(PAIR ^(NAME $name) ^(VALUE $svalue))
	|	name = stringArray COLON? ovalue=objectArray (SEMI | LINE | )
		->^(PAIR ^(NAME $name) ^(VALUE $ovalue))
	;

/**
* TODO: handle for "a,,,,b" pattern
*/
stringArray	:	strings (COMMA LINE strings)* ->^(ARRAY strings+);
		
strings	: string (COMMA string)*	->^(STRING string)+;

/**string consist of words need an imaginary node to contains
whitespaces between WORDs*/
string	
	:	words	->^(STR[$text] )
	|	QSTRING
	;
words	:	WORD+;

COMMENT	:	'/*' .* '*/' {$channel=HIDDEN;};
LINE_COMMENT	:	'//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;};

SEMI	:	';' ;
COMMA	:	',' ;
COLON	:	':' ;
LCURLY	:	'{' ;
RCURLY	:	'}' ;
//TODO: add escape chars process.
/**word contains any char excepted special chars in special usage*/
WORD	:	(~(' ' | '\t' | ',' | ':' | ';' | '{' | '}' | '\r' | '\n' | '"'
| '\''))+;

/**signle or double quoted string contains char excepted quote chars self*/
QSTRING
@after{
	setText(getText().substring(1, getText().length()-1));
}
	:	('"' (~('"'))* '"')
	|	('\'' (~('\''))* '\'');

/**new line token, same as follow whitespace for order priority*/	
LINE	
@init{
	int la=input.LA(-1);
	//System.out.println("before LINE:"+(char)la);
	switch(la){
		case ',':
		case ':':
		case ';':
		case '{':
		case '}':
			skip();
			break;
		default:
	}
}
@after{
	la=input.LA(1);
	//System.out.println("LINE:"+(char)la);
	//TODO: must optimize this ",\n" if in case "a:b,c,\ne,f,\n1:2" there
is a confusion.
	//need predicate nextline.contains(":")
	switch(la){
		case ',':
		case ':':
		case ';':
		case '{':
		case '}':
			skip();
			break;
		default:
	}
}
	: 	('\n' | '\r')+
	;
	
/**whitespace to ignore, place after WORD lexer token, just for order priority*/
WS	:	(' ' | '\t' | '\u000C')+ {$channel=HIDDEN;};



Object Notation (ON) Grammar Guide
Object notation is simple object string represent format. Object
notation just use special symbols as separator (, ; : { }), others are
string literal, and all internal values are string.
Quote symbols are optional. We not have to type this, and save more
and more power of type keyboards in the world.
1: string
Object notation all is string. Other format is tool's responsibility.
Object notation use three sorts of literals represent string.

a)words and/or with whitespace, but cannot contains newline. for example:
a
123
abc
a b c
firstname lastname

b)single quoted string, and can contains newline. for example:
'abc'
'  abc '
'	abc says "hello, I am object notation".'
'this single quoted string contains a
newline'

c)double quoted string, and can contains newline. for example:
"123"
" abc   "
"	abc says 'hello, I'm object notation'."
"this double quoted string contains 2

newline"

2: string array
String array separated by a comma. and newline allowed,and  last comma
not allowed. for example:
abc,123
abc,123,
efg,456  ==>this equals abc,123,efg,456
abc, "first line
second line"

3: name value pair, or key value pair
Pair separated by a colon, end with a semi or newline.
Name component can be string array, and value component can be string
array or object array.
If value is object array,  the colon separator can be ignore, and
ended semi or newline can be ignore too. for example:
name:value;
"name":value
zhang san: woman shot chapnion from ChongQin
body, table: css contents
name{a:b},
name{a:b}

4: object
Object begin with a left curly, followed by multiple name value pairs,
end with a right curly.
The last pair without a semi or newline allowed. and pairs can be
empty. for example:
{name:value;
"name":value
lastname:lastvalue}
{}

5: object array
Object array, same like string array, but separator can be ignored. for example:

{a:b}{c:d}
{a:b},{c:d}
{a:b}{c:d},

6: document object, so called root object
Document object can be:
empty, string array, object array,
multiple pairs(just like a object without left and right curly)

7: whitespace and newline
In most case, whitespaces and newlines ignored,but in name value pair
case which be taken as a separator.



-- 
致敬
向秦贤


More information about the antlr-interest mailing list