The
SPARQL query language for RDF is designed to meet the use cases and
requirements identified by the RDF Data Access Working Group in RDF Data Access
Use Cases and Requirements; detailed explanations can be found in the official
W3C SPARQL specification.
The purpose of this project is to provide a cross-compiler
ANTLR v3 grammar that implements the SPARQL grammar specification.
The SPARQL
grammar is available for download here and from the
download list on the sparkle-g
project home.
Also take a look at the passed test and DAWG cases!
- 2007-10-30: First release
(Michele Mostarda, Simone Tripodi)
- 2007-11-19: Juergen Pfundt joined the team
- 2008-01-21: Bugfix release
- Created project space on
Google Code (Simone Tripodi)
- Created html-doc's project
(Simone Tripodi)
- Fixed antlr3 warnings emitted
when compiling the Sparql.g file (Juergen Pfundt)
- Added lexical rule
"ANY" as fallback, in case no lexical rule matches (Juergen Pfundt)
- Recognition of keywords
(including "true" and "false") is now
case-insensitive (Juergen Pfundt)
- Implemented lexical rule
for comments (Juergen Pfundt)
- Solved a problem with graphterm and blankNode,
which implicitly referenced WS via the lexical
rules NIL and ANON. Because WS sends its content to
the hidden channel, this caused problems with WS before and after braces.
(Juergen Pfundt)
- Added a few test cases to
the test suite referenced in the W3C SPARQL specification.
Modified one test case to test "comments". The test cases still
have to be enhanced! (Juergen Pfundt)
- Replaced parser literals
by lexer tokens as preparation for AST
generation (Juergen Pfundt)
- Added sparqlT.g
as starting point for a tree grammar (Juergen
Pfundt)
- Adapted the build file to
compile both grammars and to generate javadoc (Juergen Pfundt)
- Reworked the build system
to use Apache Ivy, svn Ant and the ANTLR3
Ant Task (Simone Tripodi)
- Replaced parser literal
IRI_REF by lexer tokens as preparation for AST
generation (Simone Tripodi)
- The A token is case
insensitive now as requested by the w3.org rdf-sparql-query
document (Juergen Pfundt)
- Added DAWG Testcases (Juergen Pfundt)
- 2011-06-08: Third release
- Updated the grammar to the
current definition of SPARQL 1.1 given in SPARQL 1.1
Query Language, which includes the SPARQL 1.1
Update language. This refers to the version of 12 May 2011.
- Codepoint escape sequences are not handled by this
grammar itself. As stated in chapter 18.2, "Codepoint Escape Sequences",
of the W3C Working Draft of 14 October 2010:
"...Codepoint escape sequences can appear
anywhere in the query string. They are processed before parsing based on
the grammar rules and so may be replaced by codepoints
with significance in the grammar, such as ":" marking a
prefixed name...."
Added a finite state automaton that replaces Unicode escape sequences '\uxxxx' with their character values. The implementation, ANTLRUnicodePreprocessorFileStream, extends ANTLRFileStream. Incomplete Unicode escape sequences are written back unchanged to the file stream.
Optimizations to avoid write operations led to MODIFIED_DATA_STATE and data_buffer_modified. The rationale for these deviations from pure doctrine was to keep the number of states small. In the usual case of no Unicode escape sequences in the data stream, the maxim is to do (almost) nothing and just loop in START_STATE.
- AST generation added, see file
Sparql.g
- Tree grammar added, see file
SparqlT.g
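The Unicode escape preprocessing described in the release notes above can be illustrated with a small, self-contained state machine. This is only a sketch under stated assumptions, not the project's ANTLRUnicodePreprocessorFileStream: the class name UnicodeEscapeSketch, the method decode and the three state names are hypothetical, it works on a String instead of an ANTLR file stream, and it handles only the four-digit backslash-u form mentioned above. Incomplete sequences are copied through unchanged, as the notes describe.

```java
// Hypothetical illustration; not the project's actual preprocessor.
final class UnicodeEscapeSketch {

    // States of the automaton (names are assumptions made for this sketch).
    private enum State { START, BACKSLASH, HEX }

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Replaces each complete backslash-u escape (four hex digits) with the
    // character it denotes; everything else is copied unchanged.
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        StringBuilder pending = new StringBuilder(); // raw text of a possible escape
        State state = State.START;
        int value = 0;  // character value accumulated so far
        int digits = 0; // number of hex digits consumed so far

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case START: {
                    // Common case: no escape in sight, just copy and stay here.
                    if (c == '\\') {
                        pending.append(c);
                        state = State.BACKSLASH;
                    } else {
                        out.append(c);
                    }
                    break;
                }
                case BACKSLASH: {
                    if (c == 'u') {
                        pending.append(c);
                        value = 0;
                        digits = 0;
                        state = State.HEX;
                    } else {
                        // Not an escape: flush the backslash and re-read c in START.
                        out.append(pending);
                        pending.setLength(0);
                        state = State.START;
                        i--;
                    }
                    break;
                }
                case HEX: {
                    int d = hexValue(c);
                    if (d >= 0) {
                        pending.append(c);
                        value = value * 16 + d;
                        digits++;
                        if (digits == 4) {
                            // Complete escape: emit the decoded character.
                            out.append((char) value);
                            pending.setLength(0);
                            state = State.START;
                        }
                    } else {
                        // Incomplete escape: write the raw text back unchanged.
                        out.append(pending);
                        pending.setLength(0);
                        state = State.START;
                        i--;
                    }
                    break;
                }
            }
        }
        out.append(pending); // escape cut off by the end of the input
        return out.toString();
    }
}
```

With this sketch, an input containing foaf\u003Aname is turned into foaf:name before it would reach the lexer, which is the situation the quoted W3C passage describes for ":" in a prefixed name.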
Some remarks regarding various rules and test cases:
1) The Sparql.g lexer rule for PN_LOCAL had to be refined, as it did not cope with a trailing '.'. Syntactic predicates and/or greedy=false did not solve the problem of properly recognizing input such as
A.b. as two tokens: A.b (PN_LOCAL) followed by a DOT.
PN_LOCAL : (PN_CHARS_U|DIGIT) ((PN_CHARS|DOT)* PN_CHARS)?;
Example:
ASK {person:John_Q_Public rdf:type foaf:Person.}
A simple action solved the problem. It looks ahead past the DOT and tests whether the next input character does not belong to the follow set. If this check proves to be true, the token PN_LOCAL is declared complete (return).
2) I18n test cases have been successfully evaluated with AntlrWorks. GUnit seems to have a problem with these test cases and evaluates them as failed.
3) A preprocessor for Unicode escape sequences has been successfully tested with the Unicode test cases. GUnit will evaluate these test cases as failed.
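The lookahead idea from remark 1 can be sketched outside ANTLR as a tiny hand-written scanner. This is an illustration under simplifying assumptions, not the action used in Sparql.g: the class PnLocalSketch and its methods are hypothetical, and PN_CHARS is reduced to ASCII letters, digits, '_' and '-' instead of the full Unicode ranges of the SPARQL grammar. A run of dots is consumed only when a name character follows it; otherwise the token ends before the dot.

```java
// Hypothetical illustration of the trailing-dot lookahead; not the grammar's action code.
final class PnLocalSketch {

    // Simplified stand-in for PN_CHARS (ASCII only; an assumption of this sketch).
    static boolean isPnChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '_' || c == '-';
    }

    // Returns the end index (exclusive) of the PN_LOCAL token that begins at
    // start, or start itself if no token begins there.
    static int scanPnLocal(String input, int start) {
        int pos = start;
        // First character: PN_CHARS_U or DIGIT, so '-' is not allowed here.
        if (pos >= input.length()
                || !isPnChar(input.charAt(pos))
                || input.charAt(pos) == '-') {
            return start;
        }
        pos++;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (isPnChar(c)) {
                pos++;
            } else if (c == '.') {
                // Look ahead past the dot(s) before deciding to consume them.
                int next = pos;
                while (next < input.length() && input.charAt(next) == '.') {
                    next++;
                }
                if (next < input.length() && isPnChar(input.charAt(next))) {
                    pos = next; // interior dots: keep scanning the same token
                } else {
                    break;      // trailing dot: the token is complete
                }
            } else {
                break;
            }
        }
        return pos;
    }
}
```

For the input A.b. this returns 3, so A.b is the PN_LOCAL token and the final '.' is left over to be matched as DOT. For Person.} it returns 6, which corresponds to the end of foaf:Person in the ASK example.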
Our
workflow uses the Issue Tracking provided by Google Code; you can find it
by following this link.
Did you find a bug? Please report it!!!
Follow
the Google-Code page
Comments? Suggestions? Any ideas?
Don't hesitate, write to us! This small team was born with the Open Source philosophy, so we
also need your collaboration!
About
The
SPARQL grammar is licensed under the ANTLR3
License [the BSD License].
The
folks, in alphabetical order:
- Juergen Pfundt (juergen.pfundt)
Juergen is
working for @Deutsche Telekom.
He worked on the AST generation and added the tree grammar and the Unicode preprocessor.
- Michele Mostarda
(michele.mostarda)
Michele
is working as a Software Engineer at @Sourcesense S.r.l.
He worked on the first implementation of the grammar.
- Simone Tripodi
(simone.tripodi)
Simone
is working as a Software Engineer at @Fondazione Bruno Kessler.
He worked on the first implementation of the grammar and is the project's
maintainer.