The SPARQL Query Language Grammar for ANTLR v3

Main

Overview

The SPARQL query language for RDF is designed to meet the use cases and requirements identified by the RDF Data Access Working Group in RDF Data Access Use Cases and Requirements; detailed explanations can be found in the official W3C SPARQL specifications.
The purpose of this project is to provide a cross-compiler ANTLR v3 grammar that implements the SPARQL grammar specification.

Download

The SPARQL grammar is available for download here and from the download list on the sparkle-g project home.
Also take a look at the passed test cases and the DAWG test cases!

Release Notes

2007-10-30: First release (Michele Mostarda, Simone Tripodi)
2007-11-19: Juergen Pfundt joined the team
2008-01-21: Bugfix release

Created project space on Google Code (Simone Tripodi)
Created the html-doc project (Simone Tripodi)
Fixed antlr3 warnings emitted when compiling the Sparql.g file (Juergen Pfundt)
Added lexical rule "ANY" as fallback, in case no lexical rule matches (Juergen Pfundt)
Recognition of keywords (including "true" and "false") is case insensitive now (Juergen Pfundt)
Implemented lexical rule for comments (Juergen Pfundt)
Solved a problem with graphterm and blankNode, which implicitly referenced WS via the lexical rules NIL and ANON. Because WS sends its content to the hidden channel, this caused problems with WS before and after braces. (Juergen Pfundt)
Added a few test cases to the test suite referenced in the W3C SPARQL specifications. Modified one test case to test "comments". The test cases still have to be enhanced! (Juergen Pfundt)
Replaced parser literals by lexer tokens as preparation for AST generation (Juergen Pfundt)
Added sparqlT.g as starting point for a tree grammar (Juergen Pfundt)
Adapted the build file to compile both grammars and to generate javadoc (Juergen Pfundt)
Reworked the build system to use Apache Ivy, svn Ant and the ANTLR3 Ant Task (Simone Tripodi)
Replaced parser literal IRI_REF by lexer tokens as preparation for AST generation (Simone Tripodi)
The A token is case insensitive now as requested by the w3.org rdf-sparql-query document (Juergen Pfundt)
Added DAWG Testcases (Juergen Pfundt)

2011-06-08: Third release

Updated the grammar to the current definition of SPARQL 1.1 in SPARQL 1.1 Query Language, which includes the SPARQL 1.1 Update language. This refers to the version of 12 May 2011.
Codepoint escape sequences are not accounted for in this grammar. As mentioned in the W3C Working Draft 14 October 2010 in chapter 18.2 "Codepoint Escape Sequences": "...Codepoint escape sequences can appear anywhere in the query string. They are processed before parsing based on the grammar rules and so may be replaced by codepoints with significance in the grammar, such as ":" marking a prefixed name...."
Added a finite state automaton that replaces Unicode escape sequences '\uxxxx' with character values. The implementation, ANTLRUnicodePreprocessorFileStream, extends ANTLRFileStream. Incomplete Unicode escape sequences are written back unchanged to the file stream. Optimizations to avoid write operations led to MODIFIED_DATA_STATE and data_buffer_modified. The rationale for these deviations from pure doctrine was to keep the number of states small. In the usual case of no Unicode escape sequences in the data stream, the maxim is to do (almost) nothing and just loop in the START_STATE.
AST generation added, see file Sparql.g
Tree grammar added, see file SparqlT.g
Some remarks regarding various rules and test cases:
1) The Sparql.g lexer rule for PN_LOCAL had to be refined, as it did not cope with a trailing '.'. Syntactic predicates and/or greedy=false did not solve the problem of properly recognizing input such as
       A.b.       as two tokens: A.b (PN_LOCAL) followed by a DOT

     PN_LOCAL : (PN_CHARS_U|DIGIT) ((PN_CHARS|DOT)* PN_CHARS)?;

Example:
       ASK {person:John_Q_Public rdf:type foaf:Person.}

A simple action solved the problem. It looks ahead past the DOT and tests whether the next input character does not belong to the follow set. If that is the case, the token PN_LOCAL is declared complete (return).
2) I18n test cases have been successfully evaluated with AntlrWorks. GUnit seems to have a problem with these test cases and evaluates them as failed.
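The lookahead idea from remark 1 can be illustrated outside ANTLR with a small stand-alone Java sketch. This is not the action from Sparql.g: the class and method names are hypothetical, the character classes are reduced to ASCII (the real PN_CHARS_U and PN_CHARS also admit many Unicode ranges), and the scanner works on a plain String instead of an ANTLR character stream. When it meets a DOT, it looks past the run of dots and only continues if a PN_CHARS character follows, so a trailing DOT is left for the next token.

```java
// Hypothetical illustration of the PN_LOCAL lookahead; not the code used in Sparql.g.
class PnLocalScanner {

    // Assumption: ASCII-only stand-in for PN_CHARS_U.
    static boolean isPnCharsU(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Assumption: ASCII-only stand-in for PN_CHARS.
    static boolean isPnChars(char c) {
        return isPnCharsU(c) || isDigit(c) || c == '-';
    }

    /**
     * Returns the end index (exclusive) of the PN_LOCAL token that starts at
     * 'start', or -1 if no PN_LOCAL token can start there.
     */
    static int scanPnLocal(String input, int start) {
        if (start >= input.length()) {
            return -1;
        }
        char first = input.charAt(start);
        if (!isPnCharsU(first) && !isDigit(first)) {
            return -1;
        }
        int pos = start + 1;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (isPnChars(c)) {
                pos++;
                continue;
            }
            if (c == '.') {
                // Look ahead past the dot(s): they belong to the token only
                // if a PN_CHARS character follows them.
                int ahead = pos;
                while (ahead < input.length() && input.charAt(ahead) == '.') {
                    ahead++;
                }
                if (ahead < input.length() && isPnChars(input.charAt(ahead))) {
                    pos = ahead + 1;
                    continue;
                }
            }
            break; // token is complete; a trailing DOT is not consumed
        }
        return pos;
    }

    public static void main(String[] args) {
        String input = "A.b.";
        int end = scanPnLocal(input, 0);
        System.out.println("PN_LOCAL: " + input.substring(0, end));
        System.out.println("rest:     " + input.substring(end));
    }
}
```

For the input A.b. the sketch splits the text into A.b and a remaining '.', which matches the two-token reading described above.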
3) A preprocessor for Unicode escape sequences has been successfully tested with the Unicode test cases. GUnit will evaluate these test cases as failed.
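To make the preprocessing step more concrete, here is a minimal Java sketch of a finite state automaton that replaces '\uxxxx' escapes before parsing. It is not the project's ANTLRUnicodePreprocessorFileStream: the class name, the state names and the method are hypothetical, it operates on a String rather than on a file stream, it has none of the write-avoiding optimizations mentioned in the release notes, and it handles only the four-digit form. As in the description above, incomplete escape sequences are copied back unchanged.

```java
// Hypothetical sketch of a \\uXXXX preprocessor; not the project's implementation.
class UnicodeEscapeSketch {

    private enum State { START, BACKSLASH, HEX }

    // Value of an ASCII hex digit, or -1 if c is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String replaceEscapes(String in) {
        StringBuilder out = new StringBuilder(in.length());
        StringBuilder pending = new StringBuilder(); // raw text of a not yet complete escape
        State state = State.START;
        int value = 0;
        int digits = 0;

        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (state) {
                case START: {
                    if (c == '\\') {
                        pending.append(c);
                        state = State.BACKSLASH;
                    } else {
                        out.append(c); // usual case: copy and stay in START
                    }
                    break;
                }
                case BACKSLASH: {
                    if (c == 'u') {
                        pending.append(c);
                        value = 0;
                        digits = 0;
                        state = State.HEX;
                    } else {
                        // Not an escape: write the backslash back, re-read c in START.
                        out.append(pending);
                        pending.setLength(0);
                        state = State.START;
                        i--;
                    }
                    break;
                }
                case HEX: {
                    int d = hexValue(c);
                    if (d >= 0) {
                        pending.append(c);
                        value = value * 16 + d;
                        digits++;
                        if (digits == 4) {
                            out.append((char) value); // complete escape: emit the character
                            pending.setLength(0);
                            state = State.START;
                        }
                    } else {
                        // Incomplete escape: write it back unchanged, re-read c in START.
                        out.append(pending);
                        pending.setLength(0);
                        state = State.START;
                        i--;
                    }
                    break;
                }
            }
        }
        out.append(pending); // input ended inside an escape: keep it as is
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped ':' becomes significant for the grammar only after preprocessing.
        System.out.println(replaceEscapes("foaf\\u003Aname"));
    }
}
```

Running such a pass before the lexer sees the input is what lets an escaped ':' act as the prefixed-name separator, as the quoted W3C text describes.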

Bugs

Our workflow relies on the Google Code issue tracker, which you can find by following this link.
Did you find a bug? Please report it!

Source Repository

Follow the Google-Code page

Get Involved

Comments? Suggestions? Any ideas? Don't hesitate, write to us! This small team was born with the Open Source philosophy, so we also need your collaboration!

About

License

The SPARQL grammar is licensed under the ANTLR3 License [the BSD License]

Who we are

The folks, in alphabetical order:

Juergen Pfundt (juergen.pfundt)

Juergen is working for @Deutsche Telekom.
He worked on the AST generation, added the tree grammar and the Unicode pre-processor.

Michele Mostarda (michele.mostarda)

Michele is working as a Software Engineer at @Sourcesense S.r.l.
He worked on the first grammar's implementation.

Simone Tripodi (simone.tripodi)

Simone is working as a Software Engineer at @Fondazione Bruno Kessler.
He worked on the first grammar's implementation and is the project's maintainer.