Some Tracker + SPARQL bits
20 February 2012
These are some notes about Tracker, with links and information that have been useful to me while using Tracker as the data storage backend in a project. Maybe they will be useful to someone else out there :)
Reference documentation
The basics are the W3C standards:
- SPARQL query syntax
- SPARQL update syntax
- The examples section in the Tracker wiki is easier to digest than the W3C documents, so it is usually better to look there first for something similar to what one is trying to achieve.
- Once one knows the basic SPARQL syntax, reading the ontologies is the way to find out what Tracker stores and under which names.
Tracker-specific info
- Tracker adds some extra features which can be used in SPARQL queries. Some of them can be used to make queries faster (see below).
- It is possible to subscribe to notifications of changes to the Tracker database. However, the low-level D-Bus support is crude, so it is better to use something like TrackerLiveQuery.
Optimizing queries
At some point one may realize that a particular query is not running fast enough, and the first instinct is usually to blame Tracker for being slow. In fact Tracker tends to be quite fast, but sometimes the way it translates SPARQL queries to SQL makes them slower than they could be. The good news is that most queries can be tweaked to make the job of the query optimizer easier. There are some hints in the Tracker wiki.
Also, I would recommend reading two nice articles by Adrien Bustany.
Undocumented behavior
Tracker has a certain amount of undocumented behavior. Some of it stems from the fact that SQLite is used underneath and that the SPARQL parser included in Tracker is quite permissive, passing certain constructs straight through when generating the SQL queries.
As an example, the regular expression syntax specified for SPARQL does not include POSIX character classes such as [[:space:]], but the regular expression engine used underneath accepts them, so the following filter expression works:
SELECT ?name
WHERE { ?urn nco:imNickname ?name
FILTER (bound(?name) && !REGEX(?name, "[[:space:]]+")) }
(Obtains the nick names of instant messaging contacts which do not have spaces in them.)
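Why the accepted regular expression syntax depends on the backend can be illustrated with plain SQLite. SQLite's REGEXP operator has no built-in implementation; the host application registers a function for it, and that function decides which syntax is valid. The following is a minimal sketch in Python, not Tracker's code: the `nicknames` table and the `regexp` helper are made up for illustration, and the engine here is Python's `re` module, which needs `\s` instead of `[[:space:]]`.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back SQLite's REGEXP operator with Python's `re` module."""
    # SQLite evaluates `X REGEXP Y` as regexp(Y, X): pattern first, value second.
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table standing in for stored nicknames (not Tracker's schema).
conn.execute("CREATE TABLE nicknames (name TEXT)")
conn.executemany(
    "INSERT INTO nicknames VALUES (?)",
    [("alice",), ("bob smith",), ("carol",)],
)

# Python's `re` has no [[:space:]] class, so this engine is given \s instead.
rows = conn.execute(
    "SELECT name FROM nicknames WHERE NOT (name REGEXP ?) ORDER BY name",
    (r"\s+",),
).fetchall()
names_without_spaces = [name for (name,) in rows]
print(names_without_spaces)
```

Swapping the registered function for one backed by a different engine changes which patterns are accepted without touching the SQL, which is why a permissive SPARQL front end can end up exposing whatever syntax the engine underneath understands.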
Another example of undocumented behavior is the fact that aggregate functions (e.g. COUNT) can be used on the result of a property function. For example, take this query:
SELECT nie:url(?urn) COUNT(?region)
WHERE { ?urn rdf:type nmm:Photo .
OPTIONAL { ?urn nfo:hasRegionOfInterest ?region } }
GROUP BY ?urn
(Obtains the URLs of images and the number of associated regions of interest.)
The above query would run faster if it were possible to get rid of the OPTIONAL clause by using a property function. However, when an attribute has multiple values, the property function returns the values concatenated and separated by commas, so COUNT would be expected to choke when applied to that… No! Actually, the following works as expected:
SELECT nie:url(?urn) COUNT(nfo:hasRegionOfInterest(?urn))
WHERE { ?urn rdf:type nmm:Photo }
GROUP BY ?urn
(Obtains the URLs of images and the number of associated regions of interest; faster version.)
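As a loose analogy, the two query shapes can be compared in plain SQLite: an outer join with GROUP BY (similar in spirit to the OPTIONAL version) and a per-row subquery (similar in spirit to the property function version). This is only a sketch under assumptions: the `photo` and `region` tables are a toy schema invented for the example, and the SQL that Tracker actually generates for either SPARQL query may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema (an assumption for illustration, not Tracker's real tables).
conn.executescript("""
    CREATE TABLE photo (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE region (photo_id INTEGER, region TEXT);
    INSERT INTO photo VALUES (1, 'file:///a.jpg');
    INSERT INTO photo VALUES (2, 'file:///b.jpg');
    INSERT INTO region VALUES (1, 'r1');
    INSERT INTO region VALUES (1, 'r2');
""")

# Shape 1: outer join plus GROUP BY, comparable to the OPTIONAL query.
with_join = conn.execute("""
    SELECT p.url, COUNT(r.region)
    FROM photo p LEFT JOIN region r ON r.photo_id = p.id
    GROUP BY p.id
    ORDER BY p.url
""").fetchall()

# Shape 2: one correlated subquery per photo, with no outer join.
with_subquery = conn.execute("""
    SELECT p.url,
           (SELECT COUNT(r.region) FROM region r WHERE r.photo_id = p.id)
    FROM photo p
    ORDER BY p.url
""").fetchall()

print(with_join)
print(with_subquery)
```

In this toy setup both shapes report the same counts, including zero for the photo without regions. Whether one is faster than the other in a real database depends on the schema and the optimizer, so it is worth measuring rather than assuming.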