Some Tracker + SPARQL bits

20 February 2012

These are some notes about Tracker, with links and information that have been useful to me while using Tracker as the data storage backend in a project. Maybe they will be useful to someone else out there :)

Reference documentation

The basics are the W3C standards:

SPARQL query syntax
SPARQL update syntax
The examples section in the Tracker wiki is easier to digest than the W3C documents, so it is usually better to search there for something similar to what one is trying to achieve.
Once one knows the basic SPARQL syntax, reading the ontologies is the way to find out what Tracker stores and under which class and property names.

Tracker-specific info

Tracker adds some extra features which can be used in SPARQL queries. Some of them can be used to make queries faster (see below).
It is possible to subscribe to notifications of changes to the Tracker database. However, the low-level D-Bus support is crude to use directly, so it is better to rely on something like TrackerLiveQuery.

Optimizing queries

At some point one may realize that a particular query is not running fast enough, and the first thought is usually to blame Tracker for being slow. As a matter of fact, Tracker tends to be quite fast, but sometimes the way it translates SPARQL queries into SQL makes them slower than they could be. The good news is that most queries can be tweaked to make the job of the query optimizer easier. There are some hints in the Tracker wiki:

SPARQL Tips & Tricks

Also, I would recommend reading those two nice articles by Adrien Bustany:

Undocumented behavior

Tracker has a certain amount of undocumented behavior. Some of it stems from the fact that SQLite is used underneath, and from the SPARQL parser included in Tracker being quite permissive: it simply passes certain constructs through unchanged when generating the SQL queries.

For example, the regular expression syntax used by SPARQL does not include POSIX character classes such as [[:space:]], but as the regular expression is passed through to the engine used underneath SQLite, which does understand them, the following filter expression works:

SELECT ?name
WHERE { ?urn nco:imNickname ?name
        FILTER (bound(?name) && !REGEX(?name, "[[:space:]]+")) }

(Obtains the nicknames of instant messaging contacts that do not contain spaces.)
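The pass-through effect can be reproduced with plain SQLite from Python. The SQLite core defines no regexp() function by default: `X REGEXP Y` is turned into a call to a user function `regexp(Y, X)`, so whichever function is registered decides what pattern syntax is accepted. The sketch below registers one backed by Python's `re` module and runs a rough SQL analogue of the filter above over a hypothetical `contact` table (not Tracker's real schema or its actual regex code). Python's `re` understands `\s` but has no POSIX bracket classes, so the pattern is written as `\s+` here; the point is only that the pattern string reaches the engine untouched.

```python
import re
import sqlite3

def regexp(pattern, value):
    """Backs SQLite's REGEXP operator: 'value REGEXP pattern' calls regexp(pattern, value)."""
    # The pattern is handed to Python's re module verbatim, so re's dialect applies.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

# Hypothetical table standing in for contacts and their IM nicknames.
conn.execute("CREATE TABLE contact (nickname TEXT)")
conn.executemany(
    "INSERT INTO contact (nickname) VALUES (?)",
    [("alice",), ("bob smith",), (None,)],
)

# Rough SQL analogue of FILTER (bound(?name) && !REGEX(?name, ...)).
rows = conn.execute(
    "SELECT nickname FROM contact "
    "WHERE nickname IS NOT NULL AND NOT (nickname REGEXP ?)",
    (r"\s+",),
).fetchall()

print(rows)  # [('alice',)]
```

Swapping in a different registered function (one built on a POSIX-class-aware engine, say) would change which patterns work without touching the query text, which is the same kind of behavior the Tracker example relies on.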

Another example of undocumented behavior is that aggregate functions (e.g. COUNT) can be applied to the result of a property function. Take this query:

SELECT nie:url(?urn) COUNT(?region)
WHERE { ?urn rdf:type nmm:Photo .
        OPTIONAL { ?urn nfo:hasRegionOfInterest ?region } }
GROUP BY ?urn

(Obtains the URLs of images and the number of associated regions of interest.)

The above query would run faster if the OPTIONAL clause could be replaced by a property function. However, when a property has multiple values, the property function returns them concatenated and separated by commas, so one would expect COUNT to choke when applied to that… No! Actually, the following works as expected:

SELECT nie:url(?urn) COUNT(nfo:hasRegionOfInterest(?urn))
WHERE { ?urn rdf:type nmm:Photo }
GROUP BY ?urn

(Obtains the URLs of images and the number of associated regions of interest — faster version.)
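That both queries produce the same counts can be illustrated in plain SQL. The sketch below uses a hypothetical two-table schema (`photo` and `region`, not Tracker's real tables) through Python's sqlite3 module: a LEFT JOIN with COUNT over a joined column mirrors the OPTIONAL version, and a correlated subquery computes the same per-photo counts without any outer join. It makes no claim that Tracker generates either statement; it only shows that the two formulations agree, including a count of 0 for a photo with no regions.

```python
import sqlite3

# Hypothetical schema standing in for photos and their regions of interest;
# this is NOT Tracker's real table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photo (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE region (photo_id INTEGER REFERENCES photo(id), label TEXT);
    INSERT INTO photo VALUES (1, 'file:///a.jpg'), (2, 'file:///b.jpg');
    INSERT INTO region VALUES (1, 'face'), (1, 'dog');
""")

# Analogue of the OPTIONAL version: outer join, then count the joined column.
# COUNT(r.photo_id) ignores the all-NULL row the outer join produces for a
# photo without regions; COUNT(*) would report 1 for that photo instead of 0.
join_rows = conn.execute("""
    SELECT p.url, COUNT(r.photo_id)
    FROM photo AS p LEFT JOIN region AS r ON r.photo_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Join-free analogue: a correlated subquery counts the regions of each photo.
subquery_rows = conn.execute("""
    SELECT p.url, (SELECT COUNT(*) FROM region AS r WHERE r.photo_id = p.id)
    FROM photo AS p
    ORDER BY p.id
""").fetchall()

print(join_rows)      # [('file:///a.jpg', 2), ('file:///b.jpg', 0)]
print(subquery_rows)  # [('file:///a.jpg', 2), ('file:///b.jpg', 0)]
```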