Hi Marcelo,
This is a great post and we're sure it'll be very useful to Stardog users. We'll link to it from our Stardog Labs site. Most of my comments are pretty minor; please find them below.
The first time you mention "RDF Dataset", it's not fully clear that you are talking about the dataset associated with a SPARQL query, as per Section 13 in the spec. I know that many people are confused about the difference between "RDF dataset" as a collection of RDF quads in the database and "RDF dataset" as a collection of graphs in the query. The next paragraph makes it clearer.
Exactly one default graph: The default graph does not have a name and may not contain any triples.
I suggest rephrasing to "may be empty in the data" to not give the false impression that it's not allowed to contain any triples.
When you talk about FROM vs FROM NAMED, it'd be good to explicitly say that BGPs outside of graph g {} are evaluated against the default part of the RDF Dataset (i.e. defined using FROM) while BGPs within graph {} are evaluated for each graph in the named part of the RDF Dataset.
Misunderstanding that point is the number one reason for getting confused about empty query results. You say it in
The GRAPH keyword is used to make the active graph one of all of the named graphs in the dataset for part of the query
but maybe showing a query with FROM (and no FROM NAMED) and graph {} would be a good example of a query which cannot return results.
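Something along these lines would do. It's only a sketch, and the graph IRI is a hypothetical placeholder rather than anything from the post:

```sparql
# <http://example.org/graph1> is a made-up IRI, used only for illustration.
SELECT ?s ?p ?o
FROM <http://example.org/graph1>
WHERE {
  # FROM puts graph1 into the default part of the RDF Dataset.
  # There is no FROM NAMED, so the named part is empty and
  # GRAPH ?g has nothing to iterate over: no results.
  GRAPH ?g { ?s ?p ?o }
}
```

Replacing ?g with the same IRI that appears in FROM would not help either, since that graph is still not in the named part of the dataset.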
Counting triples in the default graph (Stardog extension)
This example is correct but looks a little strange to me. You could achieve the same with just
SELECT (count(*) as ?size)
WHERE
{ ?s ?p ?o }
(assuming query.all.graphs=false) or, if you don't want to rely on that option, then
SELECT (count(*) as ?size)
FROM stardog:context:default
WHERE
{ ?s ?p ?o }
In other words, I am not sure what ORDER BY and GROUP BY are bringing to the table here since there's only one graph.
The following query is equivalent to the previous one, however it doesn’t rely on the Stardog extension.
True but with a caveat. The query
SELECT ?g (count(*) as ?size)
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  } UNION {
    ?s ?p ?o
    BIND("default" AS ?g)
  }
}
GROUP BY ?g
ORDER BY asc(?size)
does not specify its RDF Dataset, so to produce the same result as above, it should be evaluated over the RDF Dataset where the default part is the default graph in the data and the named part is all named graphs in the data. This is a reasonable default option (i.e. Stardog will use that with query.all.graphs = false) but the spec doesn't mandate it.
The query below can be used to achieve the same result. It will search for book1 across all graphs, named and default and union the results.
I'd swap the BIND and FILTER in this query to make it clearer to people less familiar with the SPARQL semantics that ?searchString is assigned before it's used in the filter.
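Roughly, the reordered shape would be as follows. This is a sketch only: the triple pattern, the CONTAINS test and every variable name other than ?searchString are assumptions standing in for whatever the post's query actually uses.

```sparql
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
  # Assign the search string first ...
  BIND("book1" AS ?searchString)
  # ... then use it in the filter.
  FILTER(CONTAINS(STR(?o), ?searchString))
}
```

The results should not change, because a FILTER applies to the whole group it appears in, but the reading order then matches the intuition of "assign, then use".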
Stardog has a database property called “Query All Graphs” (query.all.graphs), which provides the same behaviour as the stardog:context:all, but set at the database level
I'd be more explicit here. By this point you've defined the RDF Dataset as a structure of two parts (the default part and the named part), so you can say exactly how query.all.graphs affects it. With false, the default dataset will be <context:default, context:named>. With true, the default dataset will be <context:all, context:named>. And of course that option applies only when the query does not use any FROM or FROM NAMED and the dataset is not set through the SPARQL Protocol (which happens when you select the graph in that drop-down list in Studio).
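One way to illustrate this in the post is to write the implicit dataset out with explicit clauses. The sketch below assumes that the stardog: prefix is already bound in the database namespaces (as in the FROM stardog:context:default example earlier) and that stardog:context:named is accepted in FROM NAMED as the alias for all named graphs:

```sparql
# Sketch of the dataset used when query.all.graphs=false and the query
# has no FROM / FROM NAMED of its own (assumes the stardog: prefix is bound).
SELECT ?g (count(*) AS ?size)
FROM stardog:context:default
FROM NAMED stardog:context:named
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g

# With query.all.graphs=true the first clause would instead read:
#   FROM stardog:context:all
```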
Thanks again for your work,
Pavel