Stardog/SPARQL Virtual Graph

According to the Stardog documentation, it sounds like you can create a virtual graph over either a graph in a different Stardog database OR a graph in an arbitrary SPARQL endpoint. I ran into two issues when trying to use that feature:

  1. I tried connecting to an arbitrary SPARQL endpoint, but when I ran stardog-admin data-source add on a properties file set up as described in the documentation mentioned above, I kept getting the following error: "Valid SPARQL query service endpoint required, Example: sparql.url = http://host:port/db/query". I could only get past this error if the URL literally matched the structure shown in the message. However, if I gave it a URL with that structure that did not point to Stardog (e.g. one for Fuseki), I got the error "Error connecting to SPARQL Virtual Graph source url: <URL>". I only got a success message when the URL pointed to Stardog. This seems to imply that only graphs in other Stardog databases are actually supported. Is this the case, or can it be configured to work over an arbitrary SPARQL endpoint?

  2. After creating a data source for a Stardog database as described above, I created a virtual graph using that data source. I did not specify a mapping file, since the documentation says mappings are not supported for this type of source (and it wouldn't make sense to require one), and the virtual graph was created successfully. I then tried to query it the same way I have successfully queried other virtual graphs and got the error java.lang.NullPointerException. I get the same error whether I run the query through Stardog Studio or with the stardog query execute command. Is this a bug, or is there something else I need to do to get this to work?

Matt,

If you want to query a non-Stardog SPARQL endpoint, using Federation is the way to go.

service <https://remote.endpoint.sparql> { ... }

If you want to query another Stardog database on the same server, you can address it directly via the same federated SERVICE mechanism:

service <db://myDbName> { ... }

If you have a remote Stardog server, you can either use the SERVICE keyword or add it as a virtual graph, as you are attempting.
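To make the federation suggestion concrete, here is a minimal sketch of a query that joins data in the current database with data held in a second database on the same server. The prefix ex:, the properties ex:name and ex:placedBy, and the database name myDbName are hypothetical placeholders. Replacing <db://myDbName> with an HTTP endpoint URL such as <https://remote.endpoint.sparql> should instead send the inner pattern to a remote, possibly non-Stardog, SPARQL endpoint, as in standard SPARQL 1.1 federation.

```sparql
PREFIX ex: <http://example.org/>

# Hypothetical vocabulary and database name, for illustration only.
SELECT ?customer ?name ?order
WHERE {
  # Matched against the database the query is sent to
  ?customer ex:name ?name .

  # Matched against another database on the same Stardog server
  SERVICE <db://myDbName> {
    ?order ex:placedBy ?customer .
  }
}
LIMIT 10
```

The shared variable ?customer is what joins the local results with the federated ones.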

Cheers,

Mike

Sure, I have been able to use SPARQL federation successfully in the past.

However, even though federation works fine from a functional perspective, conceptually it would be convenient if an arbitrary SPARQL endpoint could be represented as a virtual graph. Since one of the benefits of virtualization is being able to query across data sources without a basic user needing to know where the data resides or how it is stored, it would make sense for one of those sources to be a SPARQL endpoint when other, non-Stardog RDF stores are in use. That way, basic users wouldn't need to know about other SPARQL endpoints or explicitly query them with a SERVICE clause (but still could if they wanted to).

The remote SPARQL endpoint is already a graph and definitely abstracts away how the data is stored. I agree the user does have to know the endpoint URL/location, but they always have to know something, whether that's an endpoint URL or an arbitrary named graph IRI.

Are you looking for it to be a virtual graph from an administrative point of view? Or are you just looking for the syntactic sugar of a 'logical' graph name for the remote endpoint?

A bit of both. If named graph security is in use, then I see from the documentation that Stardog also checks SERVICE clause URLs, so it doesn't particularly matter in that case. However, it could be more convenient for basic users, since Stardog already presents virtual sources to users in a few places, such as in Explorer.

There would be no need for a separate data catalog of services and endpoints that users would have to know about and learn to use (although such a catalog could still be useful).

Of course, if there were a virtual graph over a SPARQL endpoint, it would still limit the SPARQL endpoint to a single named graph (unless there is a named graph for the union of all graphs). So I suppose a registry of SPARQL endpoints could also work instead, although that is another concept that basic users would have to learn and that could complicate some of the features Stardog already handles so well for virtual graphs.

Matt, I think some of our forthcoming Data Catalog work will help here, at least in terms of helping users find out what's out there, what data is in each place, and how to query it.

From there, aliases might be a great way for you to organize a set of graphs that contain relevant data into a single logical graph. An alias can include both virtual and materialized graphs. That would let users select just one thing, the alias, and query all of the graphs without having to enumerate them. Kits will use this ability to simplify distribution of data models and datasets. Aliases also give you a central graph that you can secure and manage, simplifying ops.
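As a rough illustration of querying through an alias, the sketch below names it in a FROM clause, where a graph IRI would normally go. It assumes an alias has already been defined over the relevant virtual and materialized graphs, and that the alias can be referenced like any other named graph; the alias IRI urn:example:alias:sales is hypothetical.

```sparql
# Hypothetical alias IRI; assumes the alias already groups the
# relevant virtual and materialized graphs.
SELECT ?s ?p ?o
FROM <urn:example:alias:sales>
WHERE {
  ?s ?p ?o .
}
LIMIT 10
```

The user only needs the single alias IRI, not the list of graphs behind it.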