Handle multiple R2RML mapping files

Hi there,

I’m exploring Stardog’s capabilities for ingesting structured data. I was able to create a virtual graph from a relational database using the VirtualGraphAdminConnection API.

I’m looking to see whether Stardog can accept multiple RML mapping files and programmatically determine which file to use depending on the data source. Could somebody give me a pointer on how to implement this, if such a capability exists?

Thanks!

Can you add some more details on what exactly you're looking to do? I'm not quite sure what you mean by "determine which file to use depending on the datasource". You can register multiple virtual graphs with Stardog.

Let’s say we have a collection of RML mapping files and multiple data sources. Our users might not know which mapping file goes with which data source. Furthermore, the mapping files are maintained by a different team, and their names (or even their existence) can change over time.

One approach is to maintain our own map from data sources to RML files. Instead of doing that, we are wondering whether the Stardog API has something to help us handle this use case. :)
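For reference, the hand-maintained map mentioned here could be as small as a lookup function. This is only a sketch: the data source names and mapping file paths are made-up placeholders, and nothing in it is part of Stardog.

```shell
# Hypothetical lookup: data source name -> mapping file path.
# Both the source names and the paths are placeholders for illustration.
mapping_for() {
  case "$1" in
    orders_db)    echo "mappings/orders.ttl" ;;
    customers_db) echo "mappings/customers.ttl" ;;
    *)            echo "no mapping registered for '$1'" >&2; return 1 ;;
  esac
}

# Example lookup for one of the placeholder data sources.
mapping_for orders_db
```

The downside is the one described above: someone has to keep the table in sync whenever the other team adds or renames a mapping file.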

I don't think Stardog supports that out of the box. The association between the mappings and the database is made when you supply the connection properties and mappings, either with the CLI or through a VirtualGraphAdminConnection. Stardog supports the R2RML spec, and the spec has no mechanism for this, but it shouldn't be too difficult to add one in some way. How you go about it will depend on how you'd like to maintain the mapping (the information has to be supplied somehow and somewhere). D2RQ, the precursor to the R2RML spec, did have a mechanism for this. See d2rq:Database in the mapping spec.

As just one possible path, you could leverage the D2RQ mapping vocabulary to add that information to your R2RML files. R2RML doesn't have any resource that represents a collection of TriplesMaps, so you'd have to put them into a named graph. Then write a bash script (call it, say, stardog-mapped-virtual) that takes the URL of your mapping file. It would download the mapping file and load it into Stardog:

stardog-admin db drop somerandomname
stardog-admin db create -n somerandomname <(curl http://mymappings.file)

Then it would extract the properties file and the mappings file and load them:

stardog-admin virtual add <(stardog query mymappings -f CSV 'select concat("jdbcDriver=", ?jdbcDriver, "\n"...)') <(stardog query mymappings 'construct where { graph <mygraph> { ?s ?p ?o } }')

I should point out that the above uses the CSV output type to reconstruct the properties file. You might have to run it through "tail -n +2" to strip the header off ("tail -n +1" would print everything, header included). It would only have a single line and single column, so you wouldn't get any commas. I haven't tried it, and obviously the above examples aren't complete, but I think that should work. You can modify it to store the mappings somewhere else, but hopefully it's a helpful example. Let me know how it works out if you decide to give it a go.
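To illustrate just the header-stripping step, here is a small sketch that doesn't need a running Stardog. The CSV text is simulated with printf, and the column name "props" and the driver value are made up; a real run would pipe the output of the stardog query command instead.

```shell
# Simulated single-column CSV result (header row plus one value row).
# The header name "props" and the property value are placeholders.
csv_result() {
  printf 'props\njdbcDriver=com.mysql.jdbc.Driver\n'
}

# tail -n +2 starts printing at line 2, which drops the header row.
strip_header() {
  tail -n +2
}

# Prints only the value row, which is what the properties file should contain.
csv_result | strip_header
```

If the CSV writer quotes the value (for example because it contains embedded newlines), the quotes would need stripping as well. I haven't checked how Stardog's CSV output behaves there.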

Thanks for the pointer!

Hope that helps. There are a lot of possibilities there. You don’t need to create the properties file from triples; you can just store it as a string or have a URL that points to it. The fun thing is that if you store the mappings and properties files in Stardog, you can associate additional data with the files and with the database, then use the reasoner to help you make the association. Something like “Associate this mapping with a database if a manager has approved it for production use and the server is a production server, but only if it has been provisioned with X memory,” etc.

Be sure to check back in and let everyone know how things worked out and what you came up with.
