We are building an application that uses the Stardog graph database to store data and knowledge about multiple domains. The idea is to store sensor data linked to devices owned by our customers. I was thinking of storing each customer's data in a separate named graph so that it can be secured in the database. Each customer should only be able to read from and write to their own named graph. What about reasoning? Let’s say we have a rule representing some business logic. If we store the data as described, can multiple customers access the inferred data? Can you store the new triples in a separate (public) graph?
What would your approach be in this case?
“Named graph permissions do not affect the schema axioms used in reasoning and every reasoning query will use the same schema axioms even though some users might not have been granted explicit read access to schema graphs. But non-schema axioms in those named graphs would not be visible to users without authorization.”
What is the difference between ‘schema axioms’ and ‘non-schema axioms’ in this context?
First of all, Stardog does not materialize inferred triples. They’re only derived during query evaluation. If you want to store inferred triples back in the database, you can do that (by running a SPARQL Update query), and you’re free to choose which named graph they go to.
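To make that concrete, here is a rough sketch of what such an update could look like, written as a small Python helper that only builds the SPARQL text. The `ex:` prefix, the `ex:PotentiallyIcy` class and the graph IRI are made up for illustration, and the helper name is hypothetical. It assumes the update is executed with reasoning enabled so that the WHERE clause matches inferred triples; how you enable that, and which graphs the WHERE clause sees, depends on your client and database setup.

```python
def build_materialize_update(target_graph: str) -> str:
    """Build a SPARQL Update that writes matching triples into target_graph.

    All IRIs are illustrative placeholders, not part of any real schema.
    """
    return f"""
PREFIX ex: <http://example.org/traffic#>
INSERT {{
  GRAPH <{target_graph}> {{ ?road a ex:PotentiallyIcy }}
}}
WHERE {{
  ?road a ex:PotentiallyIcy
}}
""".strip()


# Hypothetical public graph that every customer is allowed to read.
update = build_materialize_update("http://example.org/graph/public-inferred")
print(update)
```

The point is just that the target graph is an ordinary parameter of the update, so the inferred triples can land in a graph with its own read permissions.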
Now, when Stardog evaluates a query with reasoning, it does so using the schema (aka TBox or ontology) and data (aka ABox). The part of the docs that you quoted says that the named graph security only controls access to the data part, and not the schema. The schema is extracted once and for all users, and is always the same for every query with reasoning. However, when a user runs the query, the schema is only applied to the part of the data (named graphs) that this user is allowed to see.
Thanks for your reply, I still have some questions though.
Let’s take an example: customer A has a sensor A sending humidity values; customer B has a sensor B sending temperature values. Data from customers A and B are stored in separate named graphs. Both sensors are linked to the traffic security domain, and a business rule implemented in the DB evaluates these measurements and infers the PotentiallyIcy phenomenon. Will both customers A and B, and possibly others, be able to retrieve whether the road is potentially icy based on other customers' data?
If customer A can only read the graph which stores humidity values and, similarly, customer B can only read the temperature graph, then no. You need both facts to derive that the road can be icy (if I understand correctly).
If customer A can write to only its graph but can also read what B stores about temperature in B’s graph (and vice versa), then both will be able to derive the icy road condition.
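A toy model of the two cases above may help. This is plain Python, not Stardog: sets of triples stand in for named graphs, a function stands in for the business rule, and the graph names and values are invented. It only illustrates the idea that the rule is the same for everyone while the data it is applied to is limited to what the user can read.

```python
# Toy model only: each named graph is a set of (subject, predicate, object).
GRAPHS = {
    "graph:customerA": {("road1", "humidity", "high")},
    "graph:customerB": {("road1", "temperature", "below_zero")},
}


def visible_triples(readable):
    """Union of the triples in the graphs this user may read."""
    triples = set()
    for name in readable:
        triples |= GRAPHS.get(name, set())
    return triples


def potentially_icy(readable):
    """Apply the same rule for every user, but only to their visible data."""
    data = visible_triples(readable)
    roads = {s for (s, _p, _o) in data}
    return {
        road
        for road in roads
        if (road, "humidity", "high") in data
        and (road, "temperature", "below_zero") in data
    }


print(potentially_icy({"graph:customerA"}))                     # set()
print(potentially_icy({"graph:customerA", "graph:customerB"}))  # {'road1'}
```

With read access to only one graph, one of the two premises is missing and nothing is derived; with read access to both, the rule fires.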
I think this might be a little confusing because it can be read in two ways. There are schema axioms and things that are not schema axioms. I could see someone incorrectly reading that as: there are axioms, some of which are schema ones and some of which are non-schema ones.
The way I think of this is that schema axioms, including your rules, are a universal truth that applies to the entire database, so you can't have one version of the truth for one person and another version for someone else.
Keep in mind that what you're looking to do pokes all kinds of holes in the security model but that might be acceptable depending on the data and what you're looking to infer from it.
You could create two graphs for each user and use one for non-sensitive data like temperature and humidity and the other for more sensitive data like GPS location. The nice thing about this is that you're being explicit about the security level of the data you're trying to protect.
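As a sketch of that layout (the IRI scheme and function names are hypothetical, and this only computes which graphs a customer would be granted read access to; the actual grants would still have to be set up in Stardog's security model):

```python
def customer_graphs(customer: str) -> dict:
    """Two graphs per customer: one shareable, one private (made-up IRIs)."""
    base = f"http://example.org/graph/{customer}"
    return {"shared": f"{base}/shared", "private": f"{base}/private"}


def readable_graphs(customer: str, all_customers: list) -> set:
    """Own graphs plus everyone's shared graph; never another private graph."""
    own = customer_graphs(customer)
    readable = {own["shared"], own["private"]}
    for other in all_customers:
        readable.add(customer_graphs(other)["shared"])
    return readable


customers = ["customerA", "customerB"]
for graph in sorted(readable_graphs("customerA", customers)):
    print(graph)
```

Customer A ends up with read access to B's temperature or humidity graph, which is enough for the rule to fire, but not to B's private graph.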