Format preserving encryption custom function in Stardog

Format-preserving encryption (FPE) is an important mechanism in many industry sectors.

FPE has been implemented in many languages (many times):

Can we add a custom function to Stardog?

Thank you,
Radu

Possibly. I’d have to look more closely at the library you provided, but it can probably be done as a function. If not, it can be done with a more complex mechanism.

I’ve had it on my list to implement crypto functions similar to the ones provided by MySQL https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html

Zach,

Very good to hear that. Glad to see Stardog on top of crypto matters! :+1:

BTW FPE is a standard recommendation by NIST:

Regards,
Radu

Ok, I've put together a basic encrypt/decrypt function. It's a little rough and I still need to add some parameter checks, but it successfully makes an encrypt=>decrypt round trip. I've been thinking about how you should pass the key around. Right now it's just a parameter to the function, but I'm not sure that's the best way to handle it. I was also thinking it might be interesting if you could register a predicate and have it transparently encrypt the literal.
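To make the round trip concrete, here is a minimal Python sketch of the idea behind FPE: a keyed, tweakable Feistel network over decimal digits, so a 9-digit SSN encrypts to another 9-digit value with the dashes left in place. This is a toy built on HMAC-SHA256, not NIST FF1/FF3-1 and not the Java library used in the plugin; the names (`fpe_encrypt`, `fpe_decrypt`) and the key and tweak values are hypothetical, and it should not be used to protect real data.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # FF1 also uses 10 rounds; the round function here is NOT FF1's


def _prf(key: bytes, tweak: bytes, round_index: int, half: str, modulus: int) -> int:
    # Keyed round function: HMAC-SHA256 over round number, tweak and half,
    # reduced into the range [0, modulus).
    message = bytes([round_index]) + len(tweak).to_bytes(4, "big") + tweak + half.encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("need at least two decimal digits")


def encrypt_digits(key: bytes, digits: str, tweak: bytes) -> str:
    # Unbalanced Feistel over a decimal string: same length, same alphabet.
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v
        c = (int(a) + _prf(key, tweak, i, b, 10 ** m)) % 10 ** m
        a, b = b, str(c).zfill(m)
    return a + b


def decrypt_digits(key: bytes, digits: str, tweak: bytes) -> str:
    # Undo the rounds in reverse order, subtracting instead of adding.
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _prf(key, tweak, i, a, 10 ** m)) % 10 ** m
        a, b = str(c).zfill(m), a
    return a + b


def _map_digits(func, key: bytes, text: str, tweak: bytes) -> str:
    # Transform only the digits and put separators such as '-' back where they were.
    digits = "".join(ch for ch in text if ch in DIGITS)
    transformed = iter(func(key, digits, tweak))
    return "".join(next(transformed) if ch in DIGITS else ch for ch in text)


def fpe_encrypt(key: bytes, text: str, tweak: bytes) -> str:
    return _map_digits(encrypt_digits, key, text, tweak)


def fpe_decrypt(key: bytes, text: str, tweak: bytes) -> str:
    return _map_digits(decrypt_digits, key, text, tweak)


if __name__ == "__main__":
    key = bytes(range(32))  # demo key only
    token = fpe_encrypt(key, "123-45-6789", b"user-42")
    print(token)  # same shape as the input: ddd-dd-dddd
    print(fpe_decrypt(key, token, b"user-42"))  # 123-45-6789
```

As in the first version of the plugin, the key is just an ordinary argument here; the tweak lets one key produce different ciphertexts for different contexts (for example, per user).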

I'll wrap it up shortly and get you something you can at least start to play around with. Thanks for the link. This is actually the first time I've heard of FPE. I've been wanting to mess around with Attribute Based Encryption (ABE) in Stardog for a while. It might be an interesting way to encrypt BITES data stored in S3 but allow encrypted access control outside of Stardog. That, and ABE is super crypto magic cool.

Zach,

Thanks for your response and effort to implement such an important feature. Looking forward to using it.

Meanwhile, to solve my problem, I have added the substring "enc-" to the beginning of all the objects I had to encrypt. The following query returns them all:

select * where {
    ?s ?p ?o
    FILTER (regex(str(?o), '^enc-.*', 'i'))
}

I'm just trying to write a SPARQL update that would decrypt them on demand, but I'm not sure how to do that. How would you go about it?

Thanks,
Radu

The repo is at

I've implemented two functions

http://semantalytics.com/2017/09/ns/stardog/kibble/crypto/formatPreservingEncrypt
http://semantalytics.com/2017/09/ns/stardog/kibble/crypto/formatPreservingDecrypt

They both take three string literal arguments: the first is a base64-encoded key, the second is the plaintext/ciphertext, and the third is the tweak. They use the default domain from the Java library you referenced. There isn't much checking, so you need to make sure you use the correct key size or it will just silently fail.
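Since a wrong key size fails silently, it may help to validate the key on the client before it goes into a query. The sketch below is hypothetical Python: it assumes the underlying cipher is AES (so a decoded key must be 16, 24 or 32 bytes), and `check_key` / `build_decrypt_query` are illustrative helpers, not part of the plugin. Only the function IRI and the argument order (key, text, tweak) come from the post above.

```python
import base64
import binascii

DECRYPT_FN = "http://semantalytics.com/2017/09/ns/stardog/kibble/crypto/formatPreservingDecrypt"
VALID_KEY_BYTES = (16, 24, 32)  # assumption: AES-128/192/256 keys


def check_key(key_b64: str) -> bytes:
    """Fail loudly, instead of silently, when the key is malformed or the wrong size."""
    try:
        raw = base64.b64decode(key_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key is not valid base64") from exc
    if len(raw) not in VALID_KEY_BYTES:
        raise ValueError(f"key decodes to {len(raw)} bytes, expected one of {VALID_KEY_BYTES}")
    return raw


def sparql_literal(value: str) -> str:
    # Quote a Python string as a SPARQL string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def build_decrypt_query(key_b64: str, tweak: str, predicate_iri: str) -> str:
    # Argument order follows the post: key, ciphertext, tweak.
    check_key(key_b64)
    key_lit = sparql_literal(key_b64)
    tweak_lit = sparql_literal(tweak)
    return (
        "SELECT ?s ?plain WHERE {\n"
        f"  ?s <{predicate_iri}> ?cipher .\n"
        f"  BIND(<{DECRYPT_FN}>({key_lit}, STR(?cipher), {tweak_lit}) AS ?plain)\n"
        "}"
    )
```

Note that the key ends up in the query text, which is one reason a plain function parameter may not be the best long-term way to pass it around.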

I haven't done an official release yet, since there are a few things I need to fix first, but you can get a snapshot build at https://drive.google.com/file/d/1tAVdwVvBXUfok7WNO3F02A6g0DvY8D2U/view?usp=sharing Place the jar into the STARDOG_EXT directory, restart Stardog, and you should be able to play around with it.


Zach,

Thanks for such a quick turnaround. Looking forward to leveraging the new functions.

Meanwhile can you please comment on my previous question?

Regards,
Radu

So did you add the enc- prefix to the literals that you need to encrypt (so it's "enc-" plus the plain text), or to the literals that you've already encrypted? It might be a little cleaner to just use a datatype to identify which literals should be, or already are, encrypted.

Can you explain what you need from a SPARQL update command that would decrypt them on demand? I'm not sure what you mean by "on demand". Do you mean an update script that will find all prefixed literals and replace them with their encrypted equivalents? That seems doable, but it might take a little while to put together the exact query. I also wonder how useful it is to encrypt the data if it was already out there in plain text. The regex filter query is also likely to be slow; full-text search would be much faster.
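If the goal is a one-off batch job that replaces every "enc-" literal with its encrypted form, the update could look roughly like the one generated below. This is a hedged Python sketch that only builds the SPARQL string; it assumes the plugin's encrypt function accepts these strings in its default domain, and the helper names are hypothetical. Note that `STRSTARTS` is case-sensitive, unlike the `'i'` regex above.

```python
ENCRYPT_FN = "http://semantalytics.com/2017/09/ns/stardog/kibble/crypto/formatPreservingEncrypt"
MARKER = "enc-"


def sparql_literal(value: str) -> str:
    # Quote a Python string as a SPARQL string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def build_encrypt_update(key_b64: str, tweak: str) -> str:
    # Replace each literal that starts with MARKER by the encryption of the text
    # after the marker. STRSTARTS avoids the regex engine, but this is still a
    # scan over all literals, so it suits a one-off batch job.
    start = len(MARKER) + 1  # SPARQL SUBSTR is 1-based
    marker_lit = sparql_literal(MARKER)
    key_lit = sparql_literal(key_b64)
    tweak_lit = sparql_literal(tweak)
    return (
        "DELETE { ?s ?p ?o }\n"
        "INSERT { ?s ?p ?enc }\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  FILTER(isLiteral(?o) && STRSTARTS(STR(?o), {marker_lit}))\n"
        f"  BIND(<{ENCRYPT_FN}>({key_lit}, SUBSTR(STR(?o), {start}), {tweak_lit}) AS ?enc)\n"
        # If encryption fails and ?enc stays unbound, skip the row so the
        # original triple is not deleted without a replacement.
        "  FILTER(BOUND(?enc))\n"
        "}"
    )
```

One consequence of this design: the encrypted literals no longer carry the "enc-" marker, so something else (a datatype or the predicate) would have to record that they are encrypted.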

You might be interested in what I suggested earlier about writing a transaction listener. You could have a datatype like http://example.com/encryptme and whenever you add anything with that datatype it transparently encrypts it.

so if you add...

:user :hasSocialSecurityNumber "123-45-6789"^^<http://example.com/encryptme>

and then queried for it

select ?ssn where { :user :hasSocialSecurityNumber ?ssn }

you'd get

:user :hasSocialSecurityNumber "776-13-1333"^^<http://example.com/encrypted>

or something like that. I'm not quite sure how you'd handle keys and all that, but I'm just throwing ideas out there. I know it's somewhat abusing datatypes and prevents you from supplying your own datatype. Maybe you could encode the datatype into the literal, like

:user :hasSocialSecurityNumber """"123-45-6789"^^<http://example.com/ssn>"""^^<http://example.com/encryptme>

and then get

:user :hasSocialSecurityNumber "776-13-1333"^^<http://example.com/ssn>

Someone would have to tell me whether that's an acceptable thing to do.
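The listener idea boils down to a rewrite rule applied to each added literal. Stardog's actual listener API is Java and is not shown here; the Python below only models the rule, with a hypothetical `on_add` function, the example.com datatypes from the posts above, and a pluggable `encrypt` callable where one of the FPE functions would go.

```python
import re
from dataclasses import dataclass
from typing import Callable

ENCRYPT_ME = "http://example.com/encryptme"
ENCRYPTED = "http://example.com/encrypted"
# A wrapped literal such as "123-45-6789"^^<http://example.com/ssn>
_WRAPPED = re.compile(r'^"(.*)"\^\^<([^>]+)>$', re.DOTALL)


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def on_add(obj: Literal, encrypt: Callable[[str], str]) -> Literal:
    """Rewrite a literal the way the proposed listener would before it is stored."""
    if obj.datatype != ENCRYPT_ME:
        return obj  # not marked for encryption: store as-is
    match = _WRAPPED.match(obj.lexical)
    if match:
        # The original datatype was encoded into the lexical form: restore it.
        value, inner_datatype = match.group(1), match.group(2)
        return Literal(encrypt(value), inner_datatype)
    # No inner datatype: mark the value as encrypted.
    return Literal(encrypt(obj.lexical), ENCRYPTED)


def demo_cipher(text: str) -> str:
    # Stand-in only (shifts each digit by 3); this is NOT encryption.
    return "".join(str((int(c) + 3) % 10) if c in "0123456789" else c for c in text)
```

The trade-off shows up in the wrapped case: once the inner datatype is restored, nothing on the literal itself says it is encrypted, so that fact would have to be recorded elsewhere.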

Zach,

I like your dedicated datatype approach:

:user :hasSocialSecurityNumber "123-45-6789"^^<http://example.com/encryptme>

It is cleaner than putting "enc-" in front of a string value, but, as you said, it prevents me from supplying my own datatype, which is exactly my dilemma now.

I also need the encrypted data to be temporarily decrypted, say for the next 10 seconds, and then go back to the encrypted state. I know this gets a bit complicated, so I am trying to come up with SPARQL update(s) that can do the above.
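One way to read "temporarily decrypted" is that the stored value stays encrypted and only what a reader sees changes for a limited window. Under that assumption, a hypothetical `TemporaryReveal` helper could track an expiry per (subject, predicate) pair like this; it illustrates the timing logic only and is not a Stardog feature.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (subject, predicate)


class TemporaryReveal:
    """Hypothetical helper: serve plaintext for a key only until its window expires."""

    def __init__(self, decrypt: Callable[[str], str], clock: Callable[[], float] = time.monotonic):
        self._decrypt = decrypt
        self._clock = clock
        self._expires: Dict[Key, float] = {}

    def reveal(self, subject: str, predicate: str, seconds: float = 10.0) -> None:
        # Open a window during which reads return the decrypted value.
        self._expires[(subject, predicate)] = self._clock() + seconds

    def read(self, subject: str, predicate: str, stored: str) -> str:
        # The stored value never changes; only what the reader sees depends on time.
        deadline: Optional[float] = self._expires.get((subject, predicate))
        if deadline is not None and self._clock() < deadline:
            return self._decrypt(stored)
        self._expires.pop((subject, predicate), None)  # drop expired grants
        return stored
```

Doing this with a pair of SPARQL updates instead (decrypt now, re-encrypt in 10 seconds) would leave plaintext in the database for the whole window and depend on the second update actually running.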

Thanks,
Radu

Let me mention two things we are working on that are related to the conversation here:

  1. Encryption-at-rest: This will allow users to configure their databases so that contents stored on disk will always be encrypted. This is not FPE, and when you query your database the results are always returned unencrypted, so from a query perspective you cannot even tell the data is encrypted.
  2. Property-level Access Control: Allow users to designate properties, such as hasSocialSecurityNumber in this example, as sensitive, so that triples using those properties will either be completely hidden from users or have their values returned in masked form, e.g. XXX-XX-1234, based on the permissions granted to the user running the query. The masking function will be configurable, and although we have not considered FPE so far, the functions Zach provided might be used for this purpose. Homomorphic encryption functions would be another possibility, allowing computations to be performed on encrypted values without decryption. We have not finalized the design yet, so I cannot say for sure what will be supported in the first release, but the important point is that it will be the property of the triple (not the datatype, not the literal value) that determines the behavior.
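A masking function of the kind described in item 2 can be very small. The Python below is a hypothetical sketch: the permission names ("read", "masked") and `render` are made up for illustration and do not describe the planned Stardog design. It keeps the last four digits and the separators, producing values like XXX-XX-6789.

```python
from typing import Optional

DIGITS = "0123456789"


def mask_keep_last(value: str, visible: int = 4, mask_char: str = "X") -> str:
    # Replace every digit except the last `visible` ones; keep separators.
    total = sum(ch in DIGITS for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch in DIGITS:
            seen += 1
            out.append(ch if seen > total - visible else mask_char)
        else:
            out.append(ch)
    return "".join(out)


def render(value: str, permission: str) -> Optional[str]:
    """Hypothetical policy: 'read' -> full value, 'masked' -> masked, otherwise hidden."""
    if permission == "read":
        return value
    if permission == "masked":
        return mask_keep_last(value)
    return None  # the triple is hidden from this user
```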

We expect the former feature to be available early in the summer and the latter feature to be available towards the end of the summer.

I think the combination of these features would satisfy the use case here, except for values being "temporarily decrypted" for a short amount of time. I do not fully understand whether that is supposed to control what is stored in the database, what is returned by queries, or something else. Do you have more details about what the expectation is here?

Best,
Evren