Packaging UDF functions

I've been giving some thought to how custom UDFs are packaged, and I wanted to know what people think about the subject.

Right now I package them into fat jars, which is easy and convenient. The downside is that the functions generally have no cross dependencies, yet anyone who wants to use one of them has to install all of them. That's not a big problem, but it is all or nothing. I can imagine having some functions that you don't want everyone to be able to run. It would be nice to have a security feature for functions like the one for named graphs, which I guess you could get by implementing the function as a service (I might look into that). Fat jars also make it somewhat difficult to throw out a single quick function, which is sometimes nice. The ideal would be to package a single function per jar, but maintaining that quickly becomes a nightmare, and you get a lot of duplicated code since you need to copy the dependencies into each jar/function.

OK, we already have tools for managing dependencies. I could stop using fat jars and use Maven Aether to resolve the dependencies, and maybe add a utility to install them (stardog-plugin install ....), but the Stardog scripts would need to be updated to recursively add any jars in STARDOG_EXT instead of just using the wildcard.
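To make the "recursively add any jars" idea concrete, here is a minimal sketch of how a launcher could build a classpath from every jar under a directory such as STARDOG_EXT. It is not how the Stardog scripts work today; the class name ExtClasspath and its two methods are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Stardog.
class ExtClasspath {

    // Walk extDir recursively and return every *.jar file below it, sorted for a stable order.
    static List<Path> findJars(Path extDir) {
        try (Stream<Path> paths = Files.walk(extDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Join the jar paths with the platform separator (":" on Unix, ";" on Windows).
    static String toClasspath(List<Path> jars) {
        return jars.stream()
            .map(Path::toString)
            .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        // Assumption: the extension directory is passed as the first argument.
        Path extDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(toClasspath(findJars(extDir)));
    }
}
```

With something like this, each plugin could live in its own subdirectory next to its resolved dependencies, at the cost of a longer classpath than the single wildcard entry.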

I've looked into using JavaScript for functions, but unfortunately the Nashorn JavaScript engine has been deprecated. The suggested replacement is GraalVM, and I don't know when or if that will ever be an option with Stardog.

I've been looking into some somewhat old work titled "Web of Functions" that I think is interesting. I don't like the security or performance implications of executing functions remotely, but they're no different from those of service queries. Maybe you could proxy to local implementations if they're available and to remote ones if they're allowed.

I don't think size would be a big issue unless it's extreme. The other issue is that you're mixing into an existing codebase where the classpath is not isolated. As long as you test against a running instance, things should work fine.

Remote functions could be interesting if it's worth the expense of the remote call for a set of arguments. A privacy issue arises in that the RDF values would be sent over the network.

Jess

I don't think the size of an individual plugin is going to be extreme, but there will be a large number of them, so a modest additional size will add up. I know going down this rabbit hole eventually leads to something like OSGi. I took a look at something called peaberry, but I don't think that's something that can be done from the outside, so I'll get back to that later.

I am looking into using ProGuard to get them as small as possible. It's a pain to set up, so I'll add it to the Maven archetype that I built for creating Stardog functions.

I've got something that I'm working on right now, but as soon as that is done I'm going to look at the remote functions. It looks like being able to invoke dynamic functions is on the list for the SPARQL 1.2 working group. I'm trying to think of a good way to blend local and remote functions, something like: if the function isn't available locally, attempt a remote invocation. It would require a query rewrite service to convert the function call into a service call, kind of like what you're doing with full text search, so I know it can be done. I just have to figure that out.
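As a rough illustration of the local-first, remote-fallback idea, here is a minimal sketch in plain Java. Everything in it is an assumption for the sake of the example: FunctionResolver and RemoteInvoker are made-up names, values are plain strings rather than RDF terms, and none of it uses the Stardog function API or does any query rewriting.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical resolver: prefer a local implementation, fall back to a remote call if allowed.
class FunctionResolver {

    // Hypothetical hook for whatever performs the remote invocation (for example a service call).
    interface RemoteInvoker {
        String invoke(String functionIri, List<String> args);
    }

    private final Map<String, Function<List<String>, String>> local = new ConcurrentHashMap<>();
    private final RemoteInvoker remote;
    private final boolean remoteAllowed;

    FunctionResolver(RemoteInvoker remote, boolean remoteAllowed) {
        this.remote = remote;
        this.remoteAllowed = remoteAllowed;
    }

    // Register a locally installed implementation under its function IRI.
    void registerLocal(String functionIri, Function<List<String>, String> impl) {
        local.put(functionIri, impl);
    }

    // Use the local implementation when present; otherwise go remote only if that is permitted.
    String call(String functionIri, List<String> args) {
        Function<List<String>, String> impl = local.get(functionIri);
        if (impl != null) {
            return impl.apply(args);
        }
        if (!remoteAllowed) {
            throw new IllegalStateException(
                "No local implementation and remote calls are disabled: " + functionIri);
        }
        // Note: the argument values leave the server here, which is the privacy concern raised above.
        return remote.invoke(functionIri, args);
    }
}
```

The remoteAllowed flag is where a per-function or per-user permission check could go, which would also cover the case of functions that not everyone should be able to run.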
