After a brief hiatus, I'm back and I brought gifts. I'd like to officially announce the initial release of Stardog WebFunctions.
Stardog WebFunctions = Stardog + IPFS + WASM
Stardog WebFunctions offers a simplified process for using and distributing custom functions.
Background and Motivation
Stardog has always made it very easy to add custom functionality, including user-defined functions. I've written quite a few, but there were always some shortcomings that I was dissatisfied with and that I felt stood in the way of them being used more often.
The first problem is distribution. Distributing a new function required building a new jar containing the function, posting it somewhere it could be downloaded from, and letting people know it was available. The user then needed to download it, place it in the extensions folder, and restart Stardog. Nothing too complicated, but it was a lot of steps, and the user might not even have permission to restart Stardog, to say nothing of having to restart a database just to make a function available. These steps also make it difficult to get new functions, updates, and bug fixes out to people quickly.
The second issue is security. You need to place a lot of trust in the person providing the functions because of the access granted to a random jar file thrown into your database. This becomes worse as the number of different people that provide various functions grows and makes it difficult to support a large and diverse group of people providing functions.
The third problem is availability. Custom functions are great but not portable. As soon as you use a custom function in your query you are requiring that anyone else who wants to run your queries has also installed the jar file for the function you're using. This creates a high barrier for the use of custom functions. How do you know it's installed? How do you know what you need to install? The utility has to outweigh the lack of portability. This lack of portability makes the use of custom functions nearly impossible in service queries, difficult to use in regular queries, and forces the user into using the basic set of functions included in the SPARQL standard.
How WebFunctions addresses these problems
The Stardog WebFunction plugin embeds a highly secure WebAssembly sandbox for the execution of custom functions. The user downloads and installs this one plugin, which will securely execute any WebFunction on demand. This allows the user to take advantage of new functions as soon as they are made available and to upgrade to newer versions with enhancements and bug fixes right away. Security is further enhanced by support for distributing WebFunctions over the InterPlanetary File System (IPFS). IPFS is a content-based, peer-to-peer file system that guarantees that the file you run is exactly the file you requested, regardless of where the file comes from. Anyone running WebFunctions should feel safe running functions from any number of people.
That addresses the first and second problems, distribution and security, but what about availability? WebFunctions significantly lower the bar for writing portable queries that use custom functions. As long as you have the one WebFunction plugin installed, you can run any number of custom functions, which are downloaded and executed on demand. There are a number of ways to execute WebFunctions depending on how independent you would like them to be, allowing you to choose between ease and portability.
A user can execute a WebFunction by passing an HTTP URL as the first argument to the WebFunction plugin's wf:call(), such as wf:call("http://wf.semantalytics.com/mywebfunction"). This downloads and executes the WebAssembly function. The function is cached in memory to avoid network delays. There are additional WebFunction calls for warming the cache (wf:cacheLoad()), reloading it (wf:cacheReload), clearing it (wf:cacheClear), and viewing its contents (wf:cacheList). Upgrading to the latest function versions should be as easy as refreshing the cache. Sometimes, for portability, you don't want functions to upgrade automatically, so there are two ways of referencing a function: one that is immutable and guaranteed to run the same code every time it is called, and another that delivers the latest version of a given function. Your choice.
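To make the caching behaviour concrete, here is a minimal Python sketch of an in-memory cache with the same four operations. It is only an illustration of the idea, not the plugin's actual implementation: the class name, the method names, and the injected fetch function are all hypothetical.

```python
from typing import Callable, Dict, List


class FunctionCache:
    """Hypothetical in-memory cache keyed by function URL (not the plugin's code)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # assumed helper: downloads the WebAssembly bytes for a URL
        self._modules: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        """Return the cached bytes, downloading only on the first request."""
        if url not in self._modules:
            self._modules[url] = self._fetch(url)
        return self._modules[url]

    def load(self, url: str) -> None:
        """Warm the cache ahead of the first call (compare wf:cacheLoad)."""
        self.get(url)

    def reload(self, url: str) -> None:
        """Fetch again unconditionally to pick up a newer version (compare wf:cacheReload)."""
        self._modules[url] = self._fetch(url)

    def clear(self) -> None:
        """Drop every cached module (compare wf:cacheClear)."""
        self._modules.clear()

    def list_urls(self) -> List[str]:
        """Return the cached URLs in sorted order (compare wf:cacheList)."""
        return sorted(self._modules)
```

In this sketch a mutable URL only picks up a new version when reload is called, which is the "refresh the cache to upgrade" behaviour described above.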
The last problem that WebFunctions addresses is availability. How do you know the functions you want to run will be available? The previous example is good for one-time use, but if you restart your database you'll need to download the functions again, so you're relying on the web server at http://wf.semantalytics.com being available and on me continuing to maintain the domain name. To solve that problem, WebFunctions can be downloaded over IPFS. IPFS is a peer-to-peer, content-based system. Content is referred to by hash and is downloaded over the network from wherever that content is available. This contributes to security by guaranteeing that the code that is run exactly matches the hash of the code that was requested. By default, WebFunctions use the IPFS gateway located at http://ipfs.io/, but that can be changed by setting the environment variables STARDOG_IPFS_GATEWAY and STARDOG_IPNS_GATEWAY. That's a little better, but you still need to be able to reach a gateway.
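As a rough illustration of how a gateway setting could turn a content hash into a download URL, here is a small Python sketch. The two environment variable names and the default gateway come from the text above; everything else is an assumption, including the function name, the idea that the plugin appends an /ipfs/ or /ipns/ path to the gateway value, and the placeholder identifier used in the usage note.

```python
import os
from typing import Mapping, Optional

DEFAULT_GATEWAY = "http://ipfs.io/"  # default gateway named in the post

# Which variable configures which kind of path (names taken from the post).
GATEWAY_VARIABLES = {
    "ipfs": "STARDOG_IPFS_GATEWAY",  # immutable, hash-addressed content
    "ipns": "STARDOG_IPNS_GATEWAY",  # mutable names that can point at new content
}


def resolve_gateway_url(kind: str, identifier: str,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Build a gateway URL of the form <gateway>/<kind>/<identifier>.

    Assumption: the gateway value is a base URL and the path layout follows
    the usual IPFS path-gateway convention; the plugin may differ.
    """
    if kind not in GATEWAY_VARIABLES:
        raise ValueError("kind must be 'ipfs' or 'ipns'")
    env = os.environ if env is None else env
    base = env.get(GATEWAY_VARIABLES[kind], DEFAULT_GATEWAY)
    return base.rstrip("/") + "/" + kind + "/" + identifier
```

With neither variable set, resolve_gateway_url("ipfs", "QmExample") falls back to the default gateway; "QmExample" is a made-up placeholder, not a real content hash.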
For WebFunctions to be even more resilient you can easily install a local IPFS node. By default IPFS enables a local gateway that you can point to by updating the previously mentioned environment variables. With this setup, when functions are requested, not only are they cached in memory but they are also cached in the local IPFS node on disk and will survive reboots. You can now take your databases offline and still have access to all of the functions you've been using without making any updates to your queries.
With IPFS you can reliably run your WebFunctions without any external dependencies. Functions can be run on any Stardog database that has the one WebFunction plugin installed and can access the function executable, either over HTTP or from any node on the IPFS network. If you have a heavyweight function that you'd like to offload to another machine, this allows you to reliably call the WebFunction in a SERVICE query to offload the computation. Now you can use WebFunctions knowing that they'll be available in the future without relying on external infrastructure. In addition, you are guaranteed to be running the same code for future queries where repeatability is a concern. A function isn't just an opaque URL; it identifies the exact binary being run, independent of location.
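The "exact binary, independent of location" guarantee comes from content addressing, which can be sketched in a few lines of Python. This is a deliberate simplification and not how the plugin or IPFS is implemented: real IPFS identifiers wrap the digest in a multihash encoding and large files are split into blocks, whereas the hypothetical function below just compares a plain SHA-256 digest.

```python
import hashlib


def verify_content(data: bytes, expected_sha256_hex: str) -> bytes:
    """Return data only if its SHA-256 digest matches the requested one.

    Simplified stand-in for content addressing: whoever served the bytes,
    they are accepted only when they hash to the identifier that was asked for.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256_hex.lower():
        raise ValueError("content does not match the requested hash")
    return data
```

Because the check depends only on the bytes, the same function can be fetched from a public gateway, a local node, or a peer, and a tampered copy from any of them is rejected before it runs.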
Future directions
One future direction I see for WebFunctions is building them into the databases that are used with VirtualGraphs, allowing you to use the same functions in your mappings and ETL that you use in Stardog, for even greater portability.
Execute high-performance machine learning models using the upcoming W3C standard WASI-NN. I've already successfully executed a machine learning model using ONNX, but WASI-NN will allow access to a greater variety of models.
I'm interested in seeing how far you can go with wiring together WebFunctions using SPARQL. Many systems already employ complex wiring and computational graphs: Spring, Guava, TensorFlow. What if you defined your programs as a computational graph in RDF that executes dynamically via a SPARQL query?
Use IPFS to deliver data and computation. Now you have a distribution network that distributes both the code to run and the data to run it on that is independent of the location where it's run. The ultimate serverless lambda function.
I've only completed a couple of demo functions, so I'd be happy to prioritize any functions people would find helpful or other databases you'd like to see integrated. If you have the plugin installed, they'd be available the second they're published. I still need to put in a little more infrastructure on my side for writing and publishing them, like a nice web page and some online documentation.
Let me know what you think. Feedback is always appreciated.