Graph deletion takes a long time

Hello!

We ran into the issue that adding and/or removing large graphs takes a very long time.

Context

Stardog 6.1.1 with default options plus

  • versioning.enabled=false
  • preserve.bnode.ids=false

A test with adding and removing data to/from a graph:

  • Create 35 million triples test data using BSBM:

      $ ./generate -s nt -pc 100000
      ...
      34872182 triples generated.
    
  • Add data:

      $ stardog data add -g http://example.org/ stardog dataset.nt
      Adding data from file: dataset.nt
      34,872,182 triples added in 00:05:10.341
    
  • Remove data:

      $ stardog data remove -g http://example.org/ stardog
      34,872,182 triples removed in 00:04:14.675
    

We see the same behavior with the update query DROP SILENT GRAPH <http://example.org/>.

Issue

We need a way to trigger a graph removal that doesn't block the call until the graph has been removed. Is there any way to accomplish this, for example by triggering the removal in the background?


Maybe I am proposing something that is almost too simple:

$ stardog data remove -g http://example.org/ stardog &

Note the “&” at the end of the command: it sends the command (and with it the transaction) into the background, so your script can continue in the meantime.

Okay, sorry. I forgot to mention that we connect to Stardog through the Java API and have our own browser-based interface. How can we make sure the user doesn't have to wait until the remove operation is done? Could we use the transaction API for that, i.e. trigger the removal and later get some feedback on whether it was successful? Does that make our problem clearer?

The experts may have better ideas. My first reaction would be: Java threads are your friend.

(And don't share connection objects across threads, especially in this case.)
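
To make the thread idea a bit more concrete, here is a minimal sketch using only the JDK. The class `BackgroundGraphRemoval` and the interface `GraphDropper` are made-up names for illustration and not part of the Stardog API; the lambda passed in would wrap whatever blocking Stardog call you use today, opened on its own connection inside that lambda. The caller gets a `CompletableFuture` back right away and can check it later for success or failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundGraphRemoval {

    /** Hypothetical stand-in for the blocking removal call (not a Stardog interface). */
    interface GraphDropper {
        void drop(String graphIri) throws Exception;
    }

    // One worker thread: removals run one at a time, off the request thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final GraphDropper dropper;

    BackgroundGraphRemoval(GraphDropper dropper) {
        this.dropper = dropper;
    }

    /** Starts the removal in the background and returns immediately. */
    CompletableFuture<Void> dropAsync(String graphIri) {
        return CompletableFuture.runAsync(() -> {
            try {
                dropper.drop(graphIri);
            } catch (Exception e) {
                // Surface the failure through the future instead of losing it.
                throw new CompletionException(e);
            }
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        // Thread.sleep simulates a long-running removal.
        BackgroundGraphRemoval removal = new BackgroundGraphRemoval(iri -> Thread.sleep(200));

        CompletableFuture<Void> pending = removal.dropAsync("http://example.org/");
        System.out.println("request returned, done=" + pending.isDone());

        pending.join(); // a real UI would poll isDone() or attach a callback instead
        System.out.println("removal finished, failed=" + pending.isCompletedExceptionally());
        removal.shutdown();
    }
}
```

The browser-facing code can keep the returned future (for example in a map keyed by a job id) and report its state when the user asks for it.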

I’m not sure exactly what you’re looking to do but you can always use named graph security to restrict access to the named graph and then delete it lazily.

It is not about access to the graph to be deleted, but about how to delete a large graph without making the user wait for it to finish. Yes, we can implement threading in our application; however, I was wondering whether Stardog itself already provides something for this.

I don't believe that Stardog itself provides what you're looking for.

Okay, suppose the long-running graph deletion is encapsulated in a separate thread on the client side, so that the user does not have to wait for its completion. What about inserting data into this very same graph afterwards? Could this be done instantly, or does the user have to wait until the deletion has completed anyway?
Our workflows rely on dropping a graph entirely and re-running the import. So it would be great to save the time we currently spend deleting the graphs (especially if it is several minutes).
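
For reference, the drop-then-import workflow could be sequenced on the client side roughly as follows. This is only a sketch with JDK classes; `DropThenImport` and the two `Runnable` arguments are hypothetical placeholders for the actual Stardog calls. It does not make the import start earlier; it only ensures that the import runs after the drop has finished and is skipped if the drop fails, while the user-facing call returns immediately.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DropThenImport {

    // A single worker thread keeps the two steps off the caller's thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /**
     * Runs dropGraph, then importData, both in the background.
     * importData only runs if dropGraph completed without an exception.
     */
    CompletableFuture<Void> reload(Runnable dropGraph, Runnable importData) {
        return CompletableFuture.runAsync(dropGraph, worker)
                .thenRunAsync(importData, worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        DropThenImport job = new DropThenImport();
        List<String> log = new CopyOnWriteArrayList<>();

        // The lambdas stand in for the real (blocking) drop and import calls.
        CompletableFuture<Void> done = job.reload(
                () -> log.add("drop"),
                () -> log.add("import"));

        done.join();
        System.out.println(log); // prints [drop, import]
        job.shutdown();
    }
}
```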

You might be able to do what you're looking for by writing a custom HTTP handler extension.

One would assume, if these are two independent transactions, that the Stardog server will know how to serialize and process them in the right order. It should work now and faster in the future. Look here: https://www.stardog.com/docs/#_snapshot_isolation
