DELETE Query Performance

Hey STARDOG community,
I am having issues with a DELETE query which causes high load and timeouts in my stardog instance.

I want to delete specific triples in my graph if they are existing. The tree I am specifying for the delete looks like this:

Instance of the class A are connected to instances of the class B.
Instances of the classes C, D, E, F, G and H are connected to the instances of class B.
The classes C, D, E, F, G and H could also have connected classes (example class X). The instances of directly connected classes like class X should also be deleted if existent.

My SPARQL query looks like this:

PREFIX ex: <http://example.org/>

DELETE {
    ?A ?p1 ?o1 .
    ?B ?p2 ?o2 .
    ?C ?p3 ?o3. 
    ?D ?p4 ?o4.
    ?E ?p5 ?o5.
    ?F ?p6 ?o6.
    ?G ?p7 ?o7.
    ?H ?p8 ?o8.
}       
WHERE {
    VALUES ?A { 
        {{#each data_array}}
            :A_{{this.instancesOfA}} 
        {{/each}}  
    }
    ?A ?p1 ?o1 .
    OPTIONAL {
        ?A ex:AisConnectedToB ?B.
        OPTIONAL {?B ?p2 ?o2 .}
        OPTIONAL {
            ?B ex:BisConnectedToC ?C.
            ?C ?p3 ?o3.
        }
        OPTIONAL {
            ?B ex:BisConnectedToD ?D.
            ?D ?p4 ?o4.
        }
        OPTIONAL {
            ?B ex:BisConnectedToE ?E.
            ?E ?p5 ?o5.
        }
        OPTIONAL {
            ?B ex:BisConnectedToF ?F.
            ?F ?p6 ?o6.
        }
        OPTIONAL {
            ?B ex:BisConnectedToG ?G.
            ?G ?p7 ?o7.
        }
        OPTIONAL {
            ?B ex:BisConnectedToH ?H.
            ?H ?p8 ?o8.
        }
    }
}

I am using a template which loads ca. 4100 different instances of the class A and inserts them with the VALUES clause.
Can I somehow optimize the performance of the query?
I thought of this:

  1. Instead of starting with the class A as the root, I could start with the class B as the root as class B has all the other classes are connected to B
  2. Instead using one query, I could use several separate DELETE queries like `DELETE WHERE {
    ?a a ex:A ;
    ?p1 ?o1 .
    }

DELETE WHERE {
?b a ex:B ;
?p2 ?o2 .
}

DELETE WHERE {
?c a ex:C ;
?p3 ?o3 .
}`

Guidance will be very helpful,
cheers!

Hi. This looks like a pretty common issue, let me try to explain what (likely) happens on a simpler example.

Let's say you have one node in the graph, call it :x, and you want to delete all its edges (triples with :x as the subject). Not even any triples reachable from :x, just the data of :x. Let's also say that the data is as follows:

:x :p :a, :b, :c .
:x :q :d, :e, :f .

so 3 :p-values and 3 :q-values.

You could write something like:

DELETE { ?s :p ?o1 ; :q ?o2 } 
WHERE {
  bind(:x as ?s)  # this is like your VALUES but for a single node
  ?s :p ?o1 ; :q ?o2
}

this is similar to what you're doing, but consider what happens when you execute it. The WHERE part would produce 3 x 3 = 9 rows, one row for each combination of :p and :q values. Then the DELETE template will turn each row into 2 triples, that's already 18 triples to delete from storage (many will be duplicates). But the storage only has 6 triples, so the query is already doing 3 times more work than needed.

And that's a toy example. When you try to do something like this on a more complex and larger data, the output of just the WHERE part will blow up combinatorially. If you turn your DELETE into a CONSTRUCT, you may see it (don't use Studio, it may kill your browser).

You need to avoid the combinatorics to make this work. In my toy example it could be as simple as

DELETE { ?s :p ?o } 
WHERE {
  bind(:x as ?s)  
  ?s ?p ?o
}

but in a more general case

DELETE { ?s :p ?o1 ; :q ?o2 } 
WHERE {
  bind(:x as ?s)  
  { ?s :p ?o1 } UNION {?s  :q ?o2}
}

would also work.

In your complex case you need to avoid the cross product over connected objects, like if you're deleting an instance of :B that has 10 linked :C instances and 10 linked :D instances, you don't want the query to go through all 100 :C,:D pairs. You can build multiple update queries and chain them into a single SPARQL update sequence, if you care to run your update in a single transaction (that is, atomically).

So, to sum up, I suggest to build your query incrementally and test along the way with CONSTRUCT. If it starts blowing up, you'll notice.

Cheers,
Pavel

Thank you @pavel . I will soon test it and provide feedback

Hey @pavel,
I understand how UNION avoids the cross product.
I have adjusted my WHERE clause and tested it with two instances of A:

WHERE {
    VALUES ?A { 
        {
            :A_instance1
            :A_instance2 
        } 
    
    }
    ?A ?p1 ?o1 .
    OPTIONAL {
        ?A ex:AisConnectedToB ?B.
        
        {OPTIONAL {?B ?p2 ?o2 .}}
        
        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToC ?C.
            ?C ?p3 ?o3.}}
        
        UNION
       
        {OPTIONAL {
            ?B ex:BisConnectedToD ?D.
            ?D ?p4 ?o4.}}
        
        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToE ?E.
            ?E ?p5 ?o5.}}
        
        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToF ?F.
            ?F ?p6 ?o6.}}

        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToG ?G.
            ?G ?p7 ?o7.}}
        
        UNION
        
       {OPTIONAL {
            ?B ex:BisConnectedToH ?H.
            ?H ?p8 ?o8.}}
        
    }
}

I need to check if it actually deletes the same triples like the query above, but first I wanted to ask you if I am now on the right path?
I see already that the adjusted query returns 4.232 results vs. 4.598.984 results from the original query without the UNIONs.

Yes, you need to test correctness. You should be careful with OPTIONALs and UNIONs, they don't always behave like people expect them to (because of the bottom-up evaluation). For example,

A { OPTIONAL {B} } UNION { OPTIONAL {B} }

is not the same as

{ A OPTIONAL { B } } UNION { A OPTIONAL { C } }

Cheers,
Pavel