DELETE Query Performance

DomiS · February 18, 2025, 2:47pm

Hey STARDOG community,
I am having issues with a DELETE query which causes high load and timeouts in my stardog instance.

I want to delete specific triples in my graph if they are existing. The tree I am specifying for the delete looks like this:

Instance of the class A are connected to instances of the class B.
Instances of the classes C, D, E, F, G and H are connected to the instances of class B.
The classes C, D, E, F, G and H could also have connected classes (example class X). The instances of directly connected classes like class X should also be deleted if existent.

My SPARQL query looks like this:

PREFIX ex: <http://example.org/>

DELETE {
    ?A ?p1 ?o1 .
    ?B ?p2 ?o2 .
    ?C ?p3 ?o3. 
    ?D ?p4 ?o4.
    ?E ?p5 ?o5.
    ?F ?p6 ?o6.
    ?G ?p7 ?o7.
    ?H ?p8 ?o8.
}       
WHERE {
    VALUES ?A { 
        {{#each data_array}}
            :A_{{this.instancesOfA}} 
        {{/each}}  
    }
    ?A ?p1 ?o1 .
    OPTIONAL {
        ?A ex:AisConnectedToB ?B.
        OPTIONAL {?B ?p2 ?o2 .}
        OPTIONAL {
            ?B ex:BisConnectedToC ?C.
            ?C ?p3 ?o3.
        }
        OPTIONAL {
            ?B ex:BisConnectedToD ?D.
            ?D ?p4 ?o4.
        }
        OPTIONAL {
            ?B ex:BisConnectedToE ?E.
            ?E ?p5 ?o5.
        }
        OPTIONAL {
            ?B ex:BisConnectedToF ?F.
            ?F ?p6 ?o6.
        }
        OPTIONAL {
            ?B ex:BisConnectedToG ?G.
            ?G ?p7 ?o7.
        }
        OPTIONAL {
            ?B ex:BisConnectedToH ?H.
            ?H ?p8 ?o8.
        }
    }
}

I am using a template which loads ca. 4100 different instances of the class A and inserts them with the VALUES clause.
Can I somehow optimize the performance of the query?
I thought of this:

Instead of starting with the class A as the root, I could start with the class B as the root as class B has all the other classes are connected to B
Instead using one query, I could use several separate DELETE queries like `DELETE WHERE {
?a a ex:A ;
?p1 ?o1 .
}

DELETE WHERE {
?b a ex:B ;
?p2 ?o2 .
}

DELETE WHERE {
?c a ex:C ;
?p3 ?o3 .
}`

Guidance will be very helpful,
cheers!

pavel · February 18, 2025, 9:18pm

Hi. This looks like a pretty common issue, let me try to explain what (likely) happens on a simpler example.

Let's say you have one node in the graph, call it :x, and you want to delete all its edges (triples with :x as the subject). Not even any triples reachable from :x, just the data of :x. Let's also say that the data is as follows:

:x :p :a, :b, :c .
:x :q :d, :e, :f .

so 3 :p-values and 3 :q-values.

You could write something like:

DELETE { ?s :p ?o1 ; :q ?o2 } 
WHERE {
  bind(:x as ?s)  # this is like your VALUES but for a single node
  ?s :p ?o1 ; :q ?o2
}

this is similar to what you're doing, but consider what happens when you execute it. The WHERE part would produce 3 x 3 = 9 rows, one row for each combination of :p and :q values. Then the DELETE template will turn each row into 2 triples, that's already 18 triples to delete from storage (many will be duplicates). But the storage only has 6 triples, so the query is already doing 3 times more work than needed.

And that's a toy example. When you try to do something like this on a more complex and larger data, the output of just the WHERE part will blow up combinatorially. If you turn your DELETE into a CONSTRUCT, you may see it (don't use Studio, it may kill your browser).

You need to avoid the combinatorics to make this work. In my toy example it could be as simple as

DELETE { ?s :p ?o } 
WHERE {
  bind(:x as ?s)  
  ?s ?p ?o
}

but in a more general case

DELETE { ?s :p ?o1 ; :q ?o2 } 
WHERE {
  bind(:x as ?s)  
  { ?s :p ?o1 } UNION {?s  :q ?o2}
}

would also work.

In your complex case you need to avoid the cross product over connected objects, like if you're deleting an instance of :B that has 10 linked :C instances and 10 linked :D instances, you don't want the query to go through all 100 :C,:D pairs. You can build multiple update queries and chain them into a single SPARQL update sequence, if you care to run your update in a single transaction (that is, atomically).

So, to sum up, I suggest to build your query incrementally and test along the way with CONSTRUCT. If it starts blowing up, you'll notice.

Cheers,
Pavel

DomiS · February 26, 2025, 5:05pm

Thank you @pavel . I will soon test it and provide feedback

DomiS · March 6, 2025, 2:00pm

Hey @pavel,
I understand how UNION avoids the cross product.
I have adjusted my WHERE clause and tested it with two instances of A:

WHERE {
    VALUES ?A { 
        {
            :A_instance1
            :A_instance2 
        } 
    
    }
    ?A ?p1 ?o1 .
    OPTIONAL {
        ?A ex:AisConnectedToB ?B.
        
        {OPTIONAL {?B ?p2 ?o2 .}}
        
        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToC ?C.
            ?C ?p3 ?o3.}}
        
        UNION
       
        {OPTIONAL {
            ?B ex:BisConnectedToD ?D.
            ?D ?p4 ?o4.}}
        
        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToE ?E.
            ?E ?p5 ?o5.}}
        
        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToF ?F.
            ?F ?p6 ?o6.}}

        UNION
        
        {OPTIONAL {
            ?B ex:BisConnectedToG ?G.
            ?G ?p7 ?o7.}}
        
        UNION
        
       {OPTIONAL {
            ?B ex:BisConnectedToH ?H.
            ?H ?p8 ?o8.}}
        
    }
}

I need to check if it actually deletes the same triples like the query above, but first I wanted to ask you if I am now on the right path?
I see already that the adjusted query returns 4.232 results vs. 4.598.984 results from the original query without the UNIONs.

pavel · March 7, 2025, 2:16pm

Yes, you need to test correctness. You should be careful with OPTIONALs and UNIONs, they don't always behave like people expect them to (because of the bottom-up evaluation). For example,

A { OPTIONAL {B} } UNION { OPTIONAL {B} }

is not the same as

{ A OPTIONAL { B } } UNION { A OPTIONAL { C } }

Cheers,
Pavel

Topic		Replies	Views
DELETE triples with specific predicate not working Support	11	2359	March 28, 2019
Delete data with blank nodes - Stardog cluster Support	6	1445	March 21, 2017
SPARQL query interrupts with memory errors Support	9	543	April 21, 2021
Unexpected Error while deleting triples Support	3	337	March 8, 2020
Query performance issues Support	8	1254	February 28, 2017

DELETE Query Performance

Related topics