From 30ms to 7000ms when asking for edgeAttributeProperty?

I am testing some basic queries around edge attributes (on an instance with ~10M triples) and noticed a very large difference in response time depending on whether the wildcard ?edgeAttributeProperty is included in the SELECT list:

A) <50 ms

SELECT ?p ?o ?edgeAttributeObject {
<< :angola ?p ?o >> ?edgeAttributeProperty ?edgeAttributeObject
}

B) > 7000 ms

SELECT ?p ?o ?edgeAttributeProperty ?edgeAttributeObject {
<< :angola ?p ?o >> ?edgeAttributeProperty ?edgeAttributeObject
}

Can I speed this up somehow? Shouldn't the engine be able to limit the search to the matching triple and its edge attributes? It seems to search more widely than necessary.

Here's the sample data for the query:

INSERT DATA {
<< :angola :independence_year 1961 >> :source :wikipedia, :dbpedia; :relatedEvent "Angolan War of Independence"
}

Thanks for any hints!

Update: If I spell out the complete triple, the same query is very fast again (<50 ms):

SELECT ?p ?o ?edgeProperty ?edgePropertyObject {
<< :angola :independence_year 1961 >> ?edgeProperty ?edgePropertyObject
}

But I'm still curious why the wildcard becomes so expensive.

Sidenote: Is there a way to reference a statement by an ID/IRI? I.e., can I replace the triple part with an identifier?

SELECT ?p ?o ?edgeProperty ?edgePropertyObject {
**<< ID/IRI >>** ?edgeProperty ?edgePropertyObject
}

Hi Daniel,

Can you include the text format query plans from Stardog Studio or the output from the stardog query explain command?

Jess

Thank you @jess,

Seems one scan differs between PSO and SPO? Could that account for the difference?

Slice(offset=0, limit=1000) [#1.0K]
`─ Projection(?p, ?o, ?edgeAttributeProperty, ?edgeAttributeObject) [#24.6M]
   `─ NestedLoopJoin(_) [#24.6M]
      +─ Bind(GetStatement(<https://graph.datastory.org/angola>, ?p, ?o, ?lqpfgaow) AS ?wxkfavur) [#1]
      │  `─ Scan[SPOC](<https://graph.datastory.org/angola>, ?p, ?o) [#1]
      `─ Scan[PSO](?wxkfavur, ?edgeAttributeProperty, ?edgeAttributeObject) [#24.6M]

Slice(offset=0, limit=1000) [#1.0K]
`─ Projection(?p, ?o, ?edgeAttributeObject) [#24.6M]
   `─ NestedLoopJoin(_) [#24.6M]
      +─ Bind(GetStatement(<https://graph.datastory.org/angola>, ?p, ?o, ?qyqnptfq) AS ?fqxnuhfl) [#1]
      │  `─ Scan[SPOC](<https://graph.datastory.org/angola>, ?p, ?o) [#1]
      `─ Scan[SPO](?fqxnuhfl, ?edgeAttributeProperty, ?edgeAttributeObject) [#24.6M]

Hi Daniel,

Yes, PSO vs SPO plays a role, but the root cause is the join algorithm (nested loops). It can be very inefficient depending on the order in which the operands produce their inputs. It shouldn't be used in this case, but there's a known issue, PLAT-2502, which creates loop joins when the join variable is assigned in certain BIND expressions (in this case, GetStatement).
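
To make the cost difference concrete, here is a rough Python sketch of a nested loop join next to a hash join. It is only an illustration under assumptions: it is not Stardog's implementation, and the toy rows, the statement ids ("stmt1", "stmt2") and the function names are made up. The left side stands in for the bound statements and the right side for the edge-attribute scan.

```python
from collections import defaultdict


def nested_loop_join(left, right):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    comparisons = 0
    out = []
    for stmt_id, p, o in left:
        for subj, edge_p, edge_o in right:
            comparisons += 1
            if subj == stmt_id:
                out.append((p, o, edge_p, edge_o))
    return out, comparisons


def hash_join(left, right):
    """Index the small side by join key, then probe once per right row."""
    table = defaultdict(list)
    for stmt_id, p, o in left:
        table[stmt_id].append((p, o))
    probes = 0
    out = []
    for subj, edge_p, edge_o in right:
        probes += 1
        for p, o in table.get(subj, ()):
            out.append((p, o, edge_p, edge_o))
    return out, probes


# Hypothetical toy data: two statements, two matching edge attributes,
# and 1000 unrelated rows that the scan also has to walk through.
left = [
    ("stmt1", "independence_year", 1961),
    ("stmt2", "capital", "Luanda"),
]
right = [
    ("stmt1", "source", "wikipedia"),
    ("stmt1", "source", "dbpedia"),
] + [(f"other{i}", "p", i) for i in range(1000)]

nl_rows, nl_cost = nested_loop_join(left, right)
hj_rows, hj_cost = hash_join(left, right)

# Both strategies return the same rows; only the amount of work differs.
print(sorted(nl_rows) == sorted(hj_rows))
print("nested loop comparisons:", nl_cost)
print("hash join probes:", hj_cost)
```

In this sketch the nested loop does one comparison per pair of rows, while the hash join touches each right-hand row once, so the gap widens as either side grows.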

For now you'd need to use a longer form for this query instead of the <<>> shortcut:

SELECT ?p ?o ?edgeAttributeProperty ?edgeAttributeObject {
    :angola ?p ?o .
    bind(coalesce(getStatement(:angola, ?p, ?o), "") as ?statement)
    ?statement ?edgeAttributeProperty ?edgeAttributeObject .
}

Here I'm working around the problem by using an artificial coalesce call to tell the query engine that the ?statement variable cannot have a null value, so it can use faster join algorithms. This is a bit awkward and we should fix the issue soon (GetStatement shouldn't trigger errors and thus shouldn't return nulls in this case).

Let us know if this helps in the meantime,
Pavel

PS. here's the plan you should see:

Slice(offset=0, limit=1000) [#1.0K]
`─ Projection(?p, ?o, ?edgeAttributeProperty, ?edgeAttributeObject) [#10K]
   `─ MergeJoin(?statement) [#10K]
      +─ Scan[SPO](?statement, ?edgeAttributeProperty, ?edgeAttributeObject) [#10K]
      `─ Sort(?statement) [#1]
         `─ Bind(COALESCE(GetStatement(:angola, ?p, ?o), "") AS ?statement) [#1]
            `─ Scan[SPOC](:angola, ?p, ?o) [#1]

Nice, thank you very much for the feedback.

FYI: Mine produced a DirectHashJoin:

Slice(offset=0, limit=1000) [#1.0K]
`─ Projection(?p, ?o, ?edgeAttributeProperty, ?edgeAttributeObject) [#24.6M]
   `─ DirectHashJoin(?statement) [#24.6M]
      +─ Bind(COALESCE(GetStatement(<https://graph.datastory.org/angola>, ?p, ?o), "") AS ?statement) [#1]
      │  `─ Scan[SPOC](<https://graph.datastory.org/angola>, ?p, ?o) [#1]
      `─ Scan[SPO](?statement, ?edgeAttributeProperty, ?edgeAttributeObject) [#24.6M]
