Duplicate query results from Virtual Graphs when using two (connected) bindings

Dear Stardog Community,

I am using a MySQL-Database with a single table and connect it as a virtual Graph to Stardog. The content of the table is:
id, name, location_id
'1', 'Example 1', 'Location 1'
'2', 'Example 2', 'Location 2'
'3', 'Example 3', 'Location 3'
'4', 'Example 4', 'Location 1'

As you see, the last location_id is the same as the first one.

My Mapping creates an instance of "Example" and one of "Location" and links them with the predicate "atLocation". This is the Mapping:

MAPPING
FROM SQL {
SELECT *
FROM example.example
}
TO {
?subject a :Example;
:id ?id ;
:name ?name ;
:atLocation ?location .

?location a :Location;
:id ?location_id .

} WHERE {
BIND(template("http://api.stardog.com/Example/id={id}") AS ?subject)
BIND(template("http://api.stardog.com/Location/id={location_id}") AS ?location)
}

If I run now the following select statement, it is giving me 10 results:

SELECT ?name ?locId {
GRAPH virtual://example {
?example a :Example ;
:name ?name ;
:atLocation ?location .

?location a :Location ;
        :id ?locId .

}
}
ORDER BY ?name

These are the results

name,locId
Example 1,Location 1
Example 1,Location 1
Example 1,Location 1
Example 1,Location 1
Example 2,Location 2
Example 3,Location 3
Example 4,Location 1
Example 4,Location 1
Example 4,Location 1
Example 4,Location 1

If I copy the data locally using COPY virtual://example TO http://stardog.com/Example and then running the select-statement without referring to the graph, but to the local data, I get the expected results:

name,locId
Example 1,Location 1
Example 2,Location 2
Example 3,Location 3
Example 4,Location 1

Is there something wrong with my mapping? Can you suggest what to do to get the expected results without needing to copy the data locally?

Thank you for your support!
Tim

Hello, and welcome to community!

Essentially when querying a virtual graph directly, depending on the mapping and underlying data, sometimes you will see duplicate results. If you were to add DISTINCT to your select query, you would see the same 4 results that you're expecting.

The reason that this doesn't happen after a COPY is that when Stardog is instructed to add a triple to a graph that already exists, it's simply a no-op. Basically the difference is that your VG mapping creates a collection of triples to query over, while adding them into Stardog itself creates a Set.

1 Like

Thank you so much for your help and explanation Stephen!

The problem is that there are so many triples added when working with larger tables that I run into timeouts. In my use case (that has the same structure like the one in the example), I get ~1500 results. In the local copy it finishes in less than 100ms, with the Virtual Graph it takes nearly 3 minutes. If I don't add "distinct" to the query, it returns more than 2,000,000 results, that's probably why it needs so long.

This is on the local copy:
image

This is on the Virtual Graph:

Do you have any suggestions on how I could speed that up?

Thank you!

I might recommend changing the SELECT * in your VG mapping to something more selective, i.e., only pulling the columns that you need for the mapping. You could also generate the query plan for the virtual query to see if the issue is with the SQL query that is ultimately being generated.

1 Like

Thank you so much for the suggestion Stephen!

I was going in more detail through the Query Plan and realized some problems. It was only accessing one table, but it was doing multiple joins over the same table and the same column.

So I made the following changes / improvements:

In my MySQL-DB:

  • I didn't set up the id columns (it's a combination of 3 columns) as primary key, but just as normal columns. So I changed them to form the primary key
  • I had the columns set up as text instead of VARCHAR (so Stardog always cast it to a VARCHAR(2048) which could have caused some performance problems)

In my Stardog Mapping:

  • I set up the Bindings over all ID columns (before I only used one of the ID field as part of the URI, so there were duplicates). That's how the mapping for my associated location looks now:

BIND(template("http://api.stardog.com/Location/LOCATION={LOCATION};COMMAND={COMMAND};RULE_ID={RULE_ID}") AS ?location)

I re-added Data-Source and Virtual Graphs after applying the changes on the DB. Now it is working like expected and giving me not even duplicate results. I hope that can help as reference for somebody running into similar issues.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.