Performance issue when querying Virtual Graphs

Hi,

I created a Virtual Graph for the Foursquare dataset, which is loaded in a PostegreSQL server. Basically, there is a table which contains around 11 million Points of Internets (PoIs). Here is the structure of the table:

CREATE TABLE public.pois
(
poiid text COLLATE pg_catalog."default",
lat numeric,
long numeric,
category text COLLATE pg_catalog."default",
countrycode character(2) COLLATE pg_catalog."default"
)

And here are the auto-generated mappings (ofc in our case we have our internal ontology):

PREFIX : <http://api.stardog.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX stardog: <tag:stardog:api:>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

MAPPING
FROM SQL {
SELECT *
FROM "public"."pois"
}
TO {
?subject <http://api.stardog.com/pois#category> ?category .
?subject <http://api.stardog.com/pois#countrycode> ?countrycode .
?subject <http://api.stardog.com/pois#lat> ?lat .
?subject <http://api.stardog.com/pois#long> ?long .
?subject <http://api.stardog.com/pois#poiid> ?poiid .
?subject rdf:type :pois
} WHERE {
BIND(template("pois{poiid}{lat}{long}{category}{countrycode}") AS ?subject)
BIND(xsd:double(?lat) AS ?lat)
BIND(xsd:double(?long) AS ?long)
}

From Stardog Studio, I am running the following query to retrieve all PoIs , since we need to do further processing on the client side. And this is only one of the simplest queries that we have to run, we need more expensive queries as well, where Joins from different tables and Graphs have to be used:

PREFIX : <http://api.stardog.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX stardog: <tag:stardog:api:>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select * {
GRAPH <virtual://foursquareVG> {
?subject rdf:type :pois;
<http://api.stardog.com/pois#category> ?category ;
<http://api.stardog.com/pois#countrycode> ?countrycode ;
<http://api.stardog.com/pois#lat> ?lat ;
<http://api.stardog.com/pois#long> ?long ;
<http://api.stardog.com/pois#poiid> ?poiid .
}
}

The query runs for 15 mins and does not retrieve anything. Below is the message that I got after I manually stopped it (killed):

And here is the query plan:

{
"prefixes": {
"": "http://api.stardog.com/",
"stardog": "tag:stardog:api:",
"owl": "http://www.w3.org/2002/07/owl#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"xsd": "XML Schema",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#"
},
"dataset": {},
"plan": {
"label": "Projection(?subject, ?category, ?countrycode, ?lat, ?long, ?poiid)",
"cardinality": 1,
"children": [
{
"label": "VirtualGraphSql<virtual://foursquareVG> [#1]",
"children": [
{
"label": "SELECT "pois"."poiid", "pois"."lat", "pois"."long", "pois"."category", "pois"."countrycode"",
"children":
},
{
"label": "FROM "public"."pois"",
"children":
},
{
"label": "INNER JOIN "public"."pois" AS "pois0" ON "pois"."poiid" = "pois0"."poiid" AND "pois"."lat" = "pois0"."lat" AND "pois"."long" = "pois0"."long" AND "pois"."category" = "pois0"."category" AND "pois"."countrycode" = "pois0"."countrycode"",
"children":
},
{
"label": "INNER JOIN "public"."pois" AS "pois1" ON "pois"."poiid" = "pois1"."poiid" AND "pois"."lat" = "pois1"."lat" AND "pois"."long" = "pois1"."long" AND "pois"."category" = "pois1"."category" AND "pois"."countrycode" = "pois1"."countrycode" AND "pois0"."poiid" = "pois1"."poiid" AND "pois0"."lat" = "pois1"."lat" AND "pois0"."long" = "pois1"."long" AND "pois0"."category" = "pois1"."category" AND "pois0"."countrycode" = "pois1"."countrycode"",
"children":
},
{
"label": "INNER JOIN "public"."pois" AS "pois2" ON "pois"."poiid" = "pois2"."poiid" AND "pois"."lat" = "pois2"."lat" AND "pois"."long" = "pois2"."long" AND "pois"."category" = "pois2"."category" AND "pois"."countrycode" = "pois2"."countrycode" AND "pois0"."poiid" = "pois2"."poiid" AND "pois0"."lat" = "pois2"."lat" AND "pois0"."long" = "pois2"."long" AND "pois0"."category" = "pois2"."category" AND "pois0"."countrycode" = "pois2"."countrycode" AND "pois1"."poiid" = "pois2"."poiid" AND "pois1"."lat" = "pois2"."lat" AND "pois1"."long" = "pois2"."long" AND "pois1"."category" = "pois2"."category" AND "pois1"."countrycode" = "pois2"."countrycode"",
"children":
},
{
"label": "INNER JOIN "public"."pois" AS "pois3" ON "pois"."poiid" = "pois3"."poiid" AND "pois"."lat" = "pois3"."lat" AND "pois"."long" = "pois3"."long" AND "pois"."category" = "pois3"."category" AND "pois"."countrycode" = "pois3"."countrycode" AND "pois0"."poiid" = "pois3"."poiid" AND "pois0"."lat" = "pois3"."lat" AND "pois0"."long" = "pois3"."long" AND "pois0"."category" = "pois3"."category" AND "pois0"."countrycode" = "pois3"."countrycode" AND "pois1"."poiid" = "pois3"."poiid" AND "pois1"."lat" = "pois3"."lat" AND "pois1"."long" = "pois3"."long" AND "pois1"."category" = "pois3"."category" AND "pois1"."countrycode" = "pois3"."countrycode" AND "pois2"."poiid" = "pois3"."poiid" AND "pois2"."lat" = "pois3"."lat" AND "pois2"."long" = "pois3"."long" AND "pois2"."category" = "pois3"."category" AND "pois2"."countrycode" = "pois3"."countrycode"",
"children":
},
{
"label": "INNER JOIN "public"."pois" AS "pois4" ON "pois"."poiid" = "pois4"."poiid" AND "pois"."lat" = "pois4"."lat" AND "pois"."long" = "pois4"."long" AND "pois"."category" = "pois4"."category" AND "pois"."countrycode" = "pois4"."countrycode" AND "pois0"."poiid" = "pois4"."poiid" AND "pois0"."lat" = "pois4"."lat" AND "pois0"."long" = "pois4"."long" AND "pois0"."category" = "pois4"."category" AND "pois0"."countrycode" = "pois4"."countrycode" AND "pois1"."poiid" = "pois4"."poiid" AND "pois1"."lat" = "pois4"."lat" AND "pois1"."long" = "pois4"."long" AND "pois1"."category" = "pois4"."category" AND "pois1"."countrycode" = "pois4"."countrycode" AND "pois2"."poiid" = "pois4"."poiid" AND "pois2"."lat" = "pois4"."lat" AND "pois2"."long" = "pois4"."long" AND "pois2"."category" = "pois4"."category" AND "pois2"."countrycode" = "pois4"."countrycode" AND "pois3"."poiid" = "pois4"."poiid" AND "pois3"."lat" = "pois4"."lat" AND "pois3"."long" = "pois4"."long" AND "pois3"."category" = "pois4"."category" AND "pois3"."countrycode" = "pois4"."countrycode"",
"children":
}
]
}
]
}
}

Of course, when I run this huge query directly on the PostreSQL, it also takes time to finish, maybe similar to Stardog. The normal query which I would expect to be translated from SPARQL-to-SQL (maybe with some tiny additions) is:

SELECT poiid, lat, long, category, countrycode
FROM public.pois;

The above query takes only 16 seconds to be executed.

Considering the runtime transformations of the results to RDF, a kind of tiny delay is acceptable. But the endless execution time makes it impractical to use.

Thank you in advance.

Best,

I did try by changing some configuration settings of the Virtual Graph, basically I switched "Query Translation Algorithm" from "Default" to "Legacy", as you can see:

and now the query plan seems to be much better:

{
"prefixes": {
"": "http://api.stardog.com/",
"stardog": "tag:stardog:api:",
"owl": "http://www.w3.org/2002/07/owl#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"xsd": "XML Schema",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#"
},
"dataset": {},
"plan": {
"label": "Projection(?subject, ?category, ?countrycode, ?lat, ?long, ?poiid)",
"cardinality": 7300000,
"children": [
{
"label": "VirtualGraphSqlvirtual://foursquareVG [#7335302]",
"children": [
{
"label": "SELECT *",
"children":
},
{
"label": "FROM "public"."pois"",
"children":
},
{
"label": "WHERE "lat" IS NOT NULL AND "long" IS NOT NULL AND "category" IS NOT NULL AND "countrycode" IS NOT NULL",
"children":
}
]
}
]
}
}

I still am not able to get any result when executing the SPARQL query as above. After waiting for approximately 5 min, Stardog Studio is automatically closed (crashed) and the error in the Stardog log file is:

ERROR 2019-05-22 10:59:55,797 [stardog-user-40] com.stardog.http.server.undertow.ErrorHandling:writeError(138): Unexpected error on the server
com.stardog.stark.query.io.ResultWritingFailed: java.io.IOException: Connection reset by peer
at com.stardog.stark.query.io.json.SPARQLJSONSelectResultWriter.handle(SPARQLJSONSelectResultWriter.java:98) ~[stardog-stark-query-json-6.1.3.jar:?]
at com.stardog.stark.query.io.json.SPARQLJSONSelectResultWriter.handle(SPARQLJSONSelectResultWriter.java:32) ~[stardog-stark-query-json-6.1.3.jar:?]
at com.stardog.stark.query.ClosingSpliterator.forEachRemaining(ClosingSpliterator.java:38) ~[stardog-stark-query-api-6.1.3.jar:?]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_212]
at com.stardog.stark.query.io.QueryResultWriters.write(QueryResultWriters.java:142) ~[stardog-stark-query-io-6.1.3.jar:?]
at com.stardog.stark.query.io.QueryResultWriters.write(QueryResultWriters.java:127) ~[stardog-stark-query-io-6.1.3.jar:?]
at com.complexible.stardog.protocols.http.server.ProtocolUtils.writeTupleResponse(ProtocolUtils.java:579) ~[stardog-protocols-http-server-6.1.3.jar:?]
at com.complexible.stardog.protocols.http.server.ProtocolUtils.executeReadQuery(ProtocolUtils.java:489) ~[stardog-protocols-http-server-6.1.3.jar:?]
at com.complexible.stardog.protocols.http.server.ProtocolUtils.executeReadQuery(ProtocolUtils.java:474) ~[stardog-protocols-http-server-6.1.3.jar:?]
at com.complexible.stardog.protocols.http.server.SPARQLProtocol.executeQuery(SPARQLProtocol.java:127) ~[stardog-protocols-http-server-6.1.3.jar:?]
at com.complexible.stardog.protocols.http.server.SPARQLProtocol.post(SPARQLProtocol.java:91) ~[stardog-protocols-http-server-6.1.3.jar:?]
at com.stardog.http.server.undertow.jaxrs.ExtractRoutes.lambda$handleIt$5(ExtractRoutes.java:192) ~[stardog-protocols-http-server-6.1.3.jar:?]
at org.apache.shiro.subject.support.SubjectRunnable.doRun(SubjectRunnable.java:120) [shiro-core-1.2.3.jar:1.2.3]
at org.apache.shiro.subject.support.SubjectRunnable.run(SubjectRunnable.java:108) [shiro-core-1.2.3.jar:1.2.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_212]
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_212]
at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_212]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_212]
at org.xnio.nio.NioSocketConduit.write(NioSocketConduit.java:161) ~[xnio-nio-3.3.6.Final.jar:3.3.6.Final]
at io.undertow.server.protocol.http.HttpResponseConduit.write(HttpResponseConduit.java:645) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.conduits.ChunkedStreamSinkConduit.doWrite(ChunkedStreamSinkConduit.java:163) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.conduits.ChunkedStreamSinkConduit.write(ChunkedStreamSinkConduit.java:127) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.conduits.ChunkedStreamSinkConduit.write(ChunkedStreamSinkConduit.java:216) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.conduits.DeflatingStreamSinkConduit.performFlushIfRequired(DeflatingStreamSinkConduit.java:419) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.conduits.DeflatingStreamSinkConduit.write(DeflatingStreamSinkConduit.java:99) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at org.xnio.conduits.ConduitStreamSinkChannel.write(ConduitStreamSinkChannel.java:150) ~[xnio-api-3.3.6.Final.jar:3.3.6.Final]
at io.undertow.channels.DetachableStreamSinkChannel.write(DetachableStreamSinkChannel.java:240) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.server.HttpServerExchange$WriteDispatchChannel.write(HttpServerExchange.java:2049) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.io.UndertowOutputStream.writeBufferBlocking(UndertowOutputStream.java:293) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at io.undertow.io.UndertowOutputStream.write(UndertowOutputStream.java:203) ~[undertow-core-1.4.11.Final.jar:1.4.11.Final]
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[?:1.8.0_212]
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[?:1.8.0_212]
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[?:1.8.0_212]
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[?:1.8.0_212]
at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator._flushBuffer(WriterBasedJsonGenerator.java:1880) ~[jackson-core-2.8.2.jar:2.8.2]
at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator._writeString(WriterBasedJsonGenerator.java:917) ~[jackson-core-2.8.2.jar:2.8.2]
at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator.writeString(WriterBasedJsonGenerator.java:366) ~[jackson-core-2.8.2.jar:2.8.2]
at com.fasterxml.jackson.core.JsonGenerator._writeSimpleObject(JsonGenerator.java:1926) ~[jackson-core-2.8.2.jar:2.8.2]
at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:381) ~[jackson-core-2.8.2.jar:2.8.2]
at com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1652) ~[jackson-core-2.8.2.jar:2.8.2]
at com.stardog.stark.query.io.json.AbstractSPARQLJSONWriter.writeValue(AbstractSPARQLJSONWriter.java:252) ~[stardog-stark-query-json-6.1.3.jar:?]
at com.stardog.stark.query.io.json.SPARQLJSONSelectResultWriter.handle(SPARQLJSONSelectResultWriter.java:92) ~[stardog-stark-query-json-6.1.3.jar:?]
... 16 more

Best,

Hi Lav,

The "query translation algorithm" is no longer used in Stardog 6. Are you certain nothing changed other than the option?

The reason for the joins is that there is no guarantee that rows with a given subject return a unique result. However, with all fields included in the subject template, we don't need to compute the cross product. This is something we are planning to change in an upcoming release.

Jess

Hi Jess,

yes, I am sure, nothing apart from that option changed.

Do you have an estimation how long this fix would take?

Best,

Can this table be altered such that poiid is either a primary key, or at least unique? The more such constraints in the table, the more Stardog is able to eliminate these joins.

-Paul

Hi Paul,

yes, I already made poiid as a primary key and I can confirm that after doing that, there are not more plenty of joins generated, even after I switch back the value of "Query Translation Algorithm" from "Legacy" to "Default".

I also experimented a bit with creating a small application using StardogJS library and querying outside of Stardog Studio. One of the problems that I encountered here is dealing with "JavaScript heap out of memory". I allocated more memory to this small app, and now it can handle with up to 1M results. Basically, in the meantime, we have to think for some workaround, e.g. doing some stream processing.

It seems that Stardog Studio whenever it is not able to handle a huge amount of results, in this case above 200K, it crashes or it shows a notification "Something went wrong".

Would be great to know, if you have plans for increasing query performance for Virtual Graphs.

Thank you very much for your support.

Best,

Lav,

We are continuously improving the virtual graph engine, both in terms of capability and performance.

Regarding your current situation, it be helpful to perform further troubleshooting. The "query translation property" is no longer used in Stardog 6. Are you able to export the virtual graph options before and after setting the query translation option so we can see if anything else has changed? This can be done using the virtual options CLI command.

Jess

Hi Jess,

here are the respective exported options of the virtual graph, where query.translation is shown as well:

#Configuration options for the virtual graph foursquareVG
#Fri May 24 16:07:55 CEST 2019
query.translation=DEFAULT
namespaces==http://api.stardog.com/\u0002stardog=tag:stardog:api:\u0002owl=http://www.w3.org/2002/07/owl#\u0002rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0002xsd=http://www.w3.org/2001/XMLSchema#\u0002fn=http://www.w3.org/2005/xpath-functions#\u0002rdfs=http://www.w3.org/2000/01/rdf-schema#
percent.encode=true
jdbc.password=pwdTest
mappings.syntax=SMS2
parser.sql.quoting=NATIVE
jdbc.url=jdbc:postgresql://10.22.9.111/foursquare
optimize.import=true
jdbc.driver=org.postgresql.Driver
jdbc.username=userTest

#Configuration options for the virtual graph foursquareVG
#Fri May 24 16:09:28 CEST 2019
query.translation=LEGACY
namespaces==http://api.stardog.com/\u0002stardog=tag:stardog:api:\u0002owl=http://www.w3.org/2002/07/owl#\u0002rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0002xsd=http://www.w3.org/2001/XMLSchema#\u0002fn=http://www.w3.org/2005/xpath-functions#\u0002rdfs=http://www.w3.org/2000/01/rdf-schema#
percent.encode=true
jdbc.password=pwdTest
mappings.syntax=SMS2
parser.sql.quoting=NATIVE
jdbc.url=jdbc:postgresql://10.22.9.111/foursquare
optimize.import=true
jdbc.driver=org.postgresql.Driver
jdbc.username=userTest

Best,