Slow VG query with Postgres

I have a slow VG query against Postgres that takes 4796ms where the equivalent SQL query takes 1.583ms. The estimate seems way off. The cardinality for the SQL query is only 4303595

The SPARQL query is

SELECT * WHERE { <http://semantalyitcs.com/2020/06/id/geolite2/network/1.0.0.0%2F24> ?p ?o }

Equivalent SQL query

         SELECT
                COALESCE(a.network, ci.network, co.network) AS network,
                a.ip_version,
                a.autonomous_system_number,
                a.autonomous_system_organization,
                COALESCE(ci.ip_version, co.ip_version) AS ip_version,
                ci.geoname_id AS city_geoname_id,
                COALESCE(ci.is_anonymous_proxy, co.is_anonymous_proxy) AS is_anonymous_proxy,
                COALESCE(ci.is_satellite_provider, co.is_satellite_provider) AS is_satellite_provider,
                ci.postal_code,
                ci.latitude,
                ci.longitude,
                ci.accuracy_radius,
                co.geoname_id AS country_geoname_id,
                COALESCE(ci.registered_country_geoname_id, co.registered_country_geoname_id) AS registered_country_geoname_id,
                COALESCE(ci.represented_country_geoname_id, co.represented_country_geoname_id) AS represented_country_geoname_id
        FROM
                asn_blocks a
        FULL OUTER JOIN
                city_blocks ci
        ON
                a.network = ci.network
        FULL OUTER JOIN
                country_blocks co
        ON
                a.network = co.network
        WHERE
                a.network = '1.0.0.0/24';

The query plan is

From local
Projection(?p, ?o) [#2115376516.1M]
`─ Filter((Bound(?p) && Bound(?o))) [#2115376516.1M]
   `─ Projection(?p_accuracyRadius AS ?p, ?o_accuracyRadius AS ?o;
      +─         ?p_autonomousSystem AS ?p, ?o_autonomousSystem AS ?o;
      +─         ?p_cityLocation AS ?p, ?o_cityLocation AS ?o;
      +─         ?p_countryLocation AS ?p, ?o_countryLocation AS ?o;
      +─         ?p_isAnonymousProxy AS ?p, ?o_isAnonymousProxy AS ?o;
      +─         ?p_isSatelliteProvider AS ?p, ?o_isSatelliteProvider AS ?o;
      +─         ?p_latitude AS ?p, ?o_latitude AS ?o;
      +─         ?p_longitude AS ?p, ?o_longitude AS ?o;
      +─         ?p_network AS ?p, ?o_network AS ?o;
      +─         ?p_registeredCountry AS ?p, ?o_registeredCountry AS ?o;
      +─         ?p_representedCountry AS ?p, ?o_representedCountry AS ?o;
      +─         ?p_version AS ?p, ?o_version AS ?o;
      +─         ?p_type AS ?p, ?o_type AS ?o) [#4230753032.2M]
      `─ VirtualGraphSql<virtual://geolite2> [#2147483647] {
         +─ RelNode=
         +─    LogicalProject(network=[CASE(IS NOT NULL($0), $0, IS NOT NULL($4), $4, $16)], ip_version=[$1], autonomous_system_number=[$2], city_geoname_id=[$6], is_anonymous_proxy=[OR(AND(IS NOT NULL($9), $9), AND($21, IS NULL($9)))], is_satellite_provider=[OR(AND(IS NOT NULL($10), $10), AND($22, IS NULL($10)))], latitude=[$12], longitude=[$13], accuracy_radius=[$14], country_geoname_id=[$18], registered_country_geoname_id=[CASE(IS NOT NULL($7), $7, $19)], represented_country_geoname_id=[CASE(IS NOT NULL($8), $8, $20)])
         +─      LogicalFilter(condition=[OR(=($0, '1.0.0.0/24'), AND(=($4, '1.0.0.0/24'), IS NULL($0)), AND(=($16, '1.0.0.0/24'), IS NULL($0), IS NULL($4)))])
         +─        LogicalJoin(condition=[=($0, $16)], joinType=[full])
         +─          LogicalJoin(condition=[=($0, $4)], joinType=[full])
         +─            JdbcTableScan(table=[[public, asn_blocks]])
         +─            JdbcTableScan(table=[[public, city_blocks]])
         +─          JdbcTableScan(table=[[public, country_blocks]])
         +─ Vars=
         +─    ?p_accuracyRadius <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/accuracyRadius)
         +─    ?o_accuracyRadius <- COLUMN($8)^^xsd:string
         +─    ?p_autonomousSystem <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/autonomousSystem)
         +─    ?o_autonomousSystem <- TEMPLATE(http://semantalytics.com/2020/06/id/geolite2/autonomous-system-organization/{autonomous_system_number/2})
         +─    ?p_cityLocation <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/cityLocation)
         +─    ?o_cityLocation <- TEMPLATE(http://sws.geonames.org/{city_geoname_id/3}/)
         +─    ?p_countryLocation <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/countryLocation)
         +─    ?o_countryLocation <- TEMPLATE(http://sws.geonames.org/{country_geoname_id/9}/)
         +─    ?p_isAnonymousProxy <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/isAnonymousProxy)
         +─    ?o_isAnonymousProxy <- COLUMN($4)^^xsd:boolean
         +─    ?p_isSatelliteProvider <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/isSatelliteProvider)
         +─    ?o_isSatelliteProvider <- COLUMN($5)^^xsd:boolean
         +─    ?p_latitude <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/latitude)
         +─    ?o_latitude <- COLUMN($6)^^xsd:decimal
         +─    ?p_longitude <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/longitude)
         +─    ?o_longitude <- COLUMN($7)^^xsd:decimal
         +─    ?p_network <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/network)
         +─    ?o_network <- COLUMN($0)^^xsd:string
         +─    ?p_registeredCountry <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/registeredCountry)
         +─    ?o_registeredCountry <- TEMPLATE(http://sws.geonames.org/{registered_country_geoname_id/10}/)
         +─    ?p_representedCountry <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/representedCountry)
         +─    ?o_representedCountry <- TEMPLATE(http://sws.geonames.org/{represented_country_geoname_id/11}/)
         +─    ?p_version <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/version)
         +─    ?o_version <- COLUMN($1)^^xsd:string
         +─    ?p_type <- CONSTANT(http://www.w3.org/1999/02/22-rdf-syntax-ns#type)
         +─    ?o_type <- CONSTANT(http://semantalytics.com/2020/06/ns/geolite2/InternetProtocolNetwork)
         }

and the mapping is

PREFIX ip: <http://semantalytics.com/2020/06/ns/geolite2/>

MAPPING FROM SQL {
	SELECT
		COALESCE(a.network, ci.network, co.network) AS network,
		a.ip_version,
		a.autonomous_system_number,
		a.autonomous_system_organization,
		COALESCE(ci.ip_version, co.ip_version) AS ip_version,
		ci.geoname_id AS city_geoname_id,
		COALESCE(ci.is_anonymous_proxy, co.is_anonymous_proxy) AS is_anonymous_proxy,
		COALESCE(ci.is_satellite_provider, co.is_satellite_provider) AS is_satellite_provider,
		ci.postal_code,
		ci.latitude,
		ci.longitude,
		ci.accuracy_radius,
		co.geoname_id AS country_geoname_id,
		COALESCE(ci.registered_country_geoname_id, co.registered_country_geoname_id) AS registered_country_geoname_id,
		COALESCE(ci.represented_country_geoname_id, co.represented_country_geoname_id) AS represented_country_geoname_id
	FROM
		asn_blocks a
	FULL OUTER JOIN
		city_blocks ci
	ON
		a.network = ci.network
	FULL OUTER JOIN
		country_blocks co
	ON
		a.network = co.network
} TO {
	?networkIri
		a ip:InternetProtocolNetwork;
		ip:version ?ip_version;
		ip:autonomousSystem ?autonomousSystemIri;
		ip:network ?network;
		ip:version ?ip_version;
		ip:cityLocation ?cityGeonameIdIri;
		ip:countryLocation ?countryGeonameIdIri;
		ip:registeredCountry ?registeredCountryGeonameIdIri;
		ip:representedCountry ?representedCountryGeonameIdIri;
		ip:isAnonymousProxy ?is_anonymous_proxy_bool;
		ip:isSatelliteProvider ?is_satellite_provider_bool;
		ip:latitude ?latitude_decimal;
		ip:longitude ?longitude_decimal;
		ip:accuracyRadius ?accuracy_radius;
		ip:version ?ip_version;
	.
	?autonomousSystemIri
		ip:autonomousSystemNumber ?autonomous_system_number;
		ip:autonomousSystemOrganization ?autonomous_system_organization;
	.	
} WHERE {
	BIND(template("http://semantalytics.com/2020/06/id/geolite2/network/{network}") AS ?networkIri)
	BIND(template("http://semantalytics.com/2020/06/id/geolite2/autonomous-system-organization/{autonomous_system_number}") AS ?autonomousSystemI
ri)
	BIND(template("http://sws.geonames.org/{country_geoname_id}/") AS ?countryGeonameIdIri)
	BIND(template("http://sws.geonames.org/{city_geoname_id}/") AS ?cityGeonameIdIri)
	BIND(template("http://sws.geonames.org/{registered_country_geoname_id}/") AS ?registeredCountryGeonameIdIri)
	BIND(template("http://sws.geonames.org/{represented_country_geoname_id}/") AS ?representedCountryGeonameIdIri)
	BIND(xsd:boolean(?is_satellite_provider) AS ?is_satellite_provider_bool)
	BIND(xsd:boolean(?is_anonymous_proxy) AS ?is_anonymous_proxy_bool)
	BIND(xsd:decimal(?latitude) AS ?latitude_decimal)
	BIND(xsd:decimal(?longitude) AS ?longitude_decimal)
}

The cardinality estimate shouldn't be an issue here. You can try running analyze on those tables in your Postgres server to see if it changes.

How many results does the SQL query return?

Can you setting explain.format=NATIVE_QUERY in your VG props? This will show the SQL query in the Stardog plan. Can you run that SQL query? It should be exactly the same as what you're showing.

I run vaccum analyze right after loading the data so I don't expect it to change but I'll give it a try.

The SQL query returns 4,303,595 rows.

It looks like the query it's executing takes 4987.585ms out of a total 5266ms

The SQL query generated is

SELECT CASE
           WHEN "asn_blocks"."network" IS NOT NULL THEN "asn_blocks"."network"
           WHEN "city_blocks"."network" IS NOT NULL THEN "city_blocks"."network"
           ELSE "country_blocks"."network" END                                                    AS "network",
       "asn_blocks"."ip_version",
       "asn_blocks"."autonomous_system_number",
       "city_blocks"."geoname_id"                                                                 AS "city_geoname_id",
       "city_blocks"."is_anonymous_proxy" IS NOT NULL AND "city_blocks"."is_anonymous_proxy" OR
       "country_blocks"."is_anonymous_proxy" AND
       "city_blocks"."is_anonymous_proxy" IS NULL                                                 AS "is_anonymous_proxy",
       "city_blocks"."is_satellite_provider" IS NOT NULL AND "city_blocks"."is_satellite_provider" OR
       "country_blocks"."is_satellite_provider" AND
       "city_blocks"."is_satellite_provider" IS NULL                                              AS "is_satellite_provider",
       "city_blocks"."latitude",
       "city_blocks"."longitude",
       "city_blocks"."accuracy_radius",
       "country_blocks"."geoname_id"                                                              AS "country_geoname_id",
       CASE
           WHEN "city_blocks"."registered_country_geoname_id" IS NOT NULL
               THEN "city_blocks"."registered_country_geoname_id"
           ELSE "country_blocks"."registered_country_geoname_id" END                              AS "registered_country_geoname_id",
       CASE
           WHEN "city_blocks"."represented_country_geoname_id" IS NOT NULL
               THEN "city_blocks"."represented_country_geoname_id"
           ELSE "country_blocks"."represented_country_geoname_id" END                             AS "represented_country_geoname_id"
FROM "public"."asn_blocks"
         FULL JOIN "public"."city_blocks" ON "asn_blocks"."network" = "city_blocks"."network"
         FULL JOIN "public"."country_blocks" ON "asn_blocks"."network" = "country_blocks"."network"
WHERE "asn_blocks"."network" = '1.0.0.0/24'
   OR "city_blocks"."network" = '1.0.0.0/24' AND "asn_blocks"."network" IS NULL
   OR "country_blocks"."network" = '1.0.0.0/24' AND "asn_blocks"."network" IS NULL AND "city_blocks"."network" IS NULL

Ok, thanks that helps. I missed that weird join in the other view.

Can you share the Postgres plan for the Stardog-generated query?

The Postgres query plan for the Stardog generated SQL

 Hash Full Join  (cost=40586.29..382492.61 rows=5 width=101)
   Hash Cond: ((asn_blocks.network)::text = (country_blocks.network)::text)
   Filter: (((asn_blocks.network)::text = '1.0.0.0/24'::text) OR (((city_blocks.network)::text = '1.0.0.0/24'::text) AND (asn_blocks.network IS NULL)) OR (((country_blocks.network)::text = '1.0.0.0/24'::te
xt) AND (asn_blocks.network IS NULL) AND (city_blocks.network IS NULL)))
   ->  Hash Full Join  (cost=24126.09..238766.40 rows=2620585 width=71)
         Hash Cond: ((city_blocks.network)::text = (asn_blocks.network)::text)
         ->  Seq Scan on city_blocks  (cost=0.00..127325.85 rows=2620585 width=46)
         ->  Hash  (cost=14374.15..14374.15 rows=504315 width=25)
               ->  Seq Scan on asn_blocks  (cost=0.00..14374.15 rows=504315 width=25)
   ->  Hash  (cost=8620.31..8620.31 rows=405431 width=30)
         ->  Seq Scan on country_blocks  (cost=0.00..8620.31 rows=405431 width=30)
(10 rows)

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.