Stardog Cloud (Explorer and Studio) - SPARQL query error

In Stardog Cloud, I am working on an altered demo based on the one here:
https://cloud.stardog.com/kits/stardog:training-c360:1.0

I downloaded the data from the git repo and uploaded it to Azure and Databricks (more detail below). For the purposes of my demo, I'm using a simplified model:

One of the main changes is virtualizing the data across Azure and Databricks (well, Azure Databricks) databases (a feature I'd like to highlight to our customer). Specifically, Customer and Product are supplied by an Azure SQL server, while the others are supplied by a Databricks SQL warehouse.

I can run simple queries without issue, but as the complexity increases I see intermittent issues or outright failures.

As an example of the intermittent issues: this query fails or succeeds with roughly equal frequency in Stardog Explorer.

When it fails, the error is:
image

If instead I run this query in Stardog Studio, it succeeds without issue (I made a few alterations to the SPARQL code for display purposes, but if I run the SPARQL query generated by saving in Stardog Explorer, it succeeds all the same):

Well that's fine, I thought to myself, I'll simply showcase Stardog Studio here. But... if the query becomes a little more complex, the failure returns:

Here's the copied text (possible to copy in Studio, not possible in Explorer):

"Failed to run query: com.complexible.stardog.plan.eval.operator.OperatorException: Unable to execute virtual graph query. SQL string: SELECT `us_locations`.`id`, `purchases`.`id` `id0`, `purchases`.`price`, `purchases1`.`pid` `pid1`, `purchases2`.`cid` `cid2`
FROM `jamie_costabile`.`stardog`.`us_locations`
INNER JOIN `jamie_costabile`.`stardog`.`purchases` ON `purchases`.`price` > '10000' AND `us_locations`.`state` = 'Texas' AND `us_locations`.`id` IS NOT NULL
INNER JOIN `jamie_costabile`.`stardog`.`purchases` `purchases0` ON `purchases`.`id` = `purchases0`.`id`
INNER JOIN `jamie_costabile`.`stardog`.`purchases` `purchases1` ON `purchases`.`id` = `purchases1`.`id` AND `purchases1`.`pid` IS NOT NULL
INNER JOIN `jamie_costabile`.`stardog`.`purchases` `purchases2` ON `purchases`.`id` = `purchases2`.`id` AND `purchases2`.`cid` IS NOT NULL"

Also, at times the query above executes for several minutes and terminates with a timeout error:

Failed to run query: com.complexible.stardog.plan.eval.operator.OperatorException$Cancelled: Query execution cancelled: Execution time exceeded query timeout 300000

This leaves me with two questions:

  1. Possible reasons for the error (particularly in Stardog Studio) and potential solutions?
  2. Is there a way to access the stardog.log file from Stardog Cloud?

Thank you in advance!

Update: a moment ago the query succeeded - the first success:

The intermittent behavior continues:

Hi Jamie,

The only reasons I can imagine for the error occurring in Explorer but not Studio are:

  1. The SPARQL queries could be different somehow:
    - reasoning is enabled in Explorer but not Studio, or
    - the active graphs are different
  2. The timeouts are different

We do not make the Stardog log available from the cloud, out of an abundance of caution with respect to security.

A couple of ways you might get more information:

  1. The server does return the stack trace to the client (a recent enhancement, not yet surfaced in the client UI). If your browser supports it, you can open the dev tools (F12 in Chrome) and examine the response; the stack trace should be in there.
  2. You can go to the Databricks SQL workspace and look at the SQL query history. There may be a better explanation for why the query is failing.

If these are timeouts, you can make your query more efficient by adding the unique.key.sets Data Source option, assuming purchases.id is a primary key:
unique.key.sets=(jamie_costabile.stardog.purchases.id)
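For context, here is where that option would sit among the other Data Source properties. This is a minimal sketch only, with placeholder credentials and connection values carried over from this thread:

```properties
jdbc.url=jdbc:databricks://adb-12345.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/random_numbers_letters
jdbc.username=<username-or-token>
jdbc.password=<secret>
jdbc.driver=com.databricks.client.jdbc.Driver
# Declares purchases.id unique so the optimizer can eliminate redundant self-joins
unique.key.sets=(jamie_costabile.stardog.purchases.id)
```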

Regards,
Paul Jackson

Jamie,

As a follow-up, could you share your (modified to protect privacy) jdbc connection string, along with any ext. options that affect the driver? I'm particularly interested in two parts. In this example:

jdbc.url=jdbc:databricks://adb-123456.azuredatabricks.net:443/default;UseNativeQuery=1;transportMode=http;ssl=1;httpPath=/sql/1.0/endpoints/8acd42c2a4

... the jdbc:databricks prefix instead of jdbc:spark, and the presence of UseNativeQuery=1. The latter prevents the JDBC driver from rewriting your query. I looked in your log and saw an error caused by a CROSS JOIN when the query that was submitted had only inner joins.

-Paul

Hi Paul,

Thank you for your replies. I am still working on addressing your first post, but I can respond to your second post (i.e., jdbc connection string details):

The current connector is a Simba Spark SQL connector (I used this because when building I was unable to get the Databricks Spark SQL to connect properly). Here are the "functioning" connection details:
jdbc:spark://adb-12345.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/random_numbers_letters;


If instead I use a Databricks Spark SQL connector, here are the connection details that successfully connect:
[0] jdbc:databricks://adb-12345.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/random_numbers_letters;

Here are two alternatives that failed to make a connection:
[1] jdbc:databricks://adb-12345.azuredatabricks.net:443/default;UseNativeQuery=1;transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/random_numbers_letters;

Error: Failed to update Data Source "Databricks-C360-Demo-Database": 000012: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown. (Internal Server Error)

[2] jdbc:databricks://adb-12345.azuredatabricks.net:443/default;UseNativeQuery=1;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/random_numbers_letters;

Error: Failed to update Data Source "Databricks-C360-Demo-Database": 000012: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown. (Internal Server Error)


NOTE: I attempted endpoints as well as warehouses in the httpPath, though the error remained:
Failed to update Data Source "Databricks-C360-Demo-Database": 000012: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown. (Internal Server Error)

In Stardog Designer, when attempting to connect to the Databricks Spark SQL database using the [0] connection details, an error appears:

Problem querying data source: Data Source query failed: 'Databricks-C360-Demo-Database': Exception while executing a Data Source query. com.complexible.stardog.StardogException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: adb-12345.azuredatabricks.net:443 failed to respond. [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: adb-12345.azuredatabricks.net:443 failed to respond. [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: adb-12345.azuredatabricks.net:443 failed to respond. [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: adb-12345.azuredatabricks.net:443 failed to respond. com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: adb-12345.azuredatabricks.net:443 failed to respond adb-12345.azuredatabricks.net:443 failed to respond (400)

That's really strange. I just did some tests and all your variations work for me: swapping endpoints for warehouses, treating AuthMech=3 as optional, a semicolon at the end. In your tests it seems that if the only thing you change is adding UseNativeQuery=1, it will not connect. I've never seen that. Can you confirm there are no other ext. properties set? Just to be super explicit: the random_numbers_letters are the same for all attempts (no offense :). Are these successes and failures consistent across successive retries?

-Paul

Yes, it seems that adding UseNativeQuery=1 causes the issue when altering the Databricks JDBC connection string. I repeated strings [1] and [2] from above a moment ago with the same result (i.e., including UseNativeQuery=1, whether including or excluding AuthMech=3, with no other modifications to the JDBC connection string, fails to connect).

Additionally, the warehouse numbers from each JDBC connection string have been identical (no offense taken, I appreciate your thoroughness).


I don't set any ext. properties, only a SQL schema:


I checked the Query History in Databricks, and I am receiving an error similar to yours: Syntax error at or near 'CROSS'. line 19, pos 2. Below is a slightly rewritten query to highlight the CROSS issue:

If I alter the query, aggregating the CROSS joins, it succeeds:
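For readers following along, the rewrite is roughly of this shape. This is an illustrative sketch with made-up join columns, not the SQL actually generated here:

```sql
-- Illustrative shape that failed with "Syntax error at or near 'CROSS'"
SELECT p.id, l.city
FROM purchases p
CROSS JOIN us_locations l
WHERE p.location_id = l.id AND p.price > 10000;

-- The same join expressed as an explicit INNER JOIN, which executes
SELECT p.id, l.city
FROM purchases p
INNER JOIN us_locations l ON p.location_id = l.id
WHERE p.price > 10000;
```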

NOTE: In previous images I deleted my last name from the Databricks SQL table name for privacy, but out of a bit of laziness it's included here. The redaction in previous messages was done after the fact, strictly for the screenshots, so it did not affect the errors shown.


Also, with respect to the error in Stardog Designer: I compared the Simba Spark SQL JDBC connection string to the Databricks Spark SQL JDBC connection string, and aside from spark versus databricks in the scheme, they are identical:

jdbc:spark://adb-12345.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/35f3e70a---dc104; <-- Connects and can load resources into Stardog Designer

jdbc:databricks://adb-12345.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/35f3e70a---dc104; <-- Connects but fails to load resources into Stardog Designer

Update - The queries worked appropriately when I hosted all the data on a Microsoft SQL Server and used the associated connector. Below are two example queries that routinely failed if the data were accessed through a Databricks Spark SQL connector (in tandem with a Microsoft SQL Server connector; I used both to highlight the different connectors available).

(1)

(2)

I have a project beginning soon (next week, but development will likely begin a week or so after) that will use Stardog and, likely, Databricks. Any updates on the source of the issue or solutions to it? Thanks!

Hi Jamie, sorry for the delay. For some reason I didn't get a notification; I'll investigate that.

When you say you are using cloud, are you referring to our cloud apps (Studio, Designer, etc.) or the hosted service? In other words, where is Stardog running? I can't understand why the UseNativeQuery option is preventing a connection. I'm starting to suspect that you're using a different driver, which would be impossible with the hosted service.

Thanks,
-Paul

Hi Paul, thanks for the feedback. I am using the Stardog Cloud service (i.e., browser interface tools) and, aside from the data tables stored elsewhere, all the content I use for these knowledge graphs is stored, accessed, and executed on Stardog Cloud.

In terms of drivers:
[1] For Databricks Spark SQL connection: com.databricks.client.jdbc.Driver
[2] For Simba Spark SQL connection: com.simba.spark.jdbc.Driver

That said, I hadn't tried UseNativeQuery=1 with Simba Spark SQL (since our conversation began, I switched over to Databricks Spark SQL). A moment ago I added UseNativeQuery=1 to the Simba Spark SQL connection string, without success:

I think I might know what's going on. You're getting a 401 - authorization failure.

Looking at your image, it looks like 4 characters are being hidden for the password, but the 4 stars for the username are not hiding anything.

When you bring up the settings, it loads them from the server, and the credentials are masked in the response. The screen isn't hiding your correct credentials; it has the wrong credentials. You have to re-enter them every time you make a change. I realize this stinks. We have a fix planned, but it's not available yet.

Huzzah! That did the trick (i.e., re-entering credentials). Sorry for the oversight! I'm glad I sent that image in the last post.

Excellent. And did that resolve the cross join problem?

Well, I can't say for sure. In rerunning the query from before, the generated SQL no longer uses cross joins but inner joins:

Additionally, if I run the query from the initial post, I receive different outputs (see image below). The left portion of the image is from Databricks; here, I copied the SQL query from the Query History after executing the SPARQL code displayed on the right portion of the image. The SPARQL query returns only one entry (the rest are cut off in the image, but they are not there either), while the SQL query returns 57 entries. Any thoughts here?

In the image below are the four SQL queries executed in Databricks upon running the SPARQL query shown in the image above (i.e., right portion):

Hi Jamie,

That's good progress. For the query you're working on now, could you do a "show plan" and select text mode? It's probable that the query plan has a service join, which can result in multiple SQL queries for a single SPARQL query. If that's the case, you'll see differences in the individual Databricks queries when compared to the SPARQL query. I'm speculating, but the plan is a good next step.

-Paul

Hi Paul,

Apologies for my delay in replying (work kept getting in the way). Here is the query plan (I was not sure how to set text mode):

prefix : <tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:model:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

From virtual
Slice(offset=0, limit=1000) [#475]
`─ OrderBy(DESC(?price), offset=0, limit=1000) [#475]
   `─ Distinct [#475]
      `─ Projection(?full_name, ?city, ?price, ?product_name) [#475]
         `─ Bind("Texas" AS ?state) sortedBy=?state [#475]
            `─ HashJoin(?customer) [#475]
               +─ Union [#860]
               │  +─ Union [#360]
               │  │  +─ Union [#270]
               │  │  │  +─ Union [#180]
               │  │  │  │  +─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
               │  │  │  │  │  +─ RelNode=
               │  │  │  │  │  +─    LogicalProject(cid=[$1])
               │  │  │  │  │  +─      LogicalFilter(condition=[IS NOT NULL($1)])
               │  │  │  │  │  +─        JdbcTableScan(table=[[jamie_costabile, stardog, rewards]])
               │  │  │  │  │  +─ Query=
               │  │  │  │  │  +─    SELECT `cid`
               │  │  │  │  │  +─    FROM `jamie_costabile`.`stardog`.`rewards`
               │  │  │  │  │  +─    WHERE `cid` IS NOT NULL
               │  │  │  │  │  +─ Vars=
               │  │  │  │  │  +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/0})
               │  │  │  │  │  } [#90]
               │  │  │  │  `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
               │  │  │  │     +─ RelNode=
               │  │  │  │     +─    LogicalProject(cid=[$1])
               │  │  │  │     +─      LogicalFilter(condition=[IS NOT NULL($1)])
               │  │  │  │     +─        JdbcTableScan(table=[[jamie_costabile, stardog, creditcards]])
               │  │  │  │     +─ Query=
               │  │  │  │     +─    SELECT `cid`
               │  │  │  │     +─    FROM `jamie_costabile`.`stardog`.`creditcards`
               │  │  │  │     +─    WHERE `cid` IS NOT NULL
               │  │  │  │     +─ Vars=
               │  │  │  │     +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/0})
               │  │  │  │     } [#90]
               │  │  │  `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
               │  │  │     +─ RelNode=
               │  │  │     +─    LogicalProject(cid=[$0])
               │  │  │     +─      LogicalFilter(condition=[IS NOT NULL($0)])
               │  │  │     +─        JdbcTableScan(table=[[jamie_costabile, stardog, us_customers]])
               │  │  │     +─ Query=
               │  │  │     +─    SELECT `cid`
               │  │  │     +─    FROM `jamie_costabile`.`stardog`.`us_customers`
               │  │  │     +─    WHERE `cid` IS NOT NULL
               │  │  │     +─ Vars=
               │  │  │     +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/0})
               │  │  │     } [#90]
               │  │  `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
               │  │     +─ RelNode=
               │  │     +─    LogicalProject(cid=[$6])
               │  │     +─      LogicalFilter(condition=[IS NOT NULL($6)])
               │  │     +─        JdbcTableScan(table=[[jamie_costabile, stardog, purchases]])
               │  │     +─ Query=
               │  │     +─    SELECT `cid`
               │  │     +─    FROM `jamie_costabile`.`stardog`.`purchases`
               │  │     +─    WHERE `cid` IS NOT NULL
               │  │     +─ Vars=
               │  │     +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/0})
               │  │     } [#90]
               │  `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Azure-C360-Demo-Database> [#500] {
               │     +─ RelNode=
               │     +─    LogicalProject(cid=[$0])
               │     +─      JdbcTableScan(table=[[stardog_demo, c360, Customers]])
               │     +─ Query=
               │     +─    SELECT [cid]
               │     +─    FROM [stardog_demo].[c360].[Customers]
               │     +─ Vars=
               │     +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/0})
               │     } [#500]
               `─ HashJoin(?address) [#280]
                  +─ Union [#630]
                  │  +─ Union [#540]
                  │  │  +─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__TJC-Databricks> [#90] {
                  │  │  │  +─ RelNode=
                  │  │  │  +─    LogicalProject(location=[$47])
                  │  │  │  +─      LogicalFilter(condition=[IS NOT NULL($47)])
                  │  │  │  +─        JdbcTableScan(table=[[hive_metastore, temp, t_hco_to_delete]])
                  │  │  │  +─ Query=
                  │  │  │  +─    SELECT `location`
                  │  │  │  +─    FROM `hive_metastore`.`temp`.`t_hco_to_delete`
                  │  │  │  +─    WHERE `location` IS NOT NULL
                  │  │  │  +─ Vars=
                  │  │  │  +─    ?address <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.us_locations:{location/0})
                  │  │  │  } [#90]
                  │  │  `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Azure-C360-Demo-Database> [#450] {
                  │  │     +─ RelNode=
                  │  │     +─    LogicalProject(location=[$6])
                  │  │     +─      LogicalFilter(condition=[IS NOT NULL($6)])
                  │  │     +─        JdbcTableScan(table=[[stardog_demo, c360, Customers]])
                  │  │     +─ Query=
                  │  │     +─    SELECT [location]
                  │  │     +─    FROM [stardog_demo].[c360].[Customers]
                  │  │     +─    WHERE [location] IS NOT NULL
                  │  │     +─ Vars=
                  │  │     +─    ?address <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.us_locations:{location/0})
                  │  │     } [#450]
                  │  `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
                  │     +─ RelNode=
                  │     +─    LogicalProject(id=[$0])
                  │     +─      LogicalFilter(condition=[IS NOT NULL($0)])
                  │     +─        JdbcTableScan(table=[[jamie_costabile, stardog, us_locations]])
                  │     +─ Query=
                  │     +─    SELECT `id`
                  │     +─    FROM `jamie_costabile`.`stardog`.`us_locations`
                  │     +─    WHERE `id` IS NOT NULL
                  │     +─ Vars=
                  │     +─    ?address <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.us_locations:{id/0})
                  │     } [#90]
                  `─ HashJoin(?customer) [#179]
                     +─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Azure-C360-Demo-Database> [#405] {
                     │  +─ RelNode=
                     │  +─    LogicalProject(cid=[$0], location=[$6], full_name=[$7])
                     │  +─      LogicalFilter(condition=[AND(IS NOT NULL($6), IS NOT NULL($7))])
                     │  +─        JdbcTableScan(table=[[stardog_demo, c360, Customers]])
                     │  +─ Query=
                     │  +─    SELECT [cid], [location], [full_name]
                     │  +─    FROM [stardog_demo].[c360].[Customers]
                     │  +─    WHERE [location] IS NOT NULL AND [full_name] IS NOT NULL
                     │  +─ Vars=
                     │  +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/0})
                     │  +─    ?address <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.us_locations:{location/1})
                     │  +─    ?full_name <- COLUMN($2)^^xsd:string
                     │  } [#405]
                     `─ HashJoin(?product) [#180]
                        +─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#222] {
                        │  +─ RelNode=
                        │  +─    LogicalProject(id=[$0], id1=[$18], product=[$19], cid1=[$25], phone=[$30], price2=[$47], id4=[$51], city=[$57])
                        │  +─      LogicalJoin(condition=[=($25, $73)], joinType=[inner])
                        │  +─        LogicalJoin(condition=[=($25, $65)], joinType=[inner])
                        │  +─          LogicalJoin(condition=[AND(=($51, $58), =($62, 'Texas'))], joinType=[inner])
                        │  +─            LogicalJoin(condition=[AND(IS NOT NULL($51), IS NOT NULL($57))], joinType=[inner])
                        │  +─              LogicalJoin(condition=[AND(=($0, $42), >($47, '10000':VARCHAR(2048)))], joinType=[inner])
                        │  +─                LogicalJoin(condition=[AND(=($0, $33), =($25, $39))], joinType=[inner])
                        │  +─                  LogicalJoin(condition=[IS NOT NULL($30)], joinType=[inner])
                        │  +─                    LogicalJoin(condition=[AND(=($10, $18), =('highlighters':VARCHAR(2048), LOWER($19)), IS NOT NULL($19))], joinType=[inner])
                        │  +─                      LogicalJoin(condition=[=($0, $9)], joinType=[inner])
                        │  +─                        JdbcTableScan(table=[[jamie_costabile, stardog, purchases]])
                        │  +─                        JdbcTableScan(table=[[jamie_costabile, stardog, purchases]])
                        │  +─                      JdbcTableScan(table=[[jamie_costabile, stardog, products]])
                        │  +─                    JdbcTableScan(table=[[jamie_costabile, stardog, us_customers]])
                        │  +─                  JdbcTableScan(table=[[jamie_costabile, stardog, purchases]])
                        │  +─                JdbcTableScan(table=[[jamie_costabile, stardog, purchases]])
                        │  +─              JdbcTableScan(table=[[jamie_costabile, stardog, us_locations]])
                        │  +─            JdbcTableScan(table=[[jamie_costabile, stardog, us_locations]])
                        │  +─          JdbcTableScan(table=[[jamie_costabile, stardog, us_customers]])
                        │  +─        JdbcTableScan(table=[[jamie_costabile, stardog, us_customers]])
                        │  +─ Query=
                        │  +─    SELECT `purchases`.`id`, `products`.`id` `id1`, `products`.`product`, `us_customers`.`cid` `cid1`, `us_customers`.`phone`, `purchases2`.`price` `price2`, `us_locations`.`id` `id4`, `us_locations`.`city`
                        │  +─    FROM `jamie_costabile`.`stardog`.`purchases`
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`purchases` `purchases0` ON `purchases`.`id` = `purchases0`.`id`
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`products` ON `purchases0`.`pid` = `products`.`id` AND 'highlighters' = LOWER(`products`.`product`) AND `products`.`product` IS NOT NULL
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`us_customers` ON `us_customers`.`phone` IS NOT NULL
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`purchases` `purchases1` ON `purchases`.`id` = `purchases1`.`id` AND `us_customers`.`cid` = `purchases1`.`cid`
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`purchases` `purchases2` ON `purchases`.`id` = `purchases2`.`id` AND `purchases2`.`price` > '10000'
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`us_locations` ON `us_locations`.`id` IS NOT NULL AND `us_locations`.`city` IS NOT NULL
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`us_locations` `us_locations0` ON `us_locations`.`id` = `us_locations0`.`id` AND `us_locations0`.`state` = 'Texas'
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`us_customers` `us_customers0` ON `us_customers`.`cid` = `us_customers0`.`cid`
                        │  +─    INNER JOIN `jamie_costabile`.`stardog`.`us_customers` `us_customers1` ON `us_customers`.`cid` = `us_customers1`.`cid`
                        │  +─ Vars=
                        │  +─    ?customer <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:stardog_demo.c360.Customers:{cid/3})
                        │  +─    ?address <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.us_locations:{id/6})
                        │  +─    ?city <- COLUMN($7)^^xsd:string
                        │  +─    ?contactInfo <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:ContactInfo:{cid/3})
                        │  +─    ?phone <- COLUMN($4)^^xsd:string
                        │  +─    ?order <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:Orders:{id/0})
                        │  +─    ?price <- COLUMN($5)^^xsd:double
                        │  +─    ?product <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.products:{id/1})
                        │  +─    ?product_name <- COLUMN($2)^^xsd:string
                        │  } [#222]
                        `─ Union [#180]
                           +─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
                           │  +─ RelNode=
                           │  +─    LogicalProject(id=[$0])
                           │  +─      LogicalFilter(condition=[IS NOT NULL($0)])
                           │  +─        JdbcTableScan(table=[[jamie_costabile, stardog, products]])
                           │  +─ Query=
                           │  +─    SELECT `id`
                           │  +─    FROM `jamie_costabile`.`stardog`.`products`
                           │  +─    WHERE `id` IS NOT NULL
                           │  +─ Vars=
                           │  +─    ?product <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.products:{id/0})
                           │  } [#90]
                           `─ VirtualGraphSql<virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database> [#90] {
                              +─ RelNode=
                              +─    LogicalProject(pid=[$1])
                              +─      LogicalFilter(condition=[IS NOT NULL($1)])
                              +─        JdbcTableScan(table=[[jamie_costabile, stardog, purchases]])
                              +─ Query=
                              +─    SELECT `pid`
                              +─    FROM `jamie_costabile`.`stardog`.`purchases`
                              +─    WHERE `pid` IS NOT NULL
                              +─ Vars=
                              +─    ?product <- TEMPLATE(tag:stardog:designer:TJC-C360-Demo-Update-Azure-DB:data:jamie_costabile.stardog.products:{pid/0})
                              } [#90]

I copied the text into vim and searched for "service" without finding any matches.

Hi Jamie,

That's the text mode I was talking about. Studio also has a graphical mode that can hide some important details. No worries.

What this plan shows is that Stardog has a fairly big plan with a number of joins that take virtual graph queries as arguments. For whatever reason the optimizer opted for HashJoins rather than ServiceJoins; no matter. The main point is that Stardog is executing a plan that includes some plan nodes executed locally and many sub-plans executed on the remote databases. For this reason, you can't compare the output of the queries that appear in Databricks to the overall SPARQL query. The Databricks queries are for different pieces of the larger plan, and their output is further operated on on the Stardog side.

There are a couple of things you can do at this point. It's not required, but you can encourage Stardog to push more of the plan down to the Databricks side by avoiding what we call virtual transparency, which is what's used when the dataset for the query is a wildcard, like stardog:context:virtual in this case. If you change the dataset from virtual to virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database, the only virtual graph referenced in this query, you'll more likely get a single VirtualGraphSql plan node containing all the SQL for the query.
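Concretely, that means naming the virtual graph in the query's dataset rather than using the wildcard. A rough sketch, with placeholder triple patterns standing in for the original query body:

```sparql
SELECT DISTINCT ?full_name ?city ?price ?product_name
FROM <virtual://TJC-C360-Demo-Update-Azure-DB__data__Databricks-C360-Demo-Database>
WHERE {
  # ... original triple patterns ...
}
ORDER BY DESC(?price)
```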

That shouldn't change the output. You should expect to still get no results, but the SQL that shows up in the Databricks query history should also return no results.

From there, figuring out why you're getting no results is an iterative process: break your query down into smaller pieces, build it back up, and take note of the point at which the results are dropped. From that we can usually figure out what's going on.

-Paul
