Virtual Graph over Spark SQL

lambdakris · July 19, 2022, 8:17pm

Hello, I am trying to understand whether it is possible to create a Virtual Graph over just plain Spark SQL without Databricks. If yes, would anyone be able to provide some guidance as to how exactly this can be accomplished? Would it be necessary to set up Spark as a Distributed SQL Engine, or does Stardog's Graph Virtualization tech somehow manage to generate and submit a Spark SQL job to the Spark cluster?

Thanks in advance!

Cathy_Dalzell · July 20, 2022, 12:43am

Hello Kris,

I'm sure we can help you out, but we would need more details about where your data live. Strictly speaking, even with Databricks, virtual graphs do not connect Stardog to Spark SQL - they connect Stardog to some kind of persistent storage system - which could be the Databricks Delta Lake, or an Oracle DB, or something else. If I have understood it correctly, Spark is basically a distributed analytics engine that accesses and manipulates data from an underlying distributed system -- sometimes Hadoop, sometimes not.

Stardog is not limited to Databricks by any means. How are you currently storing your data?

Catherine

PaulJackson · July 20, 2022, 12:52pm

A couple of things to add to Catherine's response:

The interface between Stardog and Spark is a JDBC driver, and you can use the Databricks driver for that, at least for now. We currently test Spark with version 22 of their driver. You can download it here: https://databricks.com/spark/jdbc-drivers-archive

Aside from that connection, you'll need to create a table on the Spark side. See: CREATE TABLE - Spark 3.3.1 Documentation
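As a rough illustration of the table-creation step, here is a minimal Python sketch that builds a Spark SQL CREATE TABLE statement and, if you have a PySpark session at hand, runs it. The table name, columns, and storage format are hypothetical and only for illustration. The helper names are made up for this example.

```python
def build_create_table(table, columns, data_format="PARQUET"):
    """Return a Spark SQL CREATE TABLE statement.

    `columns` is a list of (column_name, spark_sql_type) pairs.
    """
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) USING {data_format}"


def create_table(spark, ddl):
    """Run the statement on an existing PySpark SparkSession passed in by the caller."""
    spark.sql(ddl)


# Hypothetical table, for illustration only.
ddl = build_create_table("customers", [("id", "INT"), ("name", "STRING")])
print(ddl)
```

The same statement could equally be pasted into the spark-sql shell. The Python wrapper is only there to keep the example self-contained.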

Once that's done, you can create a virtual graph by following this: Virtual Graphs | Stardog Documentation Latest
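For the connection side, a virtual graph is configured with a properties file that carries the JDBC settings. The sketch below generates such a file's contents in Python. The `jdbc.*` keys are the ones described in the Stardog virtual graph docs, but the URL, driver class, and credentials shown here are placeholder assumptions, so check the driver's own documentation for the exact values. It also assumes your Spark cluster exposes a JDBC endpoint that the driver can reach.

```python
def datasource_properties(jdbc_url, driver_class, username, password):
    """Return the text of a properties file holding JDBC connection settings."""
    lines = [
        f"jdbc.url={jdbc_url}",
        f"jdbc.driver={driver_class}",
        f"jdbc.username={username}",
        f"jdbc.password={password}",
    ]
    return "\n".join(lines) + "\n"


# All values below are placeholders (assumptions), not tested settings.
props = datasource_properties(
    jdbc_url="jdbc:spark://spark-host:10000/default",  # assumed host, port, schema
    driver_class="com.simba.spark.jdbc.Driver",  # assumed; depends on driver version
    username="spark_user",
    password="change-me",
)
print(props)
```

You would save the output to a file and supply it, along with your mappings, when creating the virtual graph as described in the docs linked above.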

-Paul
