Hello, I am trying to understand whether it is possible to create a Virtual Graph over just plain Spark SQL without Databricks. If yes, would anyone be able to provide some guidance as to how exactly this can be accomplished? Would it be necessary to set up Spark as a Distributed SQL Engine, or does Stardog's Graph Virtualization tech somehow manage to generate and submit a Spark SQL job to the Spark cluster?
Thanks in advance!
Hello Kris,
I'm sure we can help you out, but we would need more details about where your data live. Strictly speaking, even with Databricks, virtual graphs do not connect Stardog to Spark SQL; they connect Stardog to some kind of persistent storage system, which could be the Databricks Delta Lake, an Oracle DB, or something else. If I have understood it correctly, Spark is basically a distributed analytics engine that accesses and manipulates data from an underlying distributed storage system (sometimes Hadoop, sometimes not).
Stardog is not limited to Databricks by any means. How are you currently storing your data?
Catherine
A couple of things to add to Catherine's response:
The interface between Stardog and Spark is a Java JDBC driver, for which you can use Databricks' driver, at least for now. We currently test Spark with version 22 of their driver. You can download it here: https://databricks.com/spark/jdbc-drivers-archive
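This is a minimal sketch of the JDBC side for plain (non-Databricks) Spark. It assumes Spark is running its Thrift JDBC/ODBC server, which is the "Distributed SQL Engine" mode in the Spark docs: it is started with `sbin/start-thriftserver.sh` and listens on port 10000 by default. It also assumes the Databricks driver jar is available to Stardog. The host, the credentials, the `AuthMech=3` URL property and the driver class `com.simba.spark.jdbc.Driver` are assumptions to check against the driver's own install guide. The script only renders the JDBC keys (`jdbc.url`, `jdbc.username`, `jdbc.password`, `jdbc.driver`) of a Stardog virtual graph properties file. As we understand it, Stardog then translates SPARQL into SQL and sends it over this connection, and the Thrift server runs that SQL as Spark jobs.

```python
# Hypothetical values: replace with your Spark Thrift Server host and credentials.
SPARK_HOST = "spark-thrift.example.com"
SPARK_PORT = 10000  # default port of Spark's Thrift JDBC/ODBC server
SPARK_SCHEMA = "default"

# Assumption: the Simba-based Databricks JDBC driver uses this class name.
DRIVER_CLASS = "com.simba.spark.jdbc.Driver"


def spark_jdbc_url(host: str, port: int, schema: str, **props: str) -> str:
    """Build a jdbc:spark:// URL, appending optional ;Key=Value driver properties."""
    base = f"jdbc:spark://{host}:{port}/{schema}"
    return base + "".join(f";{key}={value}" for key, value in props.items())


def virtual_graph_properties(url: str, user: str, password: str) -> str:
    """Render the JDBC keys of a Stardog virtual graph properties file."""
    lines = [
        f"jdbc.url={url}",
        f"jdbc.username={user}",
        f"jdbc.password={password}",
        f"jdbc.driver={DRIVER_CLASS}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # AuthMech=3 (user name and password) is an assumed setting; check the driver guide.
    url = spark_jdbc_url(SPARK_HOST, SPARK_PORT, SPARK_SCHEMA, AuthMech="3")
    print(virtual_graph_properties(url, "stardog", "secret"), end="")
```

If you redirect the output into a file (for example `spark.properties`), you get a properties file that can be handed to Stardog when the virtual graph is created.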
Aside from that connection, you'll need to create a table on the Spark side; see: CREATE TABLE - Spark 3.3.1 Documentation
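As an illustration, the helper below renders a `CREATE TABLE ... USING ... LOCATION` statement that registers existing files as a Spark table. The table name, the columns, the Parquet format and the `/data/people` path are all hypothetical. The statement must be run somewhere that shares the Thrift server's metastore. As we understand it, one way to do that is to connect with Spark's bundled `bin/beeline` to `jdbc:hive2://<host>:10000` and run the statement there.

```python
from typing import Dict


def create_table_ddl(table: str, columns: Dict[str, str], fmt: str, location: str) -> str:
    """Render a Spark SQL CREATE TABLE ... USING statement over existing files."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"USING {fmt} LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table over Parquet files already sitting at /data/people.
    ddl = create_table_ddl("people", {"id": "INT", "name": "STRING"}, "parquet", "/data/people")
    print(ddl + ";")
```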
Once that's done, you can create a virtual graph by following this: Virtual Graphs | Stardog Documentation Latest
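This sketches the last step for the same hypothetical `people` table. It writes a Stardog Mapping Syntax 2 (SMS2) file that turns each row into a `:Person`, then prints the `stardog-admin virtual add <properties> <mappings>` call that registers the graph. The `http://example.com/` namespace, the `urn:example:people` mapping name and the file names are made up. The linked Stardog docs remain the reference for SMS2 details and for any extra options your Stardog version needs.

```python
import shlex
from pathlib import Path
from typing import List

# Hypothetical SMS2 mapping: one Person node per row of the `people` table.
MAPPING = """\
PREFIX : <http://example.com/>

MAPPING <urn:example:people>
FROM SQL {
  SELECT id, name FROM people
}
TO {
  ?person a :Person ;
    :name ?name .
}
WHERE {
  BIND(template("http://example.com/person/{id}") AS ?person)
}
"""


def virtual_add_command(properties: str, mapping: str) -> List[str]:
    """Argument list for registering a virtual graph with stardog-admin."""
    return ["stardog-admin", "virtual", "add", properties, mapping]


if __name__ == "__main__":
    mapping_file = Path("people.sms")
    mapping_file.write_text(MAPPING)
    # Print the shell command to run where stardog-admin is installed.
    print(shlex.join(virtual_add_command("spark.properties", str(mapping_file))))
```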
-Paul