Stardog Studio - SparQL Query issue against Databricks

Muthu · February 14, 2022, 11:09am

Hello,

I have been trying to execute the SparQL queries against the Databricks virtual graph using Stardog Studio, all the queries are failing. At the same time when I run the same queries in the CLI mode using "query execute" then they are all running fine. As per my research the problem is due to the following line getting added at the end by Stardog Studio

"FETCH NEXT 1000 ROWS ONLY"

Has anybody faced this issue? Any workaround / help in this regard?

Regards,
Muthu

Nick_McHugh · February 14, 2022, 2:32pm

Hello Muthu,

Can you please try adding the following setting to the Data tab (connections for Virtual Graphs) in Studio.

Property Value
sql.dialect HIVE

Thanks,
Nick

Muthu · February 14, 2022, 2:54pm

@Nick_McHugh Awesome, it worked. Thanks for your help, really appreciate it !!

On a different note, I have one more question please. Is it mandatory to provide the database (schema) name in the properties file while creating the virtual graph (I'm connecting to Databricks using JDBC)?

I have tables that I want to create the Virtual graph present in more than one database. I'm able to successfully create the Virtual graph only if I give the database name in the properties file, if I exclude it or make it default and then provide the database.tablename syntax in the mapping file, it is failing. Would you be able to help me here as well please?

Also conceptually I want to understand can one Virtual graph be built from multiple different data sources ?

tim · February 14, 2022, 3:29pm

Hi Muthu, what version of Stardog are you on? After Stardog 7.8 it should no longer be necessary to provide the sql.dialect HIVE property.

To use tables from multiple databases in Databricks you can set the sql.schemas property with multiple databases separated by a comma (sql.schemas=default,sampledb,anothersampledb). If you auto-generate the Virtual Graph mappings files, you should notice the SQL query generated with the database name. From here you can modify the SQL queries and mappings.

To create Virtual Graphs from another data source, this will require a new data source connection and it's own Virtual Graph mapping. If both Virtual Graphs are connected to the same database then the end users should be able to query across data sources.

Muthu · February 14, 2022, 4:51pm

@tim Thanks for your reply. I'm using Stardog server version 7.8.2 and Studio version 3.2.0.

Regarding the Schemas, if I use the sql.schemas= option in the properties file and try to use the R2RML mapping file with tables from multiple databases, it is not able to find those tables, it is still searching in the same default database in the connection string.

jdbc.url=jdbc:spark://xxxxx/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/zzzz/zzzz;AuthMech=3;UID=token;PWD=<personal-access-token>
jdbc.username=token
jdbc.password=yyyyy
jdbc.driver=com.simba.spark.jdbc.Driver
sql.schemas=default, schema1, schema2

prefix  : <http://example.com/kg/pos/>
prefix rr: <http://www.w3.org/ns/r2rml#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

:BMapping
    a rr:TriplesMap ;
	rr:subjectMap [
		rr:template "http://example.com/kg/pos/{\"business\"}" ;
		rr:class :Business
	] ;
	rr:predicateObjectMap [
		rr:predicate rdfs:label;
		rr:objectMap [ rr:column ' \"business\" ' ]
	] ;
	rr:logicalTable [ rr:sqlQuery """SELECT distinct \"business\" FROM \"table1\" WHERE \"country_name\" = 'UNITED STATES' """ ] .
	
:SMapping
    a rr:TriplesMap ;
	rr:subjectMap [
		rr:template "http://example.com/kg/pos/{\"segment\"}" ;
		rr:class :Segment
	] ;
	rr:predicateObjectMap [
		rr:predicate rdfs:label;
		rr:objectMap [ rr:column ' \"segment\" ' ]
	] ;
	rr:logicalTable [ rr:sqlQuery """SELECT distinct \"segment\" FROM \"table2\" WHERE \"country_name\" = 'UNITED STATES' """ ] .

Where, table1 is from Schema1 and table2 is from Schema2

PaulJackson · February 15, 2022, 12:51am

Hi Muthu,

There can be only one default schema, in this case it is the one in the connection string (default). For tables in the other schemas you'll need to qualify them (Schema1.table1).

-Paul

Topic		Replies	Views
Virtual Graph over Spark SQL Support	27	531	May 12, 2025
VirtualGraph query error Support	8	449	September 23, 2021
Multiple data sources to a virtual graph Support	12	528	August 19, 2021
Create virtual graph from db in MSSQLSERVER Support	2	522	May 31, 2018
Querying virtual graph in Stardog Studio Stardog Studio	2	287	May 19, 2023

Stardog Studio - SparQL Query issue against Databricks

Related topics