Stardog Studio - SPARQL query issue against Databricks

Hello,

I have been trying to execute SPARQL queries against a Databricks virtual graph using Stardog Studio, and all the queries are failing. However, when I run the same queries from the CLI using "query execute", they all run fine. From my research, the problem appears to be the following clause that Stardog Studio appends to the generated SQL:

"FETCH NEXT 1000 ROWS ONLY"

Has anybody faced this issue? Any workaround or help in this regard would be appreciated.

Regards,
Muthu

Hello Muthu,

Can you please try adding the following setting on the Data tab (connection properties for Virtual Graphs) in Studio:

Property      Value
sql.dialect   HIVE
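
If you are configuring the data source through a properties file rather than the Studio UI, the equivalent line would presumably be:

```properties
# Use the Hive SQL dialect so that the paging clause Studio generates
# ("FETCH NEXT 1000 ROWS ONLY") is not sent to Databricks,
# which does not support that syntax
sql.dialect=HIVE
```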

Thanks,
Nick


@Nick_McHugh Awesome, it worked. Thanks for your help, really appreciate it!

On a different note, I have one more question. Is it mandatory to provide the database (schema) name in the properties file when creating the virtual graph? (I'm connecting to Databricks using JDBC.)

The tables I want in the Virtual Graph are spread across more than one database. I'm able to create the Virtual Graph successfully only if I give a database name in the properties file; if I exclude it (or leave it as default) and use the database.tablename syntax in the mapping file, it fails. Would you be able to help me here as well, please?

Also, conceptually, I want to understand: can one Virtual Graph be built from multiple different data sources?

Hi Muthu, what version of Stardog are you on? As of Stardog 7.8, it should no longer be necessary to provide the sql.dialect=HIVE property.

To use tables from multiple databases in Databricks, you can set the sql.schemas property to multiple databases separated by commas (sql.schemas=default,sampledb,anothersampledb). If you auto-generate the Virtual Graph mapping files, you should see the database names included in the generated SQL queries. From there you can modify the SQL queries and mappings.
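
For example, a data source properties file exposing several databases might contain a line like this (the database names here are placeholders):

```properties
# Make tables from all three Databricks databases visible to the Virtual Graph;
# only one of these can act as the default schema
sql.schemas=default,sampledb,anothersampledb
```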

Creating Virtual Graphs from another data source requires a new data source connection and its own Virtual Graph mapping. If both Virtual Graphs are associated with the same Stardog database, end users should be able to query across the data sources.
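
As a sketch, each data source would get its own properties file and mapping file, added as a separate virtual graph with the stardog-admin CLI (the file names below are placeholders):

```shell
# One virtual graph per data source; both can then be queried
# together from the same Stardog database
stardog-admin virtual add databricks_source.properties databricks_mappings.ttl
stardog-admin virtual add other_source.properties other_mappings.ttl
```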

@tim Thanks for your reply. I'm using Stardog server version 7.8.2 and Studio version 3.2.0.

Regarding the schemas: if I use the sql.schemas= option in the properties file and use an R2RML mapping file with tables from multiple databases, Stardog is not able to find those tables; it still searches only the default database from the connection string.

jdbc.url=jdbc:spark://xxxxx/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/zzzz/zzzz;AuthMech=3;UID=token;PWD=<personal-access-token>
jdbc.username=token
jdbc.password=yyyyy
jdbc.driver=com.simba.spark.jdbc.Driver
sql.schemas=default, schema1, schema2
prefix  : <http://example.com/kg/pos/>
prefix rr: <http://www.w3.org/ns/r2rml#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

:BMapping
    a rr:TriplesMap ;
	rr:subjectMap [
		rr:template "http://example.com/kg/pos/{\"business\"}" ;
		rr:class :Business
	] ;
	rr:predicateObjectMap [
		rr:predicate rdfs:label;
		rr:objectMap [ rr:column ' \"business\" ' ]
	] ;
	rr:logicalTable [ rr:sqlQuery """SELECT distinct \"business\" FROM \"table1\" WHERE \"country_name\" = 'UNITED STATES' """ ] .
	
:SMapping
    a rr:TriplesMap ;
	rr:subjectMap [
		rr:template "http://example.com/kg/pos/{\"segment\"}" ;
		rr:class :Segment
	] ;
	rr:predicateObjectMap [
		rr:predicate rdfs:label;
		rr:objectMap [ rr:column ' \"segment\" ' ]
	] ;
	rr:logicalTable [ rr:sqlQuery """SELECT distinct \"segment\" FROM \"table2\" WHERE \"country_name\" = 'UNITED STATES' """ ] .

where table1 is from schema1 and table2 is from schema2.

Hi Muthu,

There can be only one default schema; in this case it is the one in the connection string (default). Tables in the other schemas need to be qualified (e.g., schema1.table1).
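
For example, assuming schema2 is not the default, the logical table in the second mapping could qualify the table name while keeping the same quoted-identifier style as the rest of the mapping:

```ttl
	rr:logicalTable [ rr:sqlQuery """SELECT DISTINCT \"segment\" FROM \"schema2\".\"table2\" WHERE \"country_name\" = 'UNITED STATES' """ ] .
```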

-Paul