How to prevent table and column names from being converted to uppercase in the mapping file when creating a Virtual Graph?

Hi Everyone,

I'm new to Stardog. While trying virtual graph mappings against tables in Databricks (via the JDBC driver), I'm hitting errors because column and table names are being converted to uppercase by default. For the table name I found a workaround using quotation marks with escape characters, so I arrived at the simple mapping file below:

PREFIX : <http://stardog.com/tutorial/>
PREFIX sm: <tag:stardog:api:mapping:>

:Album{id} a :Album ;
:name "{name}" ;
:date "{release_date}" ;
:artist :Artist{artist} ;
sm:map [ sm:table ' \"album\" ' ] .

Then the next problem started: the same uppercase conversion happens for column names as well. I'm writing them exactly as they appear in the Databricks table in my mapping file, but it throws a column-not-found error.

To confirm it really is the caps conversion, I created a table with all column names in caps and tried the mapping file against it; it worked perfectly, but that's not a feasible solution. I also tried various combinations of quotation marks, with and without escape characters, and could not find one that works.

I would really appreciate it if anybody could help me out here.

Hi Muthu,

Welcome to the community and thank you for the issue report.

I've heard of this problem but have been unable to reproduce it on my end. Could you give me any more details? How are you creating the table on the Databricks side?

I do:

CREATE TABLE `Sports` USING csv OPTIONS (path "/FileStore/student_sport.csv", header "true");

show tables;

... and in the Databricks client it shows the new table as all lower-case, and it makes no difference how I quote or case the table names in my queries; it always just works.

What cloud platform are you using? I'm currently testing on Azure. What version of Stardog and what version of the JDBC driver? Can you share your data source properties file (masking anything sensitive)?

Thanks,
-Paul

Thanks for your reply Paul.

We are using AWS cloud environment.
JDBC driver version - 2.6.21
Stardog version - 7.8.2

Just a quick pointer: the actual problem is with the column names. The table name, as you say, always gets created as lowercase in Databricks, and I'm able to stop it being converted to CAPS by using quotation marks with escape characters. The column names are the main issue here.

Below is the query I used to create the table with all column names in CAPS. Since all the column names in this table are in caps, the mapping file works fine, but unfortunately we can't expect every existing table's column names to be in CAPS. If I run the same mapping against a table with lowercase column names, it fails with a column-not-found error: whatever column I mention in lowercase in my mapping file gets looked up in uppercase in Databricks. Where and why is it looked up in uppercase when I write it in lowercase in the mapping file?

CREATE TABLE db.ALBUM ( ID INT, NAME VARCHAR(30), RELEASE_DATE DATE, ARTIST INT );

Below is the properties file

jdbc.url=jdbc:spark://<databricks connection string>
jdbc.username=<xxxx>
jdbc.password=<yyyy>
jdbc.driver=com.simba.spark.jdbc.Driver
parser.sql.quoting=ANSI   <== tried both with and without this property

Hi Muthu,

I set up an AWS-hosted Databricks workspace and was able to reproduce your problem with versions 7.8.2 and 7.8.3. The problem is that our connector requires exact casing, but it should not when connecting to Databricks. This will be fixed in version 7.9.
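For the curious, the fix conceptually boils down to falling back to a case-insensitive identifier match when the source treats identifiers case-insensitively, as Databricks does. Here's a minimal sketch in Python, illustrative only and not Stardog's actual connector code:

```python
# Illustrative sketch of case-insensitive column resolution -- NOT Stardog's
# actual connector code. Databricks treats identifiers case-insensitively,
# so an exact-match lookup should fall back to a case-insensitive one.
def resolve_column(requested: str, actual_columns: list[str]) -> str:
    # Exact match first, preserving behavior for case-sensitive sources.
    if requested in actual_columns:
        return requested
    # Case-insensitive fallback for case-insensitive dialects.
    matches = [c for c in actual_columns if c.lower() == requested.lower()]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(
        f"field [{requested}] not found; input fields are: {actual_columns}"
    )
```

With this fallback, a mapping that references `SpOrT` would still resolve against a table whose actual column is `Sport`, which is why exact-casing lookups are the wrong behavior for this source.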

I was able to create working mappings with 7.8.2 by quoting the fields. Based on my tests, I think your problem must have to do with the quoting.

I'll share all my settings and test results.

Here's the Databricks SQL, run from the web console:

CREATE TABLE `Sports`(`Student` STRING, `Sport` STRING);
INSERT INTO `Sports` VALUES ('Venus', 'Tennis');
INSERT INTO `Sports` VALUES ('SMITH', 'Hackeysack');
ANALYZE TABLE `Sports` COMPUTE STATISTICS;

show tables; -- sports
DESC EXTENDED sports; -- Student string, Sport string

-- All these return with Student and Sport column names
select * from sports;
select * from Sports;
select * from `Sports`;
select * from `SpOrTs`;

select Student   from sports; -- returns Student
select `Student` from sports; -- returns Student
select student   from sports; -- returns student
select STUDENT   from sports; -- returns STUDENT
select `student` from sports; -- returns student
select `STUDENT` from sports; -- returns STUDENT

Here's my properties file:

jdbc.url=jdbc:spark://dbc-cedc1234-0abcd.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/a12b34c56d78e90;
jdbc.username=token
jdbc.password=dapi99999999999999999999999999
jdbc.driver=com.simba.spark.jdbc.Driver
parser.sql.quoting=ANSI
testOnBorrow=true
validationQuery=SELECT 1

These mappings would produce an error:

$ cat sports.sms2 
PREFIX ex: <http://example.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

MAPPING
FROM SQL {
  SELECT "SpOrT"
  FROM "default"."SpOrTs"
}
TO {
  ?subject rdf:type ex:Sport .
  ?subject rdfs:label ?SpOrT
} WHERE {
  BIND(template("http://example.com/{SpOrT}") AS ?subject)
}

;

MAPPING
FROM SQL {
  SELECT *
  FROM "default"."SpOrTs"
}
TO {
  ?subject ex:plays ?plays .
  ?subject rdf:type ex:Student .
  ?subject foaf:name ?Student
} WHERE {
  BIND(template("http://example.com/{Sport}") AS ?plays)
  BIND(template("http://example.com/{Student}") AS ?subject)
}

Error:

$ sa virtual add -v -o -f sms2 databricks_aws.properties sports.sms2 
From line 10, column 8 to line 10, column 25: Object 'SpOrTs' not found within 'default'; did you mean 'sports'?
The detailed stack trace for the error is:
com.complexible.stardog.protocols.http.client.BaseHttpClient$HttpClientException: com.complexible.stardog.virtual.vega.UncheckedSqlParseException: org.apache.calcite.runtime.CalciteContextException: From line 10, column 8 to line 10, column 25: Object 'SpOrTs' not found within 'default'; did you mean 'sports'?

These mappings worked:

$ cat sports.sms2 
PREFIX ex: <http://example.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

MAPPING
FROM SQL {
  SELECT "Sport"
  FROM "default"."sports"
}
TO {
  ?subject rdf:type ex:Sport .
  ?subject rdfs:label ?Sport
} WHERE {
  BIND(template("http://example.com/{Sport}") AS ?subject)
}

;

MAPPING
FROM SQL {
  SELECT *
  FROM "default"."sports"
}
TO {
  ?subject ex:plays ?plays .
  ?subject rdf:type ex:Student .
  ?subject foaf:name ?Student
} WHERE {
  BIND(template("http://example.com/{Sport}") AS ?plays)
  BIND(template("http://example.com/{Student}") AS ?subject)
}

I create the VG with this CLI:

$ stardog-admin virtual add -v -o -f sms2 databricks_aws.properties sports.sms2
Successfully added virtual graph databricks_aws

$ stardog query foo 'select * from <virtual://databricks_aws> {?s?p?o}'
+-------------------------------+--------------------------------+-------------------------------+
|               s               |               p                |               o               |
+-------------------------------+--------------------------------+-------------------------------+
| http://example.com/Hackeysack | rdf:type                       | http://example.com/Sport      |
| http://example.com/Hackeysack | rdfs:label                     | "Hackeysack"                  |
| http://example.com/Tennis     | rdf:type                       | http://example.com/Sport      |
| http://example.com/Tennis     | rdfs:label                     | "Tennis"                      |
| http://example.com/SMITH      | http://example.com/plays       | http://example.com/Hackeysack |
| http://example.com/SMITH      | rdf:type                       | http://example.com/Student    |
| http://example.com/SMITH      | http://xmlns.com/foaf/0.1/name | "SMITH"                       |
| http://example.com/Venus      | http://example.com/plays       | http://example.com/Tennis     |
| http://example.com/Venus      | rdf:type                       | http://example.com/Student    |
| http://example.com/Venus      | http://xmlns.com/foaf/0.1/name | "Venus"                       |
+-------------------------------+--------------------------------+-------------------------------+

Query returned 10 results in 00:00:05.087

Here are mappings using the older Stardog mapping format. These produced an error:

$ cat sports.ttl
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sms: <tag:stardog:api:mapping:> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
ex:{"Sport"}
   rdf:type ex:Sport ;
   <http://www.w3.org/2000/01/rdf-schema#label> "{\"SpOrT\"}"^^xsd:string ;
   sms:map [
      sms:query """SELECT \"Sport\"
FROM \"default\".\"sports\"""" ;
   ] .

ex:{"Student"}
   foaf:name "{\"Student\"}"^^xsd:string ;
   rdf:type ex:Student ;
   ex:plays ex:{"SpOrT"} ;
   sms:map [
      sms:table "\"default\".\"sports\"" ;
   ] .

Error:

Invalid mapping for triples map with subject=TEMPLATE(http://example.com/{"Sport"})
	resource=_:bnode_86e58971_174e_45e2_8e63_a351deef87a0_152
	table=SELECT "Sport" FROM "default"."sports"
	Error=field [SpOrT] not found; input fields are: [Sport]
	term map=COLUMN("SpOrT")

These worked:

$ cat sports.ttl
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sms: <tag:stardog:api:mapping:> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
ex:{"Student"}
   ex:plays ex:{"Sport"} ;
   foaf:name "{\"Student\"}"^^xsd:string ;
   rdf:type ex:Student ;
   sms:map [
      sms:table "\"default\".\"sports\"" ;
   ] .

ex:{"Sport"}
   <http://www.w3.org/2000/01/rdf-schema#label> "{\"Sport\"}"^^xsd:string ;
   rdf:type ex:Sport ;
   sms:map [
      sms:query """SELECT \"Sport\"
FROM \"default\".\"sports\"""" ;
   ] .
$ stardog-admin virtual add -v -o -f stardog databricks_aws.properties sports.ttl 
Successfully added virtual graph databricks_aws

$ stardog query foo 'select * from <virtual://databricks_aws> {?s?p?o}'
+-------------------------------+--------------------------------+-------------------------------+
|               s               |               p                |               o               |
+-------------------------------+--------------------------------+-------------------------------+
| http://example.com/Hackeysack | rdf:type                       | http://example.com/Sport      |
| http://example.com/Hackeysack | rdfs:label                     | "Hackeysack"                  |
| http://example.com/Tennis     | rdf:type                       | http://example.com/Sport      |
| http://example.com/Tennis     | rdfs:label                     | "Tennis"                      |
| http://example.com/SMITH      | http://example.com/plays       | http://example.com/Hackeysack |
| http://example.com/SMITH      | rdf:type                       | http://example.com/Student    |
| http://example.com/SMITH      | http://xmlns.com/foaf/0.1/name | "SMITH"                       |
| http://example.com/Venus      | http://example.com/plays       | http://example.com/Tennis     |
| http://example.com/Venus      | rdf:type                       | http://example.com/Student    |
| http://example.com/Venus      | http://xmlns.com/foaf/0.1/name | "Venus"                       |
+-------------------------------+--------------------------------+-------------------------------+

Query returned 10 results in 00:00:02.860

-Paul


Thanks a lot Paul, I really appreciate your detailed analysis and explanation. For now this workaround is working like a gem. Thank you!

