I did a quick test to see if I could configure a JDBC connection from Trino to Stardog's virtual MySql server. FWIW, I have a little section at the end explaining why we were interested in trying this ..
In any case, it seemed like it almost worked .. The gist is that "structural" queries that operated "above" the table level seemed to work fine .. examples would be statements like show catalogs;
, show schemas from stardog_bi;
or show tables from stardog_bi.local_vkgs;
.
But calls in the Trino CLI that involved access to individual tables, such as show columns from stardog_bi.starburst_prod_db.coupa_invoices_headers;
, or select * from stardog_bi.starburst_prod_db.coupa_invoices_headers;
would give errors like:
Query 20230819_063032_00013_c2rfn failed: line 1:1: Table 'stardog_bi.starburst_prod_db.coupa_invoices_headers' does not exist
show columns from stardog_bi.starburst_prod_db.coupa_invoices_headers
With the key portion of that error being Table 'x.y.z' does not exist
. That makes me wonder what Trino is "asking" of the BI Server, and why its response is interpreted as "non-existence" of a table that clearly does exist.
Possibly related? .. When I try to access the BI server using Sequel Pro, it tends to throw mysterious errors. These queries generally recover after a while, and they return appropriate response .. so they are sort of "benign" errors. This might be completely unrelated to my Trino issues .. but since I don't have any clues, it's tempting to wonder if this other quirk might help me understand what is going on.
The failed Trino calls, and the "mysterious errors" in Sequel Pro do not leave any fingerprints in stardog.log. So I guess I'm looking for any simple ideas on how to "spy" a little closer on interaction between the virtual MySql server, and client applications.
I did find The MySQL Test Framework, and I suppose I could even bust out Wireshark and try to spy on the MySql protocol itself, but I guess I'm hoping for some wisdom that might save me a lengthy and/or fruitless chase.
Why wrap a perfectly good MySql server in Trino?
Accessing the Stardog BI server via Trino (which is a SQL federation layer) might seem like a needless complication, but the idea is that it:
- Allows us to include the BI layer as part of a larger catalog. The simple act of federating many of our most-used relational resources into large, "one-stop-shopping" catalogs makes most aspects of implementation of a FAIR data philosophy significantly easier.
- Allows us to use Privacera (which has a Trino plug-in) for creating and managing access policies. It's hard to overstate how much effort this saves, but it basically:
- Saves us from needing to grapple with the massive combinatorial space of "every kind of user" x "every sort of access control system" x "on all resources".
- Allows us to skirt the mind-bendy problem of "access control policies on a graph" and instead use a more tractable model of "access control on a relational resource" for most "ordinary" users. Since ordinary internal customers tend to be "flat-earthers", it's a nice fit .. access control for the "less fancy" customers can be managed with less effort, which frees up time to deal with any fancy needs from fancier internal customers.
- Allows us to do "ambient" joins between disparate resources. In many cases, it's possible to "find" (or compute) keys or super-keys that can join sources that were not necessarily explicitly designed to be joined. An example might be, say, a join between a Quality Event record from Veeva QMS, and, say, a fragment of Process Control history from one of our many Process History resources, using, say, "Batch ID" as a "candidate key"