Is there any "how to" info for VKGs with hybridized SQL/JSON content?

Most of the major relational engines seem to have at least some support for fields with JSON data types.

Is it possible (or practical) to use RDM Integrated Mapping as a tool to handle such fields?

On a related note, is SMS2 capable of this sort of "mode-switching" for mapping .. i.e. ingesting a SQL record, and dispatching any JSON fields within that record for further mapping?

It looks like SMS2 allows multiple MAPPING blocks .. I guess this question about SMS2 boils down to whether a field value obtained from one FROM clause can be somehow passed to another FROM clause .. in this specific case, somehow passing JSON content associated with a FROM SQL clause to a suitable FROM JSON clause.

Actually, after a little thinking on this, the question may contain the seeds of its own answer.

For R2RML, I think it boils down to whether the SQL is passed through to the target system "intact" .. i.e. whether the query rewrite layer is able to accommodate the inclusion of "custom" constructs, like JSON functions and operators, without mangling them.

I'll experiment when I can, and will post any results.

I was able to experiment a little with including JSONB in R2RML triples map queries .. and it worked .. which isn't surprising.

Some really simple examples below, in case they are of interest.

The good news .. it works.
The less-good news .. it's inherently difficult to wrangle deep tree-structured objects this way. In the end, the query needs to produce records that are essentially rows with uniformly typed columns.

In any case, the first simple example was a triple template for a type called "MeasureParameter", and it looked like this:

:Item{id} a :MeasureParameter ; :units {units} ; :value {value} ; :paramType {paramType} .

And the corresponding SQL query looked like this:

select id, units, value, paramType
from (
        select id,
            cast (content ->> 'units' as varchar) as units,
            cast (content ->> 'value' as float) as value,
            cast (content ->> 'paramType' as varchar) as paramType
        FROM public."LibraryItem"
        where type = 'Parameter'
    ) as MeasureParameters
where units is not null

So, basically, fields named units, value, and paramType were hauled out of a JSONB block and converted into column values.

Interestingly, the query would not work in postgres w/o the little where units is not null clause .. which was ok, because it acts as a type guard that distinguishes 'MeasureParameter' records from other kinds of parameter records.
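To illustrate the type-guard behavior concretely (hypothetical data; not from the actual table): the ->> operator returns NULL when a key is absent, so rows for other record kinds simply drop out:

```sql
-- Hypothetical rows standing in for "LibraryItem".content values:
-- ->> yields NULL for the row without a 'units' key, so the guard removes it.
select cast (content ->> 'units' as varchar) as units
from (values
    ('{"units": "kg", "value": 1.5}'::jsonb),   -- a MeasureParameter-like record
    ('{"enumValues": ["a", "b"]}'::jsonb)       -- some other kind of parameter
) as t(content)
where content ->> 'units' is not null;
```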

I tested this using the following simple SPARQL query for 'MeasureParameter' records, which worked as expected.

SELECT ?s ?units ?value ?paramType WHERE {
  ?s rdf:type :MeasureParameter;
    :units ?units;
    :value ?value;
    :paramType ?paramType.
}

One quirk was that I had to enable Ontop's datatype inference capability: ontop.inferDefaultDatatype=true .. so using non-standard ways to generate SQL rows (in this case, JSONB operations) does seem to raise some risk of introducing minor quirks.
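For anyone else hitting this, the setting goes in the properties file handed to Ontop at startup (e.g. via the CLI's -p/--properties option; the filename here is just an example):

```
# ontop.properties
# Let Ontop infer a default datatype for columns whose SQL type it
# cannot determine -- e.g. values produced by JSONB operators
ontop.inferDefaultDatatype = true
```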

FWIW, here is an example of using the postgres jsonb_array_elements function to "expand" each record in a table by creating a row for each item in a contained JSONB array. In this example, the "LibraryItem" table has a JSONB column named content, and that content's value, in many cases, includes an items field. The items field is an array of structures, and those structures have name, type, and instanceId fields (and several other fields besides .. it's a deep tree structure).

In any case, the following query:

SELECT
    id,
    (items ->> 'name')::text AS name,
    (items ->> 'type')::text AS type,
    (items ->> 'instanceId')::text AS instanceId
FROM public."LibraryItem",
    jsonb_array_elements(content->'items') as items

Produces a result like this:

id name type instanceid
0284f6ae-9a51-40aa-bd6d-9551a56d4d70 Load Element 7637713a-995f-4c1c-b30b-c19e476de8d9
0284f6ae-9a51-40aa-bd6d-9551a56d4d70 Wash Element 0d2073e8-2f22-4887-9e13-b90697c6091e
0284f6ae-9a51-40aa-bd6d-9551a56d4d70 Elute Element 91acc736-52b7-4014-9170-b6ebbbbea6fe
8e3ec7de-1e99-4aef-9a85-2bb18d3be211 Temperature Parameter b0558eb5-0171-4482-b4c0-6d01df84846a
8e3ec7de-1e99-4aef-9a85-2bb18d3be211 SpecificFuelConsumption Parameter f9c63e0d-3f8c-492d-b245-8f0d62ac1e1a
8e3ec7de-1e99-4aef-9a85-2bb18d3be211 RotationalSpeed Parameter e1dabe66-1446-441d-906f-16cf24201245
8e3ec7de-1e99-4aef-9a85-2bb18d3be211 RotationalSpeed Parameter 8fef1191-9f5b-4eac-9c7e-577c5b8e19dc
f5b66049-7519-418d-97f3-e8eba599da02 AddLiquid Element 3f58b9dc-628f-4732-be6b-dab8166ff824
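For completeness, a triple template over these expanded rows might look something like the following (the :containsItem, :itemName, and :itemType property names are hypothetical; same template style as the earlier MeasureParameter example):

```
:Item{id} :containsItem :Item{instanceid} .
:Item{instanceid} :itemName {name} ; :itemType {type} .
```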

Just tried this experiment in a Stardog VKG, and it seems that SD does not delegate the SQL directly to the SQL server .. which may be part of the reason that SD outperforms Ontop pretty handily.

I tried rewriting this with the JSON functions described in the SQL:2016 standard, and could not get it to work, though the picture was a little murkier.

This snippet of SMS:

MAPPING <urn:test_json>
FROM SQL {
  SELECT JSON_VALUE(
            '{"some_field": "this is some field", "some_other_field": "this is some other field"}',
            'lax $.some_field'
            RETURNING varchar
            ) AS some_field
}
TO {
  ?subject <> ?some_field.
  ?subject rdf:type :json_example .
  BIND(template("{?some_field}") AS ?subject)
}

Produced the following error:

Failed to update Virtual Graph "stellarforge_local_postgres_vkg": 000IA2: Template must reference one or more variables:{?some_field} (Bad Request)

The murkiness is that the following, much simpler mapping yields the same error .. so it may well be that whatever SD is using to wrangle SQL fully supports JSON functions .. but simply doesn't like SELECT statements that just return one row ..

MAPPING <urn:test_json>
FROM SQL {
  SELECT '{"some_field": "this is some field", "some_other_field": "this is some other field"}' AS some_field
}
TO {
  ?subject <> ?some_field.
  ?subject rdf:type :json_example .
  BIND(template("{?some_field}") AS ?subject)
}

Some further experimentation suggests that the SQL/JSON functions might work, as long as the target system supports the SQL/JSON standard.
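For reference, the earlier Postgres-specific ->> extractions would look like this in standard SQL:2016 JSON syntax (a sketch only .. the Postgres server used here evidently did not accept this syntax):

```sql
-- SQL:2016 standard equivalent of cast(content ->> '...' as ...),
-- using JSON_VALUE with RETURNING; sketch, not tested on this server
SELECT JSON_VALUE(content, 'lax $.units' RETURNING varchar) AS units,
       JSON_VALUE(content, 'lax $.value' RETURNING float) AS value,
       JSON_VALUE(content, 'lax $.paramType' RETURNING varchar) AS paramType
FROM public."LibraryItem"
WHERE type = 'Parameter'
```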

I created the following MAPPING "block" in SMS, and the system accepted it:

MAPPING <urn:try_json>
FROM SQL {
  SELECT id,
    JSON_QUERY(
      CAST(content AS VARCHAR),
      'lax $'
      ) as paramType
    FROM public."LibraryItem"
    where type = 'Parameter'
}

TO {
  ?subject <> ?paramType .
  ?subject rdf:type :LibraryParameter .
  BIND(template("{id}") AS ?subject)
}

And then ran the following SPARQL:

PREFIX sd_api: <>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX so: <>
PREFIX stardog: <tag:stardog:api:>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

Select ?s ?paramType WHERE {
    ?s a :LibraryParameter; <> ?paramType.
}

Which returned the error:

Failed to run query: com.complexible.stardog.plan.eval.operator.OperatorException: Unable to execute virtual graph query. SQL string: SELECT "id", JSON_QUERY(CAST("content" AS VARCHAR), 'lax $.paramType' WITHOUT ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) AS "paramtype"
FROM "stellarforge"."public"."LibraryItem"

The elaborate SQL suggests that the system actually understands the SQL/JSON syntax, and was able to generate an appropriate query. And I suspect that the query would have worked if the target system understood it. However, the target system is Postgres .. so it uses a non-standard syntax for JSON queries.
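For comparison, the same extraction expressed in Postgres's native syntax would be something like this (a sketch, assuming content is a jsonb column, so the VARCHAR cast can be dropped):

```sql
-- Native-Postgres equivalent of the generated JSON_QUERY call
-- (jsonb_path_query_first is available in Postgres 12+)
SELECT id,
       jsonb_path_query_first(content, 'lax $.paramType') AS paramtype
FROM public."LibraryItem";
```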

If the target system had been, say, a MySQL server, I think the query would have worked.

In any case, relational data sources with JSON content seem to be increasingly common .. so I think that this might be a feature that a lot of current and potential customers might appreciate .. and it seems like Stardog might already have many of the pieces in place for a fairly general implementation.

From the cheap seats, something that can cascade, say a FROM SQL clause to a FROM JSON clause seems like it might offer a technique that is relatively insensitive to JSON implementations in various SQL servers.

An alternative (not necessarily exclusive to the "cascaded FROM clause" approach) may be to have an option that allows SQL statements to be "escaped" such that they are passed to the server unaltered.
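To make the cascading idea concrete, here's a sketch in purely invented syntax (nothing like this exists in SMS2 today; the second FROM stage, and everything in it, is hypothetical):

```
# Purely hypothetical syntax -- not valid SMS2
MAPPING <urn:cascade_sketch>
FROM SQL {
  SELECT id, content FROM public."LibraryItem"
}
# hypothetical second stage: apply a JSON mapping to each row's content field
FROM JSON ON content {
  { "units": "?units", "value": "?value" }
}
TO {
  ?subject :units ?units ; :value ?value .
  BIND(template("http://example.com/item/{id}") AS ?subject)
}
```

The appeal of this shape is that the JSON stage would be handled by the VKG layer itself, rather than being rewritten into whatever JSON dialect the backing SQL server happens to speak.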
