Value Translation capabilities as part of SMS

It is often the case that data sources may not contain all the common identifiers needed to create all the URIs needed to link data together. In many cases the solution is easy, such as adding an extra column to a database table with the identifiers needed (or an entire URI).

However, this is not always possible. It is not always possible to augment the data sources to use or include additional common identifiers. In these cases, creating connected data across sources can be challenging.

The D2RQ virtualization tool had a concept called a Translation Table in which an additional layer is added between the data source and mapping where data source values are translated to another value (and vice versa). This would enable, for example, a mapping between identifiers in one system to globally used URIs to create connections across data sources.

I don't see any capability for that in SMS mappings, and I know R2RML doesn't have that capability either. However, I run into this problem often and would find it useful if Stardog supported a similar feature.

Hi Matt,

Thank you for submitting your suggestion. I created a ticket for this in our tracking system (PLAT-5709) for our product team to review.

In the meantime a couple things you can consider as a potential work around are:

  • CASE statement in FROM SQL mappings clause - see example here.

  • Saving your translation table in a Stardog graph, then incorporating this graph into your virtual graph queries. Something like :Mapping1 :hasInput "CLERK"; :hasOutput "general-office".

The latter wouldn't work if your goal for this transformation is to create an IRI that matches the IRI in another VG. If this is the goal then maybe instead of mapping these literals you create a VG mapping that creates IRIs with a different namespace and could create a Stardog graph that maps the IRIs to one another. If such an idea is workable (finite/known set of IRIs), this may perform best because it keeps the SQL mapping simple. See Optimization | Stardog Documentation Latest.

I understand that none of these achieve exactly what you're asking for. It's just some food for thought.

-Paul

CASE statements do the trick functionally, but if there are hundreds of values that need to be mapped in, that can get unwieldy.

Furthermore, how well does the CASE statement work from a performance perspective? If you provide a value in a query triple pattern that should match the output of a CASE statement, is there any optimization done on the Stardog side to improve performance where possible, or is a column scan and CASE statement always required?

On a related note, SMS/SMS2 does not support R2RML's Inverse Expressions capability, correct? That would partially offset the above issue.

I agree, Matt. CASE statements are viable only with small translation tables, and chances are that a CASE statement would lead to a table scan if the SPARQL query was searching on the output of the CASE. I suppose if the CASE was a 1:1 bijection, then a sophisticated RDBMS might reverse the function and convert the table scan to an index lookup, but we'd have to do testing to confirm.

We also do not support R2RML's inverse expressions, but that could be another potential solution.