ETL + Change tracking support / best practices

I have two questions regarding the data synchronization process.

For now I'm able to export my data into RDF triples.
Inserting new data into the RDF store is quite easy, but how should I deal with updates to existing data?

I know it depends on business needs, but maybe you can give some hints. Can we efficiently store all historical data? Is there any support for querying, updating, and cleansing such data?

Does Stardog provide any mechanism for change tracking, versioning, data provenance, etc.? (I already know about edge properties, but that is only technical machinery.)
I wonder if it would be possible to get the state at a given point in time in the past.
Maybe there is a best practice recommended by Stardog for dealing with data updates in general.

The second part is about the ETL process (or data flow in general).
Is there a recommended scenario for synchronizing data between an "old-fashioned" standard system and an RDF store?
I've already found some blog posts about Stream Reasoning with Stardog and Stardog Data Flow Automation with NiFi.

Any hints are welcome :slight_smile:

Hi Pawel,

Welcome to the forum.

Inserting new data into the RDF store is quite easy, but how should I deal with updates to existing data?

Where is your data coming from? It's not uncommon to have created/updated fields in upstream sources, which allow incremental updates over only the records that have changed.
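For example, here is a minimal sketch of watermark-based incremental extraction, assuming a hypothetical `patient` table with an `updated_at` column (an in-memory sqlite3 database stands in for your upstream source):

```python
import sqlite3

# Hypothetical upstream table; in a real system this is your source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, "Alice", "2023-01-01T10:00:00"),
     (2, "Bob",   "2023-03-15T09:30:00"),
     (3, "Carol", "2023-03-20T14:45:00")],
)

def changed_since(conn, watermark):
    """Fetch only the rows modified after the last successful sync."""
    cur = conn.execute(
        "SELECT id, name FROM patient WHERE updated_at > ? ORDER BY id",
        (watermark,),
    )
    return cur.fetchall()

# Only the rows changed after the stored watermark need their triples
# re-generated and re-loaded into the store.
print(changed_since(conn, "2023-03-01T00:00:00"))  # → [(2, 'Bob'), (3, 'Carol')]
```

After each successful sync you would persist the new watermark (e.g. the max `updated_at` seen) and use it for the next run.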

Can we efficiently store all historical data?

Stardog can handle several TB of data on a single node. Once the single node limitation is reached, you will need to partition the data. Stardog does not currently provide any automated horizontal scalability.

Is there any support for querying, updating, and cleansing such data?

Of course, SPARQL can be used for querying and updating, including transformations and cleansing. More sophisticated transformations can be handled by purpose-built third-party cleansing tools.
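As an illustration, a cleansing pass that trims stray whitespace from literals would be a SPARQL `DELETE`/`INSERT` update against the store; here is the same idea sketched in Python over an in-memory set of triples (the data and prefixes are hypothetical):

```python
def trim_literals(triples):
    """Whitespace-trim every literal object; a stand-in for a SPARQL
    DELETE/INSERT cleansing update you would run against the store.

    The rough SPARQL equivalent:
      DELETE { ?s ?p ?old }
      INSERT { ?s ?p ?new }
      WHERE  { ?s ?p ?old .
               FILTER(isLiteral(?old))
               BIND(REPLACE(STR(?old), "^\\s+|\\s+$", "") AS ?new) }
    """
    # Plain strings represent literals in this sketch.
    return {(s, p, o.strip() if isinstance(o, str) else o)
            for s, p, o in triples}

dirty = {("ex:p1", "ex:name", "  Alice "), ("ex:p2", "ex:name", "Bob")}
print(sorted(trim_literals(dirty)))
# → [('ex:p1', 'ex:name', 'Alice'), ('ex:p2', 'ex:name', 'Bob')]
```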

Does Stardog provide any mechanism for change tracking, versioning, data provenance, etc.? (I already know about edge properties, but that is only technical machinery.)

It is technical stuff, but it can also be employed for exactly what you're asking. We don't have a specific tutorial, but feel free to ask questions and share your data modeling, and we'll be glad to discuss it.

I wonder if it would be possible to get the state at a given point in time in the past.

This is not currently possible, but it is something we've investigated implementing. There's no timeline for this feature at the moment.
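That said, point-in-time views can be approximated in the data model itself. A minimal sketch, assuming a hypothetical scheme where each sync writes a full snapshot into a named graph keyed by its timestamp (this is a modeling pattern, not a built-in Stardog feature):

```python
from bisect import bisect_right

# timestamp -> set of (s, p, o) triples; each key plays the role of a
# named graph holding one snapshot of the data.
snapshots = {}

def record_snapshot(ts, triples):
    snapshots[ts] = set(triples)

def state_at(ts):
    """Return the triples of the most recent snapshot taken at or before ts."""
    keys = sorted(snapshots)
    i = bisect_right(keys, ts)
    return snapshots[keys[i - 1]] if i else set()

record_snapshot("2023-01-01", {("ex:p1", "ex:name", "Alice")})
record_snapshot("2023-02-01", {("ex:p1", "ex:name", "Alicia")})
print(state_at("2023-01-15"))  # → {('ex:p1', 'ex:name', 'Alice')}
```

Storing full snapshots is simple but space-hungry; the same lookup works if you store deltas per timestamp and replay them instead.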

Is there a recommended scenario for synchronizing data between an "old-fashioned" standard system and an RDF store?

What type of "old-fashioned" system are you talking about here? SQL databases?

You've raised some good questions, but it's difficult to provide specific recommendations without further details. Stardog facilitates access to data without copying all of it via an ETL procedure. If ETL is necessary in your case, then something like NiFi would be useful.

Jess

Thanks for the welcome :slight_smile:

Where is your data coming from?
What type of "old-fashioned" system are you talking about here? SQL databases?

Our system stores data in a relational database and exposes it in a format similar to FHIR.
We are trying to build a knowledge graph based on that data (and maybe on data from other sources in the future).
A virtual graph doesn't seem to be an option due to the complicated RDB schema.
For now we are able to translate our "FHIR" resources into RDF (in a way similar to FHIR RDF).
The challenge now is to figure out how to store the delta for incoming changes (keep all triples with timestamps, replace old triples with new ones, move old ones to a separate graph, etc.)
and to find a good data synchronization scenario (there is a lot of frequently changing data, which should somehow be synchronized into the RDF store).
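To make the "replace old with new" option concrete, the delta we would need to compute per sync is just a set difference over the triples (a sketch with hypothetical data):

```python
def diff(old, new):
    """Compute the triples to remove and to add when moving from `old`
    to `new` -- the delta a sync job would apply to the RDF store."""
    return old - new, new - old

old = {("ex:p1", "ex:name", "Alice"), ("ex:p1", "ex:city", "Gdansk")}
new = {("ex:p1", "ex:name", "Alice"), ("ex:p1", "ex:city", "Warsaw")}
to_delete, to_insert = diff(old, new)
print(to_delete)  # → {('ex:p1', 'ex:city', 'Gdansk')}
print(to_insert)  # → {('ex:p1', 'ex:city', 'Warsaw')}
```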
We are at the beginning, so there are so many unknowns :slight_smile:

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.