We are doing a huge data conversion. The data goes into Kafka and is then read by several consumers, including one that loads the data into Stardog. In this scenario bulk loading is not an option, since the data arrives via Kafka; it is essentially a streaming job. Hence my question: do you have any advice on how best to do this?
I am using SPARQL Update over HTTP and parallelizing the calls, but it is still not that fast. My second idea is to buffer records and send bigger payloads at once: since I am using INSERT DATA, instead of sending one record at a time per connection, I could send multiple records in a single INSERT DATA request.
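For example, something along these lines (just a rough sketch using the Python `requests` library; the endpoint, database name, credentials and triples are placeholders, and I would still parallelize these calls across batches):

```python
import requests

STARDOG_UPDATE = "http://localhost:5820/mydb/update"  # placeholder Stardog SPARQL update endpoint
AUTH = ("admin", "admin")                             # placeholder credentials

def insert_batch(triples):
    """POST a whole batch of triples as one INSERT DATA update."""
    body = "INSERT DATA {\n" + "\n".join(triples) + "\n}"
    resp = requests.post(
        STARDOG_UPDATE,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        auth=AUTH,
    )
    resp.raise_for_status()

# e.g. each Kafka record has already been converted to one or more N-Triples lines
batch = [
    '<urn:rec:1> <urn:p:name> "alice" .',
    '<urn:rec:2> <urn:p:name> "bob" .',
]
insert_batch(batch)
```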
I wonder if you have any suggestions on the matter. I'd like to make this process as fast as possible.
We are also loading data from Kafka into Stardog and have run into similar issues. Batching has helped quite a bit compared to a 1-to-1 mapping of Kafka message to Stardog insert.
The approach we are just embarking on this week is to commit adaptively based on how far we are from the head of the Kafka topic. So something like:
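(Very roughly, and only as an illustration of the idea: this sketch assumes a kafka-python consumer with manual commits, the lag thresholds and batch sizes are made up, and `insert_batch_into_stardog` stands in for whatever does the actual INSERT DATA.)

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "my-topic",                      # placeholder topic name
    bootstrap_servers="localhost:9092",
    group_id="stardog-loader",
    enable_auto_commit=False,        # we commit ourselves, once per batch
)

def current_lag(consumer):
    """How far behind the head of the topic we are, summed over assigned partitions."""
    parts = consumer.assignment()
    end = consumer.end_offsets(parts)
    return sum(end[tp] - consumer.position(tp) for tp in parts)

def target_batch_size(lag):
    """Big batches while catching up, small batches (lower latency) near the head."""
    if lag > 100_000:
        return 10_000
    if lag > 1_000:
        return 1_000
    return 100

batch = []
for record in consumer:
    batch.append(record)
    if len(batch) >= target_batch_size(current_lag(consumer)):
        insert_batch_into_stardog(batch)   # hypothetical helper: one big INSERT DATA per batch
        consumer.commit()                  # commit offsets only after the insert succeeds
        batch = []
```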