We are doing a huge data conversion. The data goes into Kafka and is then read by several consumers, including one that loads the data into Stardog. Bulk loading is not an option in this scenario, since the data lives in Kafka; this is effectively a streaming job. Hence my question: do you have any advice on how to do this?
I am using SPARQL Update over HTTP and parallelizing the calls, but it is still not that fast. My second idea is to buffer records and send bigger payloads at once: since I am using INSERT DATA, instead of sending one record at a time per connection, I could send multiple records in a single INSERT DATA.
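As a rough sketch of that buffering idea, here is one way to combine several records into a single INSERT DATA request body. The triple serialization, example IRIs, and helper names are assumptions for illustration, not anything Stardog-specific:

```python
def format_triple(s, p, o):
    """Serialize one triple in N-Triples style (IRIs assumed already absolute,
    objects assumed to be plain string literals)."""
    return f"<{s}> <{p}> \"{o}\" ."

def build_insert_data(triples):
    """Combine a batch of triples into one SPARQL INSERT DATA update string."""
    body = "\n".join(format_triple(*t) for t in triples)
    return f"INSERT DATA {{\n{body}\n}}"

# Hypothetical batch accumulated from Kafka messages.
batch = [
    ("http://example.com/r1", "http://example.com/name", "alpha"),
    ("http://example.com/r2", "http://example.com/name", "beta"),
]
query = build_insert_data(batch)
```

The resulting string would then be sent as one HTTP update request instead of two, amortizing the per-request and per-transaction overhead across the batch.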
I wonder if you have any suggestions on the matter. I’d like to make this process as fast as possible.
We are also loading data from Kafka into Stardog and have run into similar issues. Batching has helped quite a bit compared to a one-to-one mapping of Kafka message to Stardog insert.
The approach we are just embarking on this week is to commit adaptively based on how far we are from the head of the Kafka topic. Something like:
partitionLag = partitions.map( partition.maxOffset - partition.currentOffset )
commitAfterMessages = partitionLag.map( lag => Math.min(lag / 10, 100000) )
The idea is that bigger batches give higher throughput, but when we are at the head of the topic we prefer small batches, or possibly no batching at all.
I see, smart. I was thinking about batching indeed, but there is clearly more to it. Thank you for the tip.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.