Uploading freebase: memory config

pierluigi · December 1, 2017, 12:06am

I have been trying to load the Freebase data (triples; the legacy data from the "Data Dumps" page of the deprecated Freebase API on Google Developers).

I set memory.mode=bulk_upload in stardog.properties. Max heap is 16G and max direct memory is 40G, on a machine with 52G total and 8 cores. Disk is 0.5T. The OS is Linux (Ubuntu 17.04).

The data is currently loading at about 100K triples/sec, which (absolute value aside) is no faster than when I tried with memory.mode=default (or rather, with it not set). This is on new-database creation with strict.parsing=false. Is bulk_load expected to run faster? Do you have any suggestions for improving loading speed significantly?

Thank you

stephen · December 1, 2017, 1:28pm

Hi,

Try setting memory.mode=bulk_load instead of “bulk_upload.” With “bulk_upload,” since it doesn’t match one of the predefined settings, it simply stays at default.

You can confirm the setting is correct on server start. In the log you should see Memory mode: BULK_LOAD. Chances are your log may have a message about ignoring “bulk_upload” and using DEFAULT instead.
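
If you would rather check this from a script than by eye, here is a minimal Python sketch. It assumes the startup line looks like the "Memory mode: BULK_LOAD" text quoted above; the function name is made up for illustration and the exact log layout may differ between versions.

```python
import re

# Assumed pattern, based only on the "Memory mode: BULK_LOAD" text quoted above.
MEMORY_MODE_RE = re.compile(r"Memory mode:\s*(\w+)")


def memory_mode_from_log(log_text):
    """Return the last memory mode reported in the log text, or None if absent."""
    matches = MEMORY_MODE_RE.findall(log_text)
    return matches[-1] if matches else None
```

Read the server log into a string and pass it in. Anything other than BULK_LOAD means the setting was not picked up.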

pierluigi · December 1, 2017, 8:07pm

Thanks! Miraculously, I had set it correctly and only misspelled it in the message above. It definitely worked better in the end: it took 8 hours to parse the triples:

INFO 2017-12-01 06:54:05,304 [XNIO-1 task-9] com.complexible.stardog.StardogKernel:printInternal(314): Parsing triples: 100% complete in 08:03:40 (3130.8M triples - 107.9K triples/sec)

The parsing rate was steady, though (slightly increasing toward the end). Indexing and stats took much less time. What's the most direct way to improve data load performance? Is it the number of CPUs?
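
As a sanity check on the figures in the log line above, this short Python sketch recomputes the average rate from the reported triple count and elapsed time. The helper name is made up for illustration; the numbers are taken straight from the log.

```python
def average_rate(triples, elapsed_hms):
    """Average triples per second for an elapsed time given as HH:MM:SS."""
    hours, minutes, seconds = (int(part) for part in elapsed_hms.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return triples / total_seconds


# Values from the log: 3130.8M triples parsed in 08:03:40.
rate = average_rate(3130.8e6, "08:03:40")
print(f"{rate / 1000:.1f}K triples/sec")
```

The result agrees with the 107.9K triples/sec that the server reported, so the rate really was flat across the whole 8 hours.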

pavel · December 1, 2017, 8:21pm

You can try to load from multiple files (i.e. split the input).

Cheers,
Pavel
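
A rough sketch of one way to do the split, in Python. It assumes the dump is line-oriented N-Triples (one complete triple per line), so cutting on line boundaries is safe; it would not be safe for Turtle or TriG. The function names and the part-NNN.nt.gz naming are made up for illustration.

```python
import gzip
from pathlib import Path


def open_text(path, mode):
    """Open plain or gzip-compressed text, chosen by file extension."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def split_ntriples(src, out_dir, parts):
    """Distribute the lines of src round-robin over `parts` gzip files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part-{i:03d}.nt.gz" for i in range(parts)]
    outs = [open_text(p, "w") for p in paths]
    try:
        with open_text(src, "r") as lines:
            for n, line in enumerate(lines):
                # Line n goes to file n mod parts, so the files end up
                # within one line of each other in size.
                outs[n % parts].write(line)
    finally:
        for f in outs:
            f.close()
    return paths
```

All the resulting files would then be given together when creating the database. A part count near the number of cores (8 here) seems like a reasonable starting point, but that is a guess to be tested rather than a documented recommendation.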

