Settings for bulk load


(jeong) #1

Hello.

I am trying to load 5 billion triples into Stardog.

Here is what I have done so far to speed up the load:

  1. The input file was very large (200GB), so I split it into 300 files.
  2. Set memory.mode = bulk_load in the properties file.
  3. export STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=16g"
  4. Turned named-graph indexing OFF in the database settings.

But the load keeps getting slower and slower.

stardog.log

INFO  2018-08-09 15:36:31,874 [Stardog.Executor-64] com.complexible.stardog.index.Index:printInternal(314): Indexing triples: 98% complete in 00:55:28 (2.0K triples/sec)
INFO  2018-08-09 15:39:47,770 [Stardog.Executor-64] com.complexible.stardog.index.Index:printInternal(314): Indexing triples: 99% complete in 00:58:44 (1.9K triples/sec)
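
For a rough sense of scale, here is a back-of-the-envelope estimate. It assumes the roughly 2.0K triples/sec shown in the log would hold for the entire 5 billion triples, which is an assumption and not something I measured. The function name is only for illustration.

```python
# Rough estimate only: assumes the logged indexing rate stays constant
# for the whole load, which may not be true in practice.
TOTAL_TRIPLES = 5_000_000_000  # from the post: 5 billion triples
RATE_PER_SEC = 2_000           # from stardog.log: "2.0K triples/sec"
SECONDS_PER_DAY = 86_400


def estimated_load_days(total_triples: int, rate_per_sec: float) -> float:
    """Return the number of days needed to load total_triples at a fixed rate."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    return total_triples / rate_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    days = estimated_load_days(TOTAL_TRIPLES, RATE_PER_SEC)
    print(f"Estimated load time: {days:.1f} days")
```

At that rate the full load would take about 29 days, which is why the current speed is not workable.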

This is my server status

Backup Storage Directory : .backup
CPU Load                 : 0.46 %
Connection Timeout       : 10m
Export Storage Directory : .exports
Memory Direct            : 8.4G (Max:  16G)
Memory Direct Buffers    : 8.0G (Max:  15G)
Memory Direct Mapped     : 416M (Max: 156M)
Memory Heap              : 6.1G (Max: 7.8G)
Memory Mode              : BULK_LOAD
Platform Arch            : amd64
Platform OS              : Linux 4.15.0-29-generic, Java 1.8.0_112
Query All Graphs         : false
Query Timeout            : 5m
Security Disabled        : false
Stardog Home             : /hadoop/hdfs/data1/stardog/stardog
Stardog Version          : 5.3.2
Strict Parsing           : false
Uptime                   : 1 day 4 hours 55 minutes 49 seconds
Databases                :
+-----------+---------------+-------------+------------------------------------------------------+--------------------------------------------+
|           |               |             |                     Transactions                     |                  Queries                   |
+-----------+---------------+-------------+-------+-------+-----------+---------------+----------+---------+-------+---------------+----------+
| Database  |     Size      | Connections | Open  | Total | Avg. Size | Avg. Time (s) | Rate/sec | Running | Total | Avg. Time (s) | Rate/sec |
+-----------+---------------+-------------+-------+-------+-----------+---------------+----------+---------+-------+---------------+----------+
| kngi      |   406,256,240 |           0 |     0 |     0 |         0 |         0.000 |    0.000 |       0 |     0 |         0.000 |    0.000 |
| kngi_v1   |   396,260,611 |           0 |     0 |     0 |         0 |         0.000 |    0.000 |       0 |    80 |       398.430 |    0.001 |
| mirae     |   396,260,611 |           0 |     0 |     0 |         0 |         0.000 |    0.000 |       0 |     0 |         0.000 |    0.000 |
| test      |            29 |           0 |     0 |     0 |         0 |         0.000 |    0.000 |       0 |     0 |         0.000 |    0.000 |
| wikidata  | 1,155,150,328 |           0 |     0 |     3 |      6.7M |     6,320.092 |    0.000 |       0 |    15 |       135.261 |    0.000 |
+-----------+---------------+-------------+-------+-------+-----------+---------------+----------+---------+-------+---------------+----------+

Are there any tips for improving the load speed?

It is too slow to be usable right now.


(stephen) #2

Hi there,

According to our documentation, loading compressed data files is the preferred way to bulk load, rather than loading lots of small files. The more memory you can dedicate to Stardog, the faster your load will be, but I would first try compressing the 200GB file and loading that directly.
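
If it helps, here is a minimal sketch of compressing a large file in a streaming fashion, so the whole file is never held in memory. It assumes gzip is an acceptable format for your load (please check the documentation for the formats your version supports); the function name and the file names in the usage note are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path, dst: Path, chunk_size: int = 16 * 1024 * 1024) -> Path:
    """Stream-compress src into dst, copying chunk_size bytes at a time."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=6) as fout:
        # copyfileobj reads and writes in chunks, so memory use stays bounded
        # regardless of how large the source file is.
        shutil.copyfileobj(fin, fout, length=chunk_size)
    return dst
```

For example, `gzip_file(Path("data.nt"), Path("data.nt.gz"))` would write a compressed copy next to the original. Note that the compressed copy needs its own disk space until the original is removed.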


(jeong) #3

Hello, stephen.

I tried loading compressed data files before, but it failed. I think the log showed a "No space left" error.
The space on the STARDOG_HOME path is about 2.5TB.

Do I need more available space?

Thank you.


(stephen) #4

That sounds like enough space, unless STARDOG_HOME is on a different volume than the one on which you’re running the Stardog server. Could you get the exact error message from the log?


(zachary.whitley) #5

Could it be running out of space on the tmp directory? I’m not sure if bulk load uses the tmp directory; the documentation only says that temp is used “…for many different operations”.

When you say “upload”, do you mean that your 200GB file is on the same server where you’re running Stardog and you’re just trying to load it, or do you mean you’re trying to load it from a remote client with --copy-server-side?
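
One way to check both possibilities raised above (a different volume, or a full tmp directory) is to print the free space and device ID for each path. This is only a sketch: it assumes STARDOG_HOME is set in the environment of the shell you run it from, and it uses Python's idea of the temp directory, which may not match the JVM's java.io.tmpdir on your server. The function name is hypothetical.

```python
import os
import shutil
import tempfile


def free_space_report(paths):
    """Return {path: (free_GiB, device_id)}; every path must already exist."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)   # total, used, free in bytes
        device = os.stat(path).st_dev     # same value means same filesystem
        report[path] = (usage.free / 2**30, device)
    return report


if __name__ == "__main__":
    # Assumption: falls back to the current directory if STARDOG_HOME is unset.
    stardog_home = os.environ.get("STARDOG_HOME", ".")
    tmp_dir = tempfile.gettempdir()
    for path, (free_gib, device) in free_space_report([stardog_home, tmp_dir]).items():
        print(f"{path}: {free_gib:.1f} GiB free (device {device})")
```

If the two paths report different device IDs, they are on different filesystems, and the smaller one may be the one filling up.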

