How to disable statistics in stardog 7.0.1

Hi,

I got a problem when bulk loading 1 ttl file (2.1G) to Stardog, when I checked log, it stucks at computing statistics phase

INFO  2019-10-02 06:18:21,752 [stardog-user-1] com.complexible.stardog.index.Index:printInternal(314): Indexing triples: 100% complete in 00:01:51 (907.4K triples/sec)
INFO  2019-10-02 06:18:21,753 [stardog-user-1] com.complexible.stardog.index.Index:stop(326):
INFO  2019-10-02 06:18:21,753 [stardog-user-1] com.complexible.stardog.index.Index:stop(329): Indexing triples finished in 00:01:51.182
INFO  2019-10-02 06:18:30,406 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 3% complete in 00:00:08 (355.5K triples/sec)
INFO  2019-10-02 06:18:34,395 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 4% complete in 00:00:12 (322.8K triples/sec)
INFO  2019-10-02 06:18:36,818 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 5% complete in 00:00:14 (338.0K triples/sec)
INFO  2019-10-02 06:18:39,070 [Stardog.Executor-116] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 6% complete in 00:00:17 (352.4K triples/sec)
INFO  2019-10-02 06:18:41,232 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 7% complete in 00:00:19 (365.2K triples/sec)
INFO  2019-10-02 06:18:43,110 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 8% complete in 00:00:21 (380.4K triples/sec)
INFO  2019-10-02 06:18:44,972 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 9% complete in 00:00:23 (393.4K triples/sec)
INFO  2019-10-02 06:18:48,002 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 10% complete in 00:00:26 (386.4K triples/sec)
INFO  2019-10-02 06:18:49,730 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 11% complete in 00:00:27 (398.6K triples/sec)
INFO  2019-10-02 06:18:51,286 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 12% complete in 00:00:29 (411.9K triples/sec)
INFO  2019-10-02 06:18:52,682 [Stardog.Executor-120] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 13% complete in 00:00:30 (426.0K triples/sec)
INFO  2019-10-02 06:18:54,064 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 14% complete in 00:00:32 (439.0K triples/sec)
INFO  2019-10-02 06:18:55,371 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 15% complete in 00:00:33 (452.0K triples/sec)
INFO  2019-10-02 06:18:56,741 [Stardog.Executor-113] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 16% complete in 00:00:34 (463.2K triples/sec)
INFO  2019-10-02 06:18:58,093 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 17% complete in 00:00:36 (473.8K triples/sec)
INFO  2019-10-02 06:18:59,416 [Stardog.Executor-107] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 18% complete in 00:00:37 (484.0K triples/sec)
INFO  2019-10-02 06:19:00,695 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 19% complete in 00:00:38 (494.0K triples/sec)
INFO  2019-10-02 06:19:01,913 [Stardog.Executor-120] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 20% complete in 00:00:40 (504.2K triples/sec)
INFO  2019-10-02 06:19:11,130 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 21% complete in 00:00:49 (430.3K triples/sec)
INFO  2019-10-02 06:19:12,278 [Stardog.Executor-116] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 22% complete in 00:00:50 (440.5K triples/sec)
INFO  2019-10-02 06:19:13,476 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 23% complete in 00:00:51 (449.8K triples/sec)
INFO  2019-10-02 06:19:14,947 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 24% complete in 00:00:53 (456.4K triples/sec)
INFO  2019-10-02 06:19:17,117 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 24% complete in 00:00:55 (438.5K triples/sec)
INFO  2019-10-02 06:19:19,736 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 26% complete in 00:00:57 (453.5K triples/sec)
INFO  2019-10-02 06:19:21,723 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 26% complete in 00:00:59 (438.4K triples/sec)
INFO  2019-10-02 06:19:23,648 [Stardog.Executor-120] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 27% complete in 00:01:01 (441.1K triples/sec)
INFO  2019-10-02 06:19:25,537 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 29% complete in 00:01:03 (459.7K triples/sec)
INFO  2019-10-02 06:19:28,897 [Stardog.Executor-113] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 30% complete in 00:01:07 (451.7K triples/sec)
INFO  2019-10-02 06:19:30,918 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 30% complete in 00:01:09 (438.5K triples/sec)
INFO  2019-10-02 06:19:32,952 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 32% complete in 00:01:11 (454.3K triples/sec)
INFO  2019-10-02 06:19:34,953 [Stardog.Executor-120] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 33% complete in 00:01:13 (455.7K triples/sec)
INFO  2019-10-02 06:19:38,499 [Stardog.Executor-120] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 34% complete in 00:01:16 (447.8K triples/sec)
INFO  2019-10-02 06:19:40,645 [Stardog.Executor-113] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 35% complete in 00:01:18 (448.4K triples/sec)
INFO  2019-10-02 06:19:42,760 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 35% complete in 00:01:20 (436.7K triples/sec)
INFO  2019-10-02 06:19:58,129 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 36% complete in 00:01:36 (377.4K triples/sec)
INFO  2019-10-02 06:20:00,077 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 37% complete in 00:01:38 (380.2K triples/sec)
INFO  2019-10-02 06:20:02,928 [Stardog.Executor-118] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 39% complete in 00:01:41 (389.4K triples/sec)
INFO  2019-10-02 06:20:04,557 [Stardog.Executor-113] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 40% complete in 00:01:42 (393.1K triples/sec)
INFO  2019-10-02 06:20:06,233 [Stardog.Executor-107] com.complexible.stardog.index.Index:printInternal(314): Computing statistics: 41% complete in 00:01:44 (396.4K triples/sec)

I ran stardog with export STARDOG_SERVER_JAVA_ARGS="-Xmx64g -Xms8g -XX:MaxDirectMemorySize=64g"
OS: centos 7
stardog.properties file in $STARDOG_HOME

index.statistics.update.automatic=false
index.named.graphs=false
memory.mode=bulk_load

Can I disable this feature? Thanks

If you indeed do bulk loading, that is, the db create command, then statistics cannot be disabled. The index.statistics.update.automatic disables statistics for further updates of the database, e.g. via the data add command.

That said, statistics usually completes faster than other phases of bulk loading such as parsing and indexing. If it's slower for you, we'd love to look at your data (possibly in the obfuscated form).

Best,
Pavel

Hi @pavel,

Do you want a small sample of data or entire file?

Ideally the entire dataset.

After downloading and unzip the dataset, go to SupportingInformation directory and run

$ mkdir nary_sp
$ perl ./codes_for_cardinalassertion/sp_rdfization.pl

It will generate the output file (nary_sp/sp.ttl) (in my laptop is under 1min) and this is the file I want to bulk load

Thanks

One question @pavel,

Does the computing statistics phase run everytime I start the Stardog server or it just run once when bulk loading?
Thanks

Thanks, we'll try to reproduce the issue. Re: your question, statistics is built when you load a fresh database with db create and optimize it later after updates with db optimize (though it can be disabled for the latter with -o optimize.statistics=false). If you simply start the server, it should not rebuild statistics.

I must add, however, that selectivity statistics is the main source of information for the cost-based query optimiser so disabling it like that may have negative effects on query performance.

Cheers,
Pavel

1 Like

Thanks again, I confirm the problem. We should be able to address it before the next release but, unfortunately, there's no easy workaround for you. You can get around it by first creating an empty database and then adding the data with the data add command but with empty statistics it's likely that queries will run slowly (in many cases though it's possible to nudge the optimiser towards good plans with #pragma hints). Remove memory.mode=bulk_load if you do this.

Looks like the dataset is pretty unusual as it contains some very high number of properties and binary chains, like :a : p : b . :b :q :c for different properties :p and :q.

For your reference the internal ticket number is #7959, you may look it up in the future release notes.

Thanks for the data,
Pavel

Thanks @pavel,

I can import and query now, this is a good workaround.

hi @pavel,

Can I prevent Stardog from recomputing index when starting server.
I create an empty db and use data add to add multiple small files
Here is the error

ERROR 2019-10-12 09:34:53,600 [main] com.complexible.stardog.index.disk.statistics.DiskCharacteristicSetsManager:load(68): Error loading statistics for pc_sp_noncar, re-computing it
ERROR 2019-10-12 09:34:53,601 [main] com.complexible.stardog.index.disk.statistics.DiskCharacteristicSetsManager:load(69): 
com.complexible.common.io.impl.PageNotFoundException: Page not found: 2
	at com.complexible.common.io.impl.AbstractPagedFile.getPage(AbstractPagedFile.java:145) ~[stardog-utils-common-5.3.6.jar:?]
	at com.complexible.common.io.impl.PageManagerImpl.getPageInternal(PageManagerImpl.java:172) ~[stardog-utils-common-5.3.6.jar:?]
	at com.complexible.common.io.impl.PageManagerImpl.getPage(PageManagerImpl.java:155) ~[stardog-utils-common-5.3.6.jar:?]
	at com.complexible.common.io.impl.MultiPage.traversePageLinks(MultiPage.java:118) ~[stardog-utils-common-5.3.6.jar:?]
	at com.complexible.common.io.impl.MultiPage.read(MultiPage.java:81) ~[stardog-utils-common-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.btree.impl.PagedKeysIO.read(PagedKeysIO.java:47) ~[stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.btree.impl.ArrayKeys.<init>(ArrayKeys.java:39) ~[stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.btree.impl.ArrayKeys$Factory.create(ArrayKeys.java:179) ~[stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.btree.BPlusTreeFactory.build(BPlusTreeFactory.java:126) ~[stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.statistics.DiskCharacteristicSetsStatisticsLoader.doLoad(DiskCharacteristicSetsStatisticsLoader.java:120) ~[stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.statistics.DiskCharacteristicSetsStatisticsLoader.load(DiskCharacteristicSetsStatisticsLoader.java:58) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.DiskIndex.<init>(DiskIndex.java:43) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.disk.DefaultDiskIndexReader.read(DefaultDiskIndexReader.java:129) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.index.io.IndexIO.read(IndexIO.java:324) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.db.DatabaseFactoryImpl.read(DatabaseFactoryImpl.java:126) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.db.DatabaseFactoryImpl.read(DatabaseFactoryImpl.java:55) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.StardogKernel.openDatabase(StardogKernel.java:2708) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.StardogKernel.initDatabases(StardogKernel.java:2432) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.StardogKernel.start(StardogKernel.java:2335) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.StardogKernel.initialize(StardogKernel.java:799) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.Stardog.initKernel(Stardog.java:214) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.Stardog.<init>(Stardog.java:206) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.Stardog.<init>(Stardog.java:60) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.Stardog$StardogBuilder.create(Stardog.java:597) [stardog-5.3.6.jar:?]
	at com.complexible.stardog.cli.impl.ServerStart.call(ServerStart.java:162) [stardog-cli-5.3.6.jar:?]
	at com.complexible.stardog.cli.impl.ServerStart.call(ServerStart.java:41) [stardog-cli-5.3.6.jar:?]
	at com.complexible.stardog.cli.CLIBase.execute(CLIBase.java:55) [stardog-cli-5.3.6.jar:?]
	at com.complexible.stardog.cli.admin.CLI.main(CLI.java:186) [stardog-cli-5.3.6.jar:?]
Computing statistics: 6% complete in 00:00:08 (117.9K triples/sec)
Computing statistics: 8% complete in 00:00:10 (130.8K triples/sec)

Thanks

Could you try to create an empty db, add some small amount of data and only after that disable index.statistics.update.automatic? Then add the rest of the data.

As long as there's at least some version of statistics available for each database, the server shouldn't try to recompute it.

Thanks Pavel, I'll try it

Hello again,

We were looking at this issue this week and happy to say that as of next release Stardog should not get stuck when computing statistics. However I wonder if you could tell a few words what kind of dataset this is, particularly why it contains >3M distinct predicates in a ~20M triples data. This is so unusual that I thought there might be an issue in the data generation script (if there's such), but maybe not.

Thanks,
Pavel

Hi @pavel
This is a rdf model called singleton property
https://dl.acm.org/citation.cfm?id=2567973

I think maybe the meta nodes cause an unusual number of predicates, hope this help!

Thanks,
Duy

OK, thanks, I see now. It's the unique predicate names generated for each statement with properties that blow up the number of predicates like that. Honestly I think it's not a great way of approaching the issue with statement properties in RDF (using intermediate nodes is more common and usually works OK in practice), but at least we've been able to address the issues with statistics that it causes.

Thanks,
Pavel