How to force a refresh of the FTS index

I'm using an FTS SPARQL query as the backend to a search form. Users pointed out that some items were missing from the search. I have been unable to refresh the index: I've tried adjusting DB metadata and turning the DB offline and online again, but the logs only show that the index was checked, not that it was reindexed (I assume it believes it has no reason to!).
Is there a way to force a complete refresh of the FTS index? With as little downtime to the DB as possible, please!

The production server runs Stardog version 6.2.3.

Hi Jen,

6.2.3 is kind of old now but I think the first thing to try would be stardog-admin db optimize {database}. Check stardog.log to see if there're messages about rebuilding the search index.

Is there a particular reason you're unable to upgrade to Stardog 7? We can discuss it off the list.

Best,
Pavel

Ah yes, I'd also tried db optimize but no luck there either.
What sort of messages should I be looking for in the logs? I don't want to copy/paste here as it's a production server.
Nothing found by grepping 'rebuild'... which namespace would be logging the messages?

Should be something like "Re-indexing text" or "Updating text index" with percentage values. If your database is small and the indexing takes <6s, then there won't be anything. You can just visually inspect everything at the bottom since the time you did optimize.
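A quick way to scan for those messages is a case-insensitive grep with line numbers. This is only a sketch: the two phrases are the approximate wording given above ("something like"), and the function name and the example log location are assumptions, so adjust both to your installation.

```shell
# Sketch: list text-index messages in a Stardog log, with line numbers.
# The pattern uses the approximate wording quoted in this thread; the exact
# messages on your server may differ.
find_reindex_messages() {
    log_file="$1"   # path to stardog.log (location depends on your setup)
    grep -n -i -E 're-indexing text|updating text index' "$log_file"
}

# Example (path is an assumption):
#   find_reindex_messages "$STARDOG_HOME/stardog.log" | tail -n 20
```

An empty result does not prove that nothing was rebuilt, since a small database may finish indexing before any progress message is written.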

We can also look at the database metadata (stardog-admin metadata get {database}) to see if there's anything strange there.

Finally, how exactly do you know that the index is out of sync? Do FTS searches fail to find literals which you get with simple select * { ?s ?p " literal " } kind of queries?
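One rough way to run that comparison is to build an FTS query and a plain substring query for the same term and compare the subjects each returns. The sketch below only builds the query text. The helper names and the example term are hypothetical, and the substring filter is not an exact equivalent of a tokenized text match, so treat differences as candidates to inspect rather than proof.

```shell
# Sketch: build two queries for the same search term.
# NOTE: the term is spliced into the query text, so keep it free of quotes.

# Full-text search via Stardog's textMatch property.
fts_query() {
    printf "SELECT DISTINCT ?s WHERE { ?s ?p ?l . ?l <tag:stardog:api:property:textMatch> '%s' }\n" "$1"
}

# Plain scan over literals, bypassing the text index entirely.
plain_query() {
    printf "SELECT DISTINCT ?s WHERE { ?s ?p ?l . FILTER(isLiteral(?l) && CONTAINS(LCASE(STR(?l)), LCASE('%s'))) }\n" "$1"
}

# Example (database name and term are placeholders):
#   stardog query execute myDb "$(fts_query 'harbour')"
#   stardog query execute myDb "$(plain_query 'harbour')"
```

Subjects that show up in the plain query but not in the FTS query are the ones worth reporting.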

Exactly, resources that had been returned in FTS on a literal search have now disappeared. They disappeared after an editing interface updated some of the objects attached to the same resource (but the object that we are searching on is untouched).
We have also seen this behaviour "fix" itself, but some stubbornly refuse to be returned in FTS queries even though a standard query shows the data is as expected.

It is a small DB, and I see nothing about reindexing in the logs. Can any settings be adjusted perhaps to account for the small DB size and the types of updates that are happening to resources to maybe trigger the reindex more frequently? A force refresh would also do the job.

Here's the bulk of the metadata:

| index.aggregate                           | On                                                                               |
| index.differential.enable.limit           | 500000                                                                           |
| index.differential.merge.limit            | 20000                                                                            |
| index.differential.size                   | 0                                                                                |
| index.disk.page.count.total               | 8739                                                                             |
| index.disk.page.count.used                | 5442                                                                             |
| index.disk.page.fill.ratio                | 0.8120716772111012                                                               |
| index.last.tx                             | d2d2946c-0574-4553-a645-f313fb8381b5                                             |
| index.literals.canonical                  | true                                                                             |
| index.lucene.mmap                         | true                                                                             |
| index.named.graphs                        | true                                                                             |
| index.persist                             | true                                                                             |
| index.persist.sync                        | true                                                                             |
| index.size                                | 677716                                                                           |
| index.statistics.cache.capacity           | 1024                                                                             |
| index.statistics.characteristic.limit     | 10000                                                                            |
| index.statistics.update.automatic         | true                                                                             |
| index.statistics.update.blocking.ratio    | 0.0                                                                              |
| index.statistics.update.min.size          | 10000                                                                            |
| index.statistics.update.ratio             | 0.1                                                                              |
| index.type                                | Disk                                                                             |
| index.writer.merge.limit                  | 1000                                                                             |
| literal.comparison.extended               | true                                                                             |
| literal.language.normalization            | DEFAULT                                                                          |
| preserve.bnode.ids                        | true                                                                             |
| progress.monitor.enabled                  | true                                                                             |
| query.all.graphs                          | true                                                                             |
| query.describe.strategy                   | default                                                                          |
| query.plan.reuse                          | ALWAYS                                                                           |
| query.pp.contexts                         | false                                                                            |
| query.timeout                             | 70s                                                                              |
| reasoning.approximate                     | false                                                                            |
| reasoning.classify.eager                  | true                                                                             |
| reasoning.consistency.automatic           | false                                                                            |
| reasoning.punning.enabled                 | false                                                                            |
| reasoning.sameas                          | OFF                                                                              |
| reasoning.schema.graphs                   | *                                                                                |
| reasoning.schema.timeout                  | 1m                                                                               |
| reasoning.type                            | NONE                                                                             |
| reasoning.virtual.graph.enabled           | true                                                                             |
| search.default.limit                      | -1                                                                               |
| search.enabled                            | true                                                                             |
| search.index.datatypes                    | http://www.w3.org/1999/02/22-rdf-syntax-ns#langString,                           |
|                                           | http://www.w3.org/2001/XMLSchema#string                                          |
| search.reindex.tx                         | true                                                                             |
| search.wildcard.search.enabled            | false                                                                            |
| security.named.graphs                     | false                                                                            |
| spatial.enabled                           | true                                                                             |
| spatial.index.version                     | 1                                                                                |
| spatial.precision                         | 11                                                                               |
| spatial.result.limit                      | 10000                                                                            |
| strict.parsing                            | false                                                                            |
| transaction.isolation                     | SNAPSHOT                                                                         |
| transaction.logging                       | false                                                                            |
| transaction.logging.ignore.startup.errors | true                                                                             |
| transaction.logging.rotation.remove       | true                                                                             |
| transaction.logging.rotation.size         | 524288000                                                                        |
| versioning.directory                      | versioning                                                                       |
| versioning.enabled                        | false                                                                            |

One sure way to force a refresh would be to disable FTS and then re-enable it. I think both operations require offlining the database. The index will be recomputed when you bring the database online after re-enabling, but since the database is so small, it should be quick.

I tried that yesterday too. Still missing certain results that we know should be in there. So no joy :(

The only workaround I found was to data export all graphs, data add them into a new DB, and then flip the connection to the new DB, whose FTS index seemed to be sorted out (it returned the expected results in the search). Luckily this was feasible thanks to the small size of the DB, as the import only took about an hour and a half.

Very strange! No idea why the initial DB would get its FTS index so stubbornly stuck.

Well, first of all, it's better to use db create instead of data add. With <1M of data, it should take under a minute to bulk load the data.
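As a minimal sketch of that suggestion, the function below loads the exported files while the database is being created and enables search in the same step. The function name, the DRY_RUN switch, the database name and the file name are all placeholders for illustration; by default it only prints the command so it can be reviewed first.

```shell
# Sketch: bulk load exported RDF at creation time instead of using data add.
# Prints the command by default; set DRY_RUN=0 to actually run it.
create_db_with_search() {
    new_db="$1"; shift   # placeholder database name; remaining args are RDF files
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "stardog-admin db create -o search.enabled=true -n $new_db $*"
    else
        stardog-admin db create -o search.enabled=true -n "$new_db" "$@"
    fi
}

# Example (names are placeholders):
#   create_db_with_search newDb export.trig
```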

But most importantly, if you offline the db, disable the search index, and bring it back online, there should be no trace of the old FTS index left (you can verify that by looking in the data dir in your home; there's a waldo dir for the FTS index). If you re-enable it, it'd be computed in exactly the same way as for a new database.

And the correct way to turn it off/on is through search.enabled = false/true?

(just checking in case I'm disabling the wrong setting!)

I turned the db offline, set search.enabled to false, turned it back online and checked the directory, but waldo is still there.
Is it safe to manually delete that whole waldo directory?

OK, we'll double check this. Yes, you can manually delete waldo, it should be re-computed on server startup. That dir does not contain any data that cannot be recomputed from RDF.

OK, that got it.
I offlined the DB to re-enable search and deleted the waldo directory at the same time, before bringing the DB back online. The resources I'm expecting to see in search results are now there.
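For anyone landing here on 6.x, the sequence that worked can be sketched as a small shell function. The function names and the DRY_RUN switch are made up for this illustration, and the thread does not give the exact location of the waldo directory, so the path has to be looked up and passed in. By default the commands are only printed.

```shell
# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch: offline the DB, enable search, delete waldo, bring the DB online.
rebuild_fts_index() {
    db="$1"          # database name
    waldo_dir="$2"   # full path to this database's waldo directory
    # Safety guard: only ever delete a directory that is literally named waldo.
    case "$waldo_dir" in
        */waldo) ;;
        *) echo "refusing: '$waldo_dir' does not end in /waldo" >&2; return 1 ;;
    esac
    run stardog-admin db offline "$db" &&
    run stardog-admin metadata set -o search.enabled=true "$db" &&
    run rm -rf "$waldo_dir" &&
    run stardog-admin db online "$db"
}

# Example (both arguments are placeholders):
#   rebuild_fts_index myDb /path/to/myDb/waldo
```

The database is unavailable between the offline and online steps, so on a production server it is worth reviewing the printed commands before setting DRY_RUN=0.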

OK, glad to hear. Quite frankly, I have no idea what was going on, but in v7 stardog-admin db optimize -o optimize.search=true will always rebuild the FTS index regardless of its state (i.e. even if the system believes it's in a clean state).

Best,
Pavel
