I built a knowledge graph using virtualized data from Databricks as its resources. There are 8 resources, one per data table. The table sizes are 88 MB, 20 MB, 20 MB, 16 MB, 108 KB, 108 MB, 135 MB, and 145 KB.
I have a few questions about reducing execution time for a graph with these features:
1. Is there a way to reduce the response time for querying the knowledge graph (built from virtualized data of the sizes stated above) using the query builder? Even simple queries (e.g., pulling data from three tables with a low-complexity query; I can provide the query plan in an email, as it contains some sensitive details about the dataset) often take >10 seconds to complete.
2. Is there a way to reduce the time to populate node details? Expanding features via the right-click "Expand by" menu also often takes >10 seconds.
(I imagine the answer to (1) is materializing the data, but I thought I would confirm. Also, are there any alternatives to materialization?)
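For reference, here is the materialization I have in mind, expressed as a SPARQL Update. This is a minimal sketch: the virtual graph name `databricks_vg` and the target graph IRI are placeholders, and the `COPY <virtual://...>` form is taken from Stardog's virtual graph documentation, so it is worth verifying against the server version in use.

```python
# Sketch: build the SPARQL Update that copies (materializes) a virtual
# graph into a regular named graph. Names are hypothetical placeholders.

def materialize_update(virtual_graph: str, target_graph: str) -> str:
    """Return a SPARQL Update copying a virtual graph into a named graph."""
    return f"COPY <virtual://{virtual_graph}> TO <{target_graph}>"

update = materialize_update("databricks_vg", "urn:graph:materialized")
print(update)
```

The resulting update string would then be run in Studio or sent to the database's SPARQL update endpoint; after that, queries hit local storage instead of Databricks.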
I've come to suspect that the slow response times may not be caused solely by using virtualized data. Perhaps the problem is the default model (<tag:stardog:api:context:local>). I suspect that the default model does not reflect the updates I make in Studio after generating the virtual graph and model in Stardog Designer.
For example, say I want an inference rule to contain a subclause (e.g., to aggregate data). I haven't figured out a way to do that in Designer. My workaround is to build the rule in Designer with a generic placeholder query, then edit that query into the one I actually want and save. When I do, the default model doesn't seem to pick up the change.
Now that I've written this out, I suspect my logic is off.
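One way I could test this, bypassing Designer entirely, is to rewrite the rule's body in the model's named graph with a direct SPARQL Update. A hedged sketch: the graph IRI, rule IRI, and rule body below are placeholders, and the `tag:stardog:api:rule:` vocabulary is how Stardog serializes rules as RDF to my understanding, but this should be checked against the docs.

```python
# Sketch: build a SPARQL Update that replaces the content of an existing
# rule inside a model named graph. All IRIs here are hypothetical.

def replace_rule_update(model_graph: str, rule_iri: str, new_body: str) -> str:
    """Return a DELETE/INSERT update swapping a rule's rule:content string."""
    return (
        "PREFIX rule: <tag:stardog:api:rule:>\n"
        f"WITH <{model_graph}>\n"
        f"DELETE {{ <{rule_iri}> rule:content ?old }}\n"
        f'INSERT {{ <{rule_iri}> rule:content """{new_body}""" }}\n'
        f"WHERE {{ <{rule_iri}> rule:content ?old }}"
    )

u = replace_rule_update(
    "urn:graph:model",          # placeholder model graph
    "urn:rule:my-rule",         # placeholder rule IRI
    "IF { ?x a ?y } THEN { ?x a ?y }",  # trivial stand-in rule body
)
print(u)
```

If the default model still ignores the change after an update like this, the problem is presumably elsewhere than Designer.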
That said, in the next few days I intend to rebuild my data sources, virtual graphs, and named graphs (i.e., models) in a fresh database, simply loading the files I have for the graph via the HTTP API. This removes Designer from the loop, and perhaps I'll get better response times in Explorer.
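For what it's worth, the load sequence I have in mind looks roughly like the plan below. The endpoint, database name, and file names are placeholders, and the begin/add/commit transaction paths follow my reading of Stardog's HTTP API documentation, so they should be verified before use; the actual requests would be sent with an HTTP client plus basic auth.

```python
# Sketch: the ordered HTTP calls for loading Turtle files into a fresh
# Stardog database inside one transaction. Pure function so the plan can
# be inspected before any request is actually sent.

def load_plan(endpoint: str, db: str, tx_id: str, files: list[str]):
    """Return (method, url) pairs for a transactional bulk load."""
    calls = [("POST", f"{endpoint}/{db}/transaction/begin")]
    # One add call per RDF file, posted with Content-Type: text/turtle.
    calls += [("POST", f"{endpoint}/{db}/{tx_id}/add") for _ in files]
    calls.append(("POST", f"{endpoint}/{db}/transaction/commit/{tx_id}"))
    return calls

plan = load_plan("http://localhost:5820", "kg_fresh", "tx1", ["a.ttl", "b.ttl"])
for method, url in plan:
    print(method, url)
```

In practice the transaction id comes back from the `begin` call rather than being chosen up front; it is hard-coded here only to keep the sketch self-contained.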