Similarity Search Model Creation issue

Dear all,

I have been working with the Similarity Search function for a while. I created a Java library that generates the insert query that creates a similarity search model, plus a select query against that model. However, I ran into a strange issue. If I run the insert query from Stardog Studio, the query is accepted, and when I then run the select query it works perfectly.
However, when I run the exact same queries from Java using the Jena bindings, the insert query is accepted and a model is created, but when I run the select query I get the following error:

QueryEval: com.complexible.common.rdf.model.StardogBNode cannot be cast to org.openrdf.model.Literal

When I look at the models, the one generated from Stardog Studio contains some triples, while the one created through Java contains 1,694,881 blank nodes. I cannot explain this, but comparing it with the model created by Stardog Studio, I assume it is not correct.

What could be causing this? The queries are exactly the same, so it seems Stardog does something different internally.

The model generated via Stardog Studio: https://cloudbox.netage.nl/f/38195a0ec3/?dl=1
The model generated via Java/Jena: https://cloudbox.netage.nl/f/4eea093160/?dl=1

Hi,

Thanks for the report. Can you include the error message from the server's log file?

Jess

Sure, you can find the log here: https://cloudbox.netage.nl/f/03141de427/?dl=1

Thanks for sharing the log file. It helps to see what's going on.

I see in your other post that you originally had this code (using createRemote() with an endpoint):

UpdateProcessor updateExec = UpdateExecutionFactory.createRemote(request, endpoint);

but changed it to (using create() with a Dataset):

SDJenaFactory.createDataset(this.aConn);
UpdateProcessor updateExec = UpdateExecutionFactory.create(UpdateFactory.create(arg0), getDataset());

The latter does not work because the update is processed by the Jena engine, which executes the query part separately from the insert. It is also less efficient. I tested the former code and it sends the entire update to Stardog, which is required for the model training to work properly. Please give it a try.

Best,
Jess

Which makes me wonder: is there any benefit to using Jena Models with the Stardog Dataset over plain SPARQL (besides this edge case)?

The Jena support is a compatibility layer. It obscures the way Stardog works as you've seen. Using the native API or SPARQL via HTTP will give you the best results.
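As a rough illustration of the "SPARQL via HTTP" route, the sketch below builds an update request in the SPARQL 1.1 Protocol's "update via POST directly" form (Content-Type application/sparql-update) using the JDK's own java.net.http client (Java 11+), so Jena never touches the query and the server receives the complete INSERT ... WHERE. The endpoint path, the database name and the admin/admin credentials are assumptions for illustration only; the thread itself uses an /annex/kro/sparql/... path, so check the HTTP documentation for your Stardog version. main only builds and prints the request; execute is there for use against a running server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class SparqlUpdateOverHttp {

    // SPARQL 1.1 Protocol "update via POST directly": the raw update string is the request body.
    static HttpRequest buildUpdateRequest(String updateEndpoint, String user, String password, String update) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(updateEndpoint))
                .header("Content-Type", "application/sparql-update")
                .header("Authorization", "Basic " + credentials) // HTTP basic authentication
                .POST(HttpRequest.BodyPublishers.ofString(update, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the request and returns the HTTP status code; needs a reachable server.
    static int execute(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and credentials: adjust to your server, database and Stardog version.
        HttpRequest request = buildUpdateRequest(
                "http://localhost:5820/kro/update", "admin", "admin",
                "INSERT DATA { <urn:ex:s> <urn:ex:p> <urn:ex:o> }");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because the whole update travels in one request, the server evaluates both the SELECT with spa:set and the insert itself, which is the property Jess points out for createRemote.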

In other words, should we do everything via the API and wrap the results in Jena objects ourselves in our applications?

Thanks for your response.
I did indeed change the code, but I have now tried changing it back:

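// Jena parses the update locally (UpdateFactory.create); registering spa:set as a custom aggregate lets the parser treat spa:set(...) as an aggregate in the GROUP BY query
AggregateRegistry.register("tag:stardog:api:analytics:set", (agg, distinct) -> AggNull.createAccNull(), NodeConst.nodeNil);
String insertModelQuery = similarityModel.createInsertModelQuery(m, "http://data.resc.info/kro", modelName.replaceAll("[-+.^:,]", "").replace(" ", "_"));
// createRemote sends the whole update to the Stardog endpoint instead of evaluating it in Jena
UpdateProcessor updateExec = UpdateExecutionFactory.createRemote(UpdateFactory.create(insertModelQuery), "http://stardog.netage.nl:5820/annex/kro/sparql/query");
updateExec.execute();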

The update is generated by the "createInsertModelQuery" method and looks like this:

prefix spa: <tag:stardog:api:analytics:>
prefix : <http://schema.org/>
INSERT { graph spa:model {
  :basic_model a spa:SimilarityModel ;
    spa:arguments (?bouwbest2 ?functie2 ?status2 ?bag_oppvlk2 ?bouwjaar2 ?maximale_hoogte2 ?gemiddelde_hoogte2 ?pandHoogte2 ?bouwlagen2) ;
    spa:predict ?object .
} }
WHERE {
  SELECT
    (spa:set(?bouwbest) as ?bouwbest2)
    (spa:set(?functie) as ?functie2)
    (spa:set(?status) as ?status2)
    (spa:set(?bag_oppvlk) as ?bag_oppvlk2)
    (spa:set(?bouwjaar) as ?bouwjaar2)
    (spa:set(?maximale_hoogte) as ?maximale_hoogte2)
    (spa:set(?gemiddelde_hoogte) as ?gemiddelde_hoogte2)
    (spa:set(?pandHoogte) as ?pandHoogte2)
    (spa:set(?bouwlagen) as ?bouwlagen2)
    ?object
  { GRAPH <http://data.resc.info/kro> {
      ?object <http://vocab.netage.nl/kro#hasWOZ> ?hasWOZ .
      ?hasWOZ <http://vocab.netage.nl/kro#bouwbest> ?bouwbest .
      ?object <http://vocab.netage.nl/kro#hasBuilding> ?hasBuilding .
      ?hasBuilding <http://vocab.netage.nl/kro#functie> ?functie .
      ?hasBuilding <http://vocab.netage.nl/kro#status> ?status .
      ?hasBuilding <http://vocab.netage.nl/kro#bag_oppvlk> ?bag_oppvlk .
      ?hasBuilding <http://vocab.netage.nl/kro#bouwjaar> ?bouwjaar .
      ?object <http://vocab.netage.nl/kro#hasAHN> ?hasAHN .
      ?hasAHN <http://vocab.netage.nl/kro#maximale_hoogte> ?maximale_hoogte .
      ?hasAHN <http://vocab.netage.nl/kro#gemiddelde_hoogte> ?gemiddelde_hoogte .
      ?hasAHN <http://vocab.netage.nl/kro#pandHoogte> ?pandHoogte .
      ?hasAHN <http://vocab.netage.nl/kro#bouwlagen> ?bouwlagen .
  } }
  GROUP BY ?object
}
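The body of createInsertModelQuery isn't shown in the thread. As a hedged sketch of how such a generator could assemble the update above, the hypothetical buildInsertModelQuery below takes each link property on ?object (hasWOZ, hasBuilding, hasAHN) together with the feature properties read from that node, and emits the spa:set projections, the spa:arguments list and the triple patterns in matching order. It applies the same model-name sanitization as the call site (dropping - + . ^ : , and turning spaces into underscores). The method names and the Map-based input are assumptions, and the sketch assumes feature names are unique across links.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InsertModelQueryBuilder {

    // Same sanitization as the call site in the thread: drop - + . ^ : , and turn spaces into underscores.
    static String sanitizeModelName(String modelName) {
        return modelName.replaceAll("[-+.^:,]", "").replace(" ", "_");
    }

    // links: property on ?object (e.g. "hasWOZ") -> feature properties read from that node (e.g. "bouwbest").
    // The map's iteration order fixes the order of spa:arguments, so pass a LinkedHashMap.
    static String buildInsertModelQuery(String graphIri, String vocabNs, String modelName,
                                        Map<String, List<String>> links) {
        List<String> arguments = new ArrayList<>();
        StringBuilder projections = new StringBuilder();
        StringBuilder patterns = new StringBuilder();
        for (Map.Entry<String, List<String>> link : links.entrySet()) {
            String node = link.getKey();
            patterns.append("      ?object <").append(vocabNs).append(node)
                    .append("> ?").append(node).append(" .\n");
            for (String feature : link.getValue()) {
                patterns.append("      ?").append(node).append(" <").append(vocabNs).append(feature)
                        .append("> ?").append(feature).append(" .\n");
                // Each feature is aggregated per ?object with spa:set and bound to a "...2" variable.
                projections.append("    (spa:set(?").append(feature).append(") as ?")
                        .append(feature).append("2)\n");
                arguments.add("?" + feature + "2");
            }
        }
        return "prefix spa: <tag:stardog:api:analytics:>\n"
                + "prefix : <http://schema.org/>\n"
                + "INSERT { graph spa:model {\n"
                + "  :" + sanitizeModelName(modelName) + " a spa:SimilarityModel ;\n"
                + "    spa:arguments (" + String.join(" ", arguments) + ") ;\n"
                + "    spa:predict ?object .\n"
                + "} }\n"
                + "WHERE {\n"
                + "  SELECT\n" + projections + "    ?object\n"
                + "  { GRAPH <" + graphIri + "> {\n" + patterns + "  } }\n"
                + "  GROUP BY ?object\n"
                + "}\n";
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new LinkedHashMap<>();
        links.put("hasWOZ", List.of("bouwbest"));
        links.put("hasBuilding", List.of("functie", "status", "bag_oppvlk", "bouwjaar"));
        links.put("hasAHN", List.of("maximale_hoogte", "gemiddelde_hoogte", "pandHoogte", "bouwlagen"));
        System.out.print(buildInsertModelQuery(
                "http://data.resc.info/kro", "http://vocab.netage.nl/kro#", "basic model", links));
    }
}
```

With the three links and nine features used in main, the generated update should be equivalent (whitespace aside) to the query shown above, including the :basic_model name derived from "basic model".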

The result is the same: still a model with lots of blank nodes (1,799,800 triples).

Small update: when I send the query to Stardog with a simple HTTP GET, it works, so the issue is indeed somewhere in Jena. I also experimented with the Stardog API, which works as well. We made some small changes to the code and now it works:

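private Connection aConn = ConnectionConfiguration...  // connection setup elided in the original post

aConn.begin();                      // start a transaction
UpdateQuery q = aConn.update(arg0); // arg0 is the SPARQL INSERT ... WHERE string
q.execute();                        // the whole update is evaluated by Stardog, not by Jena
aConn.commit();                     // commit the transaction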
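The native-API version opens a transaction, executes the update and commits. If the update throws, commit is skipped and the transaction stays open on the connection, so a common refinement is to roll back on failure. Below is a minimal sketch of that begin/commit/rollback pattern. TxConnection is a hypothetical stand-in interface, not Stardog's API, used so the pattern runs without a server; with a real connection you would map its methods onto your connection's own begin, update/execute, commit and rollback calls.

```java
import java.util.ArrayList;
import java.util.List;

final class TransactionalUpdate {

    // Hypothetical stand-in for a transactional connection; not Stardog's actual interface.
    interface TxConnection {
        void begin();
        void executeUpdate(String sparqlUpdate);
        void commit();
        void rollback();
    }

    // Run one update inside a transaction, rolling back if the update (or the commit) fails.
    static void runUpdate(TxConnection conn, String sparqlUpdate) {
        conn.begin();
        try {
            conn.executeUpdate(sparqlUpdate);
            conn.commit();
        } catch (RuntimeException e) {
            conn.rollback(); // don't leave the transaction open on the connection
            throw e;
        }
    }

    // Fake connection that records the calls it receives, optionally failing on the update.
    static final class RecordingConnection implements TxConnection {
        final List<String> calls = new ArrayList<>();
        private final boolean failOnUpdate;

        RecordingConnection(boolean failOnUpdate) {
            this.failOnUpdate = failOnUpdate;
        }

        @Override public void begin() { calls.add("begin"); }

        @Override public void executeUpdate(String sparqlUpdate) {
            calls.add("update");
            if (failOnUpdate) {
                throw new IllegalStateException("update failed");
            }
        }

        @Override public void commit() { calls.add("commit"); }

        @Override public void rollback() { calls.add("rollback"); }
    }

    public static void main(String[] args) {
        RecordingConnection ok = new RecordingConnection(false);
        runUpdate(ok, "INSERT DATA { <urn:ex:s> <urn:ex:p> <urn:ex:o> }");
        System.out.println(ok.calls); // [begin, update, commit]

        RecordingConnection failing = new RecordingConnection(true);
        try {
            runUpdate(failing, "INSERT DATA { <urn:ex:s> <urn:ex:p> <urn:ex:o> }");
        } catch (IllegalStateException e) {
            System.out.println(failing.calls); // [begin, update, rollback]
        }
    }
}
```

Rethrowing after the rollback keeps the original failure visible to the caller instead of silently swallowing it.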

Thanks for your help!
