Request doesn't work with Stardog 7 (but works with Stardog 5)

Hello

I am preparing a migration from Stardog 5 to Stardog 7 in my project. I ran a test with a large dataset (457k triples) to see whether everything works, and I found that a query which works in Stardog 5 does not work in Stardog 7 with the same data. In Stardog 7 I get no error, just a timeout.

Can you help me understand?
Data: https://drive.google.com/file/d/1-ugVpO5aM8PZPydbcu_DGTNzktwPAkxG/view?usp=sharing

Namespaces:
dcm "Page Redirection to current version of DICOM Controlled Terminology Definitions"
skos "SKOS Simple Knowledge Organization System Namespace Document - HTML Variant, 18 August 2009 Recommendation Edition"
ontomedirad "MediCIS - Inserm - UR1 - UMR 1099 LTSI"
purl "http://purl.obolibrary.org/obo/"

Query:
SELECT DISTINCT ?ClinResearchStudyId ?PatientName ?LabelSex ?AgeInYears ?ValueWeight ?LabelUnitWeight ?ValueHeight ?LabelUnitHeight ?StudyDescr ?LabelOrgan ?ProtocolName ?ProtocolDescr ?ExamDate ?ExamTime ?CTAcqClass ?NameRespInstitution ?ValueKVP ?LabelUnitKVP ?ValueTubeCur ?LabelUnitTubeCur ?LabelXRayModulationType ?ValueFocalSpot ?LabelUnitFocalSpot ?ValueNominalTotalCollimWidth ?LabelUnitNominalTotalCollimWidth ?XRayFilterClassLabel ?ValueExposureTime ?LabelUnitExposureTime ?ValueExposureInmAsec ?LabelUnitExposureInmAsec ?Dataset ?DatasetClassLabel ?ImageTypeDescription ?DatasetHandle ?ImageFormat ?Model ?Manufacturer
WHERE {
?RolePat purl:BFO_0000054 ?Exam .
?Human purl:BFO_0000087 ?RolePat .
?Human ontomedirad:has_name ?PatientName .
?CTAcq purl:BFO_0000132 ?Exam .
?Exam ontomedirad:part_of_study ?ClinResearchStudy .
?ClinResearchStudy ontomedirad:has_id ?ClinResearchStudyId .
?CTAcq rdf:type ?CTAcqClass .
?CTAcqClass rdfs:subClassOf* ontomedirad:CT_acquisition .
?Dataset ontomedirad:is_specified_output_of ?CTAcq .
?Dataset ontomedirad:has_DICOM_image_type_description ?ImageTypeDescription .
?Dataset rdf:type ?Datasetclass .
?Datasetclass skos:prefLabel ?DatasetClassLabel .
?CTAcq ontomedirad:has_protocol ?Protocol .
OPTIONAL { ?Protocol ontomedirad:has_name ?ProtocolName .
?Protocol ontomedirad:has_description ?ProtocolDescr . }
OPTIONAL { ?Exam ontomedirad:has_beginning_date ?ExamDate .
?Exam ontomedirad:has_beginning_time ?ExamTime . }
OPTIONAL { ?Exam ontomedirad:has_description ?StudyDescr }
OPTIONAL { ?Exam ontomedirad:has_target_region ?targetregion .
?targetregion rdf:type ?OrganClass .
?OrganClass rdfs:label ?LabelOrgan . }
OPTIONAL { ?Human ontomedirad:has_sex ?PatientSex .
?PatientSex rdfs:label ?LabelSex . }
OPTIONAL { ?PatientAge rdf:type ontomedirad:age_of_patient_undergoing_medical_procedure .
?PatientAge ontomedirad:is_about_procedure ?Exam .
?PatientAge ontomedirad:is_about ?Human .
?PatientAge ontomedirad:years ?AgeInYears . }
OPTIONAL { ?XRayModulationType purl:BFO_0000177 ?Protocol .
?XRayModulationType rdfs:subClassOf dcm:113842 .
?XRayModulationType skos:prefLabel ?LabelXRayModulationType . }
OPTIONAL { ?CTAcq ontomedirad:has_setting ?KVP .
?KVP rdf:type dcm:113733 .
?KVP purl:IAO_0000004 ?ValueKVP .
?KVP purl:IAO_0000039 ?UnitKVP .
?UnitKVP rdfs:label ?LabelUnitKVP.}
OPTIONAL { ?CTAcq ontomedirad:has_setting ?TubeCur .
?TubeCur rdf:type dcm:113734 .
?TubeCur purl:IAO_0000004 ?ValueTubeCur .
?TubeCur purl:IAO_0000039 ?UnitTubeCur.
?UnitTubeCur rdfs:label ?LabelUnitTubeCur .}
OPTIONAL { ?CTAcq ontomedirad:has_setting ?FocalSpot .
?FocalSpot rdf:type ontomedirad:focal_spot .
?FocalSpot purl:IAO_0000004 ?ValueFocalSpot .
?FocalSpot purl:IAO_0000039 ?UnitFocalSpot .
?UnitFocalSpot rdfs:label ?LabelUnitFocalSpot .}
OPTIONAL { ?CTAcq ontomedirad:has_setting ?NominalTotalCollimWidth .
?NominalTotalCollimWidth rdf:type dcm:113827 .
?NominalTotalCollimWidth purl:IAO_0000004 ?ValueNominalTotalCollimWidth .
?NominalTotalCollimWidth purl:IAO_0000039 ?UnitNominalTotalCollimWidth .
?UnitNominalTotalCollimWidth rdfs:label ?LabelUnitNominalTotalCollimWidth .}
OPTIONAL { ?CTAcq ontomedirad:has_setting ?ExposureTime .
?ExposureTime rdf:type dcm:113824 .
?ExposureTime purl:IAO_0000004 ?ValueExposureTime .
?ExposureTime purl:IAO_0000039 ?UnitExposureTime .
?UnitExposureTime rdfs:label ?LabelUnitExposureTime .}
OPTIONAL { ?CTAcq ontomedirad:has_setting ?ExposureInmAsec .
?ExposureInmAsec rdf:type ontomedirad:exposure .
?ExposureInmAsec purl:IAO_0000004 ?ValueExposureInmAsec .
?ExposureInmAsec purl:IAO_0000039 ?UnitExposureInmAsec .
?UnitExposureInmAsec rdfs:label ?LabelUnitExposureInmAsec .}
OPTIONAL { ?Dataset ontomedirad:has_IRDBB_WADO_handle ?DatasetHandle .}
OPTIONAL { ?Dataset ontomedirad:has_format ?ImageFormat .}
OPTIONAL { ?Scanner rdf:type ontomedirad:CT_scanner .
?AcqRole rdf:type ontomedirad:image_acquisition_role .
?Scanner purl:BFO_0000087 ?AcqRole.
?AcqRole purl:BFO_0000054 ?CTAcq .
?Scanner ontomedirad:has_manufacturer_name ?Manufacturer .
?Scanner ontomedirad:has_model_name ?Model .}
OPTIONAL { ?XRayFilter purl:BFO_0000177 ?Scanner .
?XRayFilter rdf:type ?XRayFilterClass .
?XRayFilterClass rdfs:subClassOf* dcm:113771 .
?XRayFilterClass skos:prefLabel ?XRayFilterClassLabel . }
OPTIONAL { ?RespInstitution rdf:type ontomedirad:institution .
?RespInstitution ontomedirad:has_name ?NameRespInstitution .
?RespInstitutionrole rdf:type ontomedirad:role_of_responsible_organization .
?RespInstitutionrole purl:BFO_0000054 ?CTAcq .
?RespInstitutionrole purl:BFO_0000052 ?RespInstitution .}
OPTIONAL { ?PatientWeight rdf:type ontomedirad:patient_weight .
?PatientWeight ontomedirad:is_about_procedure ?Exam .
?PatientWeight ontomedirad:is_about ?Human .
?PatientWeight purl:IAO_0000004 ?ValueWeight .
?PatientWeight purl:IAO_0000039 ?UnitWeight .
?UnitWeight rdfs:label ?LabelUnitWeight . }
OPTIONAL { ?PatientHeight rdf:type ontomedirad:patient_height .
?PatientHeight ontomedirad:is_about_procedure ?Exam .
?PatientHeight ontomedirad:is_about ?Human .
?PatientHeight purl:IAO_0000004 ?ValueHeight .
?PatientHeight purl:IAO_0000039 ?UnitHeight .
?UnitHeight rdfs:label ?LabelUnitHeight . }
} ORDER BY ?ClinResearchStudyId ?PatientName

We can take a look, but it would help a lot if you shared the data (you can do it privately via email/Dropbox). For a query as complex as this, it's not uncommon for behaviour to change across major releases, and it may require some hints.

Best,
Pavel

My apologies, I missed the link to the data. We will take a look and get back to you.

Best,
Pavel

Hi,

Using your data, I am unable to reproduce these query issues. Are you using reasoning when you run the queries? How many results are you expecting the query to produce?

In Stardog 5 the query returns more than 50 results in 188948 ms.
In Stardog 7 I get no results.
In both cases reasoning is off.

Thank you for helping me

Hi,

We are able to reproduce the issue with your data and will look into it.


Hello,

We looked into the problem and the reason is the following two blocks in the query:

OPTIONAL {
?Scanner rdf:type ontomedirad:CT_scanner .
?AcqRole rdf:type ontomedirad:image_acquisition_role .
?Scanner purl:BFO_0000087 ?AcqRole.
?AcqRole purl:BFO_0000054 ?CTAcq .
?Scanner ontomedirad:has_manufacturer_name ?Manufacturer .
?Scanner ontomedirad:has_model_name ?Model .
}

and

OPTIONAL {
?XRayFilter purl:BFO_0000177 ?Scanner .
?XRayFilter rdf:type ?XRayFilterClass .
?XRayFilterClass rdfs:subClassOf* dcm:113771 .
?XRayFilterClass skos:prefLabel ?XRayFilterClassLabel .
}

Notice that the 2nd block joins to the rest of the query only on the ?Scanner variable, and that variable appears only in the 1st block. Since both blocks are OPTIONAL, the variable can be left unbound (i.e. take a NULL value). When that happens, Stardog has to use the slowest join algorithm, the nested loop join, which slows things down greatly [1]. It's an open ticket for us, but it's also a general problem, so we strongly recommend modeling the data and writing queries so as to avoid it.

For example, if you make the first block non-OPTIONAL, the query would return the same number of results in <10s.
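
If the rows without a scanner need to be kept, another option is to nest the 2nd block inside the 1st, so that the ?XRayFilter patterns only ever join against solutions in which ?Scanner is bound. The following is a reduced sketch of that idea, not a fix confirmed in this thread and not tested against Stardog 7. It assumes the same stored namespaces as the original query (rdf, rdfs, skos, purl, dcm, ontomedirad), and the shortened SELECT list is only for illustration. Note that it can return different rows from the original query: when no scanner is found, the filter variables simply stay unbound instead of being combined with every filter in the data.

```sparql
# Reduced sketch: prefixes are assumed to be stored in the database
# namespaces, as in the original query.
SELECT DISTINCT ?CTAcq ?Manufacturer ?Model ?XRayFilterClassLabel
WHERE {
  ?CTAcq rdf:type ?CTAcqClass .
  ?CTAcqClass rdfs:subClassOf* ontomedirad:CT_acquisition .
  OPTIONAL {
    ?Scanner rdf:type ontomedirad:CT_scanner .
    ?AcqRole rdf:type ontomedirad:image_acquisition_role .
    ?Scanner purl:BFO_0000087 ?AcqRole .
    ?AcqRole purl:BFO_0000054 ?CTAcq .
    ?Scanner ontomedirad:has_manufacturer_name ?Manufacturer .
    ?Scanner ontomedirad:has_model_name ?Model .
    # Nested OPTIONAL: joins against the solutions of the enclosing block,
    # in which ?Scanner is always bound.
    OPTIONAL {
      ?XRayFilter purl:BFO_0000177 ?Scanner .
      ?XRayFilter rdf:type ?XRayFilterClass .
      ?XRayFilterClass rdfs:subClassOf* dcm:113771 .
      ?XRayFilterClass skos:prefLabel ?XRayFilterClassLabel .
    }
  }
}
```

In the full query, the same nesting would replace the two separate OPTIONAL blocks and the rest of the WHERE clause would stay as it is.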

Cheers,
Pavel

[1] The reason for this is the semantics of joins in SPARQL, which says that two partial results are always compatible if each of their shared variables is unbound in at least one of them. In your case it means that if ?Scanner is NULL after the first OPTIONAL, that result will join with all results produced by the block in the 2nd OPTIONAL, which may not be what you want. Cf. SPARQL 1.1 Query Language
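
To make the footnote concrete, here is a tiny self-contained query that needs no data. It is only a sketch: the literals "acq1", "acq2", "scannerA", "scannerB", "filterA" and "filterB" are made up for illustration and do not come from the dataset. The first OPTIONAL binds ?scanner for "acq1" only, and the second OPTIONAL shares only ?scanner with the rest of the query, like the ?XRayFilter block above.

```sparql
SELECT ?acq ?scanner ?filter
WHERE {
  # Two acquisitions (hypothetical values).
  VALUES ?acq { "acq1" "acq2" }
  # Only "acq1" has a scanner, so ?scanner stays unbound for "acq2".
  OPTIONAL { VALUES (?acq ?scanner) { ("acq1" "scannerA") } }
  # This block shares only ?scanner with the patterns above.
  OPTIONAL {
    VALUES (?scanner ?filter) { ("scannerA" "filterA") ("scannerB" "filterB") }
  }
}
```

Under the SPARQL 1.1 semantics this should give three rows. "acq1" gets a single row with "scannerA" and "filterA", because its ?scanner is bound and only one filter row agrees with it. "acq2" gets two rows, one per filter row, because its unbound ?scanner is compatible with both. With real data, every solution that has no scanner is multiplied in this way by every result of the 2nd block.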

Thank you very much for your great help
