I am setting a timeout on my queries through pystardog. The timeout itself works (Studio shows the query as TIMED OUT), but control does not return to my code for another 1-2 minutes.
What is going on? Why would it take so long to return? Is there anything I can do to improve this?
Are you querying over virtual graphs? Cancellation support (which underlies timeout handling) is improved in 7.5.0. You might be seeing the queries hit the timer but then stay blocked in virtual graph queries for a significant amount of time after that.
Can you please share the output from stardog-admin server metrics --threads "jvm.threads" when you see the state of "Timed out" in the query manager / Studio?
10 threads with the following stack trace:
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-10" id=21 state=TIMED_WAITING
- waiting on <0x588d2db9> (a java.lang.Object)
- locked <0x588d2db9> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-9" id=20 state=TIMED_WAITING
- waiting on <0x0720c28f> (a java.lang.Object)
- locked <0x0720c28f> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-8" id=19 state=TIMED_WAITING
- waiting on <0x38275e0a> (a java.lang.Object)
- locked <0x38275e0a> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-7" id=18 state=TIMED_WAITING
- waiting on <0x56822b08> (a java.lang.Object)
- locked <0x56822b08> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-6" id=17 state=TIMED_WAITING
- waiting on <0x67d33484> (a java.lang.Object)
- locked <0x67d33484> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-5" id=16 state=TIMED_WAITING
- waiting on <0x6666b64b> (a java.lang.Object)
- locked <0x6666b64b> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-4" id=15 state=TIMED_WAITING
- waiting on <0x64b37383> (a java.lang.Object)
- locked <0x64b37383> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-3" id=14 state=TIMED_WAITING
- waiting on <0x23aede76> (a java.lang.Object)
- locked <0x23aede76> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-2" id=13 state=TIMED_WAITING
- waiting on <0x5d27b993> (a java.lang.Object)
- locked <0x5d27b993> (a java.lang.Object)
"0a0b6f34-5c62-432a-9677-bc5fa4006fde_Worker-1" id=12 state=TIMED_WAITING
- waiting on <0x0e209892> (a java.lang.Object)
- locked <0x0e209892> (a java.lang.Object)
at java.lang.Object.wait(Native Method)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
Based on what you sent me (offline), I can see that your query is spending time sorting results. Are you doing an ORDER BY on a large result set? This can also happen with intermediate results when sorting for joins. We currently aren't always able to cancel immediately while these types of procedures are running. Once the query is completely cancelled, it will be removed from the list in Studio and you should get an immediate response in the client.
Just a small follow-up: Stardog normally cancels long-running internal operations such as sorting intermediate results. Not cancelling an ORDER BY is an oversight, which we will correct ASAP.
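Until the server-side fix lands, one possible client-side workaround is to stop waiting for the blocking call after a deadline of your own. The sketch below is only an illustration using the Python standard library: call_with_deadline and slow_query are hypothetical names, and slow_query is a stand-in for the real query call, not part of pystardog. Note that this only frees the calling code. The worker thread, and the query on the server, keep running until the cancellation actually completes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, deadline_s, *args, **kwargs):
    """Run fn in a worker thread and stop waiting after deadline_s seconds.

    Hypothetical helper: it does not cancel the underlying work, it only
    returns control to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        raise TimeoutError(f"no response within {deadline_s}s") from None
    finally:
        # Do not block here; the worker finishes (or fails) in the background.
        pool.shutdown(wait=False)


def slow_query():
    """Stand-in for a query call that returns late (assumption for the demo)."""
    time.sleep(0.5)
    return "rows"


if __name__ == "__main__":
    try:
        call_with_deadline(slow_query, 0.1)
    except TimeoutError as exc:
        print("gave up waiting:", exc)
```

In real code you would pass your own query function in place of slow_query, for example a small wrapper around the pystardog call you already use. Since the abandoned call is still holding its connection, it is probably safer to issue follow-up queries on a separate connection.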