I am attempting to connect to one Stardog endpoint from two Databricks environments (each administered by a different account). On the Databricks environment I initially used to interact with Stardog, I can successfully connect to my endpoint with the following code:
import stardog

# Connection details for Stardog account
conn_details = {
    'endpoint': 'https://le-tters&numbers.stardog.cloud:XXXX',
    'username': '****',
    'password': '****'
}

# Print the databases associated with account
with stardog.Admin(**conn_details) as admin:
    for db in admin.databases():
        print(db.name)
However, on the other Databricks account, I run into an error when executing the same pystardog code:
ConnectionError: HTTPSConnectionPool(host='le-tters&numbers.stardog.cloud', port=XXXX): Max retries exceeded with url: /admin/alive (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
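A minimal sketch to reproduce the failure without pystardog, hitting the same /admin/alive health check shown in the traceback (the endpoint is the redacted placeholder from above):

import requests

# Same (redacted) endpoint placeholder as in the pystardog snippet above
endpoint = 'https://le-tters&numbers.stardog.cloud:XXXX'

try:
    # /admin/alive is the health-check URL from the traceback above
    resp = requests.get(endpoint + '/admin/alive', timeout=10)
    print('Reached Stardog, status:', resp.status_code)
except requests.exceptions.ConnectionError as err:
    # On the failing cluster this would raise the same "Connection refused" error
    print('Network-level failure:', err)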
More broadly, this leads me to wonder how I can separate the work I began developing for my client from my Stardog account once the project completes. Eventually the client will purchase a Stardog subscription, but we aren't at that point in the conversation yet. In the meantime, I intend to develop on my Stardog subscription, then download the databases and hand them off (or provide access through the "Invite User" functionality). Or would it be easier for me to create another Stardog account and develop the client work from there?
I can connect to the Stardog endpoint from my personal computer with the same code as above, though the error connecting from the second Databricks environment remains.
The pystardog code is very straightforward, essentially providing an easy-to-use API around the Python requests library to send HTTP(S) calls to Stardog. My understanding is that when you run a Databricks notebook, it actually gets run by a worker node in Databricks. If that worker is in a Databricks account whose primary assets are in your own VPC, there may be networking rules preventing outside network access, i.e. your Databricks worker nodes can't reach stardog.cloud, or otherwise have it blocked in AWS/Azure networking.
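One quick way to test this from a notebook cell is a raw TCP connection attempt to the Stardog port; a minimal sketch, with the host and port standing in for the redacted values elsewhere in this thread:

import socket

def can_reach(host, port, timeout=5.0):
    # Returns True if this worker can open a TCP connection to host:port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as err:
        print(f'Cannot reach {host}:{port}: {err}')
        return False

# Placeholder host/port, redacted as elsewhere in this thread
can_reach('le-tters&numbers.stardog.cloud', 5820)

If this returns False on the worker but True on your laptop, the block is in the Databricks-side networking rather than anything Stardog-specific.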
Customers with this type of network security typically take advantage of our AWS PrivateLink support in Stardog Cloud to tunnel between their Databricks environment and their Stardog Cloud instance, allowing this connection, virtual graphs, and other features to flow freely between the two products.
This is something we'd be happy to look into more with you. If you're already working with Saket or Alex, they can help coordinate a meeting with me or someone on my team.
Excellent, thank you for the information. I remember whitelisting Stardog Cloud's AWS IP (produswest2-vg-egress.stardog.cloud, 44.238.183.206) to allow communication between an Azure SQL server and Stardog Cloud, but I have yet to create a connection between Azure Databricks and Stardog Cloud.
I'll reach out to Saket to coordinate. Thanks again.
I did a little more sleuthing on the source of the error, and I wonder if the issue may reside on the server side. Below I show two curl executions from the Azure Databricks environment that is unable to connect to the Stardog Cloud server.
In the first command, I attempt to pull the header from stardog.cloud:5820/admin/databases, without success: the connection is refused. In the second command, I pull the header from https://cloud.stardog.com successfully. I also tried www.google.com and was able to pull the header from the problematic Azure Databricks cluster there too (command execution not shown).
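The two commands were along these lines (host redacted as above, output omitted):

# First command: pull the header from the Stardog endpoint -- connection refused
curl -I 'https://le-tters&numbers.stardog.cloud:5820/admin/databases'

# Second command: pull the header from the Stardog Cloud portal -- succeeds
curl -I https://cloud.stardog.com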
Is it possible the stardog.cloud:5820 endpoint has some reason for refusing the connection from one Azure Databricks environment and not the other?
We're still looking at this. It would be good to confirm whether the header returned is coming from Stardog and not from a proxy. We have a test Azure Databricks instance, and we'll look into whether there's a way to block outbound ports or something of that nature. We have come across some instances where customer networking rules prevented connecting to port 5820.
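As a rough check: intermediate proxies typically add or rewrite response headers such as Via or Server, so listing the full header set from the URL that does respond can hint at whether a proxy is answering instead of Stardog. A sketch:

# Pull only the headers that commonly identify a server or proxy in the path
curl -sI https://cloud.stardog.com | grep -iE '^(server|via|x-)'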
The issue was on the client side: the Databricks cluster was unable to reach out on certain ports. The solution was creating a new cluster (within the same VNet on Azure, but oddly without this issue).