I'm trying to integrate Stardog with PySpark using the Stardog Spark Connector. However, when I run my script, I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.complexible.stardog.spark.datasource.
Caused by: java.lang.ClassNotFoundException: com.complexible.stardog.spark.datasource.DefaultSource
My environment:
- OS: Windows
- Spark version: 3.5.4
- Python version: 3.9.7
- Java version: jdk-11
- Stardog Spark Connector JAR: stardog-spark-connector-3.1.0.jar
- Running inside a virtual environment (
.venv
) in VS Code
Code Snippet:
from pyspark.sql import SparkSession
Initialize SparkSession with Stardog Connector
spark = SparkSession.builder
.appName("StardogIntegration")
.config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
.getOrCreate()
print("Spark session created successfully!")
Load algorithm configuration from the properties file
config_file = "PageRankConfigFilePath"
Read from Stardog using Spark
df = spark.read
.format("com.complexible.stardog.spark.datasource")
.option("stardog.endpoint", "endpoint")
.option("stardog.database", "spark")
.option("stardog.username", "userName")
.option("stardog.password", "Password")
.option("config", config_file)
.load()
Show results (if applicable)
df.show()
Stop Spark
spark.stop()
What I Have Tried So Far:
- Verified
spark-submit
works (spark-submit --version
)
- Checked if the JAR exists at
path/to/stardog-spark-connector-3.1.0.jar
- Attempted running with
--jars
option instead of spark.jars
in code:
spark-submit --jars path/to/stardog-spark-connector-3.1.0.jar my_spark_job.py
Questions:
- Is
com.complexible.stardog.spark.datasource
the correct format for Stardog Spark Connector in version 3.1.0?
- Do I need a different JAR file or an additional dependency for Stardog integration with PySpark?
- Any suggestions on debugging or alternative ways to load the Stardog Spark Connector?
Would really appreciate any help! Thanks in advance.
Hi @Jauwad_Mazhari ,
You need to use com.stardog.spark.datasource.StardogSource
classs similar to this example.
Also, please note that there is no such package with com.complexible
.
~Tapan
داداش اگه spark.jars جواب داده، پس مشکل این بوده که کانکتور رو درست به اسپارک معرفی نکرده بوده. حالا اگه کار میکنه، بهتره کد رو به این شکل نهایی کنی که همیشه درست اجرا بشه: ببین به یه بار اجرا دلخوش نکن ، حتمیش کن
from pyspark.sql import SparkSession
spark = SparkSession.builder
.appName("StardogIntegration")
.config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
.getOrCreate()
print("Spark session created successfully!")
config_file = "PageRankConfigFilePath"
df = spark.read
.format("com.complexible.stardog.spark.datasource")
.option("stardog.endpoint", "endpoint")
.option("stardog.database", "spark")
.option("stardog.username", "userName")
.option("stardog.password", "Password")
.option("config", config_file)
.load()
df.show()
spark.stop()
حالا اگه هنوز شک داره، میتونه با این دستور چک کن که اسپارک JAR رو شناسایی کرده یا نه: امکان داره شناسایی اشتباه شده یا انجام نداده
print(spark.sparkContext.getConf().get("spark.jars"))
اگه مسیر JAR رو نشون داد، ببین یعنی اوکیه و دیگه نباید مشکلی باشه.
در تاریخ چهارشنبه ۵ مارس ۲۰۲۵، ۲:۵۳ بعدازظهر ایمان خورشیدی IMAN KHORSHIDI <iman.khorshidi.alikordi1985@gmail.com> نوشت:
Hi @Tapan_Sharma
Thankyou for your response. there is a package with com.complexible please check with stardog official docs.
After using this class com.stardog.spark.datasource.StardogSource
now I am not getting ClassNotFoundException but I am getting error as:
com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
I have run the custom SPARQL query with the same credentials it is working fine but in spark environment it is not working.
please give some input on it how to resolve this exception.
**Note: i am using Python here not java. **
@Jauwad_Mazhari , Apologies for the confusion with respect to package name. Wanted to convey that there is no spark
subpackage within com.complexible
.
Authentication issue shouldn't have come.
Can you please share the logs of spark application and Stardog for investigation?
~Tapan
Hi @Tapan_Sharma
Thanks for your clarity on subpackage I can understand that now.
I am attaching the spark_log.txt file for your reference please check and let me know how to resolve that issue.
Regards
Jauwad
spark_log.txt (21.0 KB)
@Jauwad_Mazhari , Can you please download the spark connector release 3.2.0 from here? This jar is compatible with Spark 3.5.0. You should be able to connect to Stardog.
Please let me know if you face any issue.
Regards
Tapan
Hey @Tapan_Sharma I am using stardog cloud free endpoint does it cause the issue, I have checked with spark connector 3.2.0 as well, still I am getting the same exception
com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
please let me know how to fix this exception.
@Jauwad_Mazhari , This should work in Stardog free account however I need to verify it. I will get back to you on this. Meanwhile can you share the Spark and Stardog logs where you are getting this exception as I could not find the concerned AuthenticationException in the logs that you shared earlier?
Thanks!
Hey @Tapan_Sharma I have check that spark log file which I share earlier is not the correct that is already solved.
Now tapan I am using both the endpoint free cloud endpoint and paid cloud endpoint but there is no change in exception as I tried many things but still not able to solve.
All the credentials which I am passing is correct, manually verified by running SPARQL query it is working fine but in spark it is giving the Unauthorized exception.
I am attaching you the correct spark log file but for stardog log file I am not able to find on stardog cloud and I can't run on localhost due to some reason so for stardog log file please tell me how to get from cloud if it is available.
spark_output.log (86.5 KB)
Hi @Jauwad_Mazhari , Stardog Cloud accounts support sso authentication, hence you will need to follow some extra steps.
You need to generate the auth token from stardog cloud host, as mentioned here:
You need to set stardog.token.authentication
option as true
and stardog.password
needs to be set as base64 encoded auth token that was generated in previous step.
Make these changes in your spark.read
statement and that should resolve your issue.
Let me know in case you face any further issues.
Thanks!
Hi @Tapan_Sharma Generated a Basic Auth Token by encoding :` using the following command:
echo -n "username:password" | base64
and tested it on curl via this command:
curl -v -H "Authorization: Basic <base64_token>" "https://sd-.stardog.cloud:5820/admin/databases" which is working properly.
I updated the spark.read statement as well:
df = spark.read
.format("com.stardog.spark.datasource.StardogSource")
.option("stardog.endpoint", "https://sd-.stardog.cloud:5820")
.option("stardog.database", "GraphAnalytics")
.option("stardog.token.authentication", "true")
.option("stardog.password", "<base64_token>")
.option("query", "SELECT * WHERE { ?s ?p ?o }")
.load()
Despite using the correct Basic Auth token, I am still receiving the same "Unauthorized" exception I am attaching the new log file as well for your reference. Basic Auth work via curl, but fails in the Spark job with the same token.
spark_log1.txt (43.3 KB)
Thanks
Jauwad
@Jauwad_Mazhari , did you create the token using the command given on this page?
Command should be something like this:
curl -u username:password https://express.stardog.cloud:5820/admin/token
Thanks!
@Tapan_Sharma I used the above command which you are referring it is giving me an access token after that to verify I called an API request that was also working now you told me to generate Base64 encoded token so for that I used this echo -n "username:password" | base64 command and the token which is given by this command I used in sdardog.password so does that correct?
Please use the below command against your Stardog server with your credentials to fetch the required token.
curl -u username:password https://express.stardog.cloud:5820/admin/token
After that, generate base64 encoded string using the command you are using.
echo -n <token_generated_in_previous_step> | base64**
Thanks!
After using echo -n <token_gen_in_prev_step> | base64
I am getting exception as:
java.lang.IllegalArgumentException: Illegal base64 character a
I am not sure why this would happen. Can you encode it using python or java code?
By using this code:
import base64
username = "your_username"
password = "your_password"
credentials = f"{username}:{password}"
token = base64.b64encode(credentials.encode()).decode()
print(f"Base64 Encoded Token: {token}")
This code is giving the token same as this command was giving:
echo -n "username:password" | base64 please check.
If I use this token it will give unauthorized exception.
by using this code:
import base64
Replace with your JWT or any string
jwt_token = "your_jwt_or_plain_text_token"
Encode the token
base64_token = base64.b64encode(jwt_token.encode("utf-8")).decode("utf-8")
print("Base64 Encoded JWT Token:", base64_token)
still same unauthorized exception
we are re-encoding may be bacause of that error is comming jwt token is already in base64 encoded format
does it work if we re-encode a token which is already in JWT/base64 format.
Hi @Jauwad_Mazhari ,
please confirm you generated the token with the help of below command using your own credentials and stardog server url:
curl -u username:password https://express.stardog.cloud:5820/admin/token
Step 2: Encode the above token.
As I can see in your examples, you are using both encode and decode in the same statement.
Can you please re-visit and confirm the steps?
Thanks
Tapan
Hi @Tapan_Sharma yes I have generated the token from this command only:
curl -u username:password https://express.stardog.cloud:5820/admin/token
After that I am using this command to encode into base64:
echo -n "<your_token>" | base64
after this I am getting this exception:
Caused by: java.lang.IllegalArgumentException: Illegal base64 character a
and using python code:
import base64
jwt_token = "jwt_token"
base64_token = base64.b64encode(jwt_token.encode("utf-8")).decode("utf-8")
token.encode("utf-8")
— Converts token to bytes.
base64.b64encode()
— Encodes the byte token to base64.
.decode("utf-8")
— Converts the base64 output back to a string.
print("Base64 Encoded JWT Token:", base64_token)
If I use token after gerating by this code it is giving unauthorized exception and if I remove decode then I am getting IllegalArgumentException.