I'm trying to integrate Stardog with PySpark using the Stardog Spark Connector. However, when I run my script, I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.complexible.stardog.spark.datasource.
Caused by: java.lang.ClassNotFoundException: com.complexible.stardog.spark.datasource.DefaultSource
My environment:
- OS: Windows
- Spark version: 3.5.4
- Python version: 3.9.7
- Java version: jdk-11
- Stardog Spark Connector JAR: stardog-spark-connector-3.1.0.jar
- Running inside a virtual environment (.venv) in VS Code
Code Snippet:
from pyspark.sql import SparkSession

# Initialize SparkSession with the Stardog connector
spark = (
    SparkSession.builder
    .appName("StardogIntegration")
    .config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
    .getOrCreate()
)
print("Spark session created successfully!")

# Load algorithm configuration from the properties file
config_file = "PageRankConfigFilePath"

# Read from Stardog using Spark
df = (
    spark.read
    .format("com.complexible.stardog.spark.datasource")
    .option("stardog.endpoint", "endpoint")
    .option("stardog.database", "spark")
    .option("stardog.username", "userName")
    .option("stardog.password", "Password")
    .option("config", config_file)
    .load()
)

# Show results (if applicable)
df.show()

# Stop Spark
spark.stop()
What I Have Tried So Far:
- Verified that spark-submit works (spark-submit --version)
- Checked that the JAR exists at path/to/stardog-spark-connector-3.1.0.jar
- Tried the --jars option instead of setting spark.jars in code:
spark-submit --jars path/to/stardog-spark-connector-3.1.0.jar my_spark_job.py
Questions:
- Is com.complexible.stardog.spark.datasource the correct format for Stardog Spark Connector version 3.1.0?
- Do I need a different JAR file or an additional dependency for Stardog integration with PySpark?
- Any suggestions on debugging or alternative ways to load the Stardog Spark Connector?
Would really appreciate any help! Thanks in advance.
Hi @Jauwad_Mazhari ,
You need to use the com.stardog.spark.datasource.StardogSource class, similar to this example.
Also, please note that there is no such package with com.complexible.
~Tapan
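The ClassNotFoundException in the question names com.complexible.stardog.spark.datasource.DefaultSource, which suggests Spark could not find any class under the name passed to format(). Since a JAR is a ZIP archive, one way to see which class name a given connector JAR actually ships is to inspect it directly. The following is a minimal sketch using only the Python standard library; the helper names jar_contains_class and find_classes are hypothetical and not part of any Stardog or Spark API.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR has a .class entry for the fully qualified class_name."""
    # Java stores com.foo.Bar as the archive entry com/foo/Bar.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def find_classes(jar_path, suffix):
    """List fully qualified class names in the JAR whose simple name ends with suffix."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name[: -len(".class")].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(suffix + ".class")
        )
```

For example, find_classes("path/to/stardog-spark-connector-3.1.0.jar", "Source") would list candidate data source classes, and jar_contains_class(...) can confirm whether the name used in format() is really present before starting Spark.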
Bro, if spark.jars did the trick, then the problem was that the connector had not been registered with Spark correctly. Now that it works, it is better to finalize the code like this so that it always runs correctly. Don't settle for a single successful run; make it reliable:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("StardogIntegration")
    .config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
    .getOrCreate()
)
print("Spark session created successfully!")

config_file = "PageRankConfigFilePath"

df = (
    spark.read
    .format("com.complexible.stardog.spark.datasource")
    .option("stardog.endpoint", "endpoint")
    .option("stardog.database", "spark")
    .option("stardog.username", "userName")
    .option("stardog.password", "Password")
    .option("config", config_file)
    .load()
)
df.show()
spark.stop()
If you are still unsure, you can use this command to check whether Spark has picked up the JAR; it may have been identified incorrectly or not at all:
print(spark.sparkContext.getConf().get("spark.jars"))
If it prints the JAR path, that means it is OK and there should be no further problem.
On Wednesday, March 5, 2025, at 2:53 PM, Iman Khorshidi <iman.khorshidi.alikordi1985@gmail.com> wrote:
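Note that the spark.jars value only echoes what was configured; it does not by itself show that the file exists at that location. A relative path such as path/to/... is resolved against the working directory, which can differ between VS Code and a terminal. The sketch below resolves each JAR to an absolute path and fails early if one is missing. The helper name resolve_jars is hypothetical, and this is only one possible way to guard the setting.

```python
from pathlib import Path


def resolve_jars(*jar_paths):
    """Return a comma-separated string of absolute JAR paths; raise if any is missing."""
    resolved = []
    for raw in jar_paths:
        path = Path(raw).expanduser().resolve()
        if not path.is_file():
            # Fail before Spark starts instead of at the first .load() call
            raise FileNotFoundError(f"JAR not found: {path}")
        resolved.append(str(path))
    return ",".join(resolved)
```

The result could then be passed as .config("spark.jars", resolve_jars("path/to/stardog-spark-connector-3.1.0.jar")) when building the session.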
Hi @Tapan_Sharma
Thank you for your response. There is a package named com.complexible; please check the official Stardog docs.
After switching to the class com.stardog.spark.datasource.StardogSource, I no longer get the ClassNotFoundException, but I now get this error:
com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
Running a custom SPARQL query with the same credentials works fine, but it does not work in the Spark environment.
Please give me some input on how to resolve this exception.
**Note: I am using Python here, not Java.**
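To rule out the credentials themselves, it can help to send a trivial SPARQL query from the same Python environment, outside Spark, using exactly the endpoint, database, username, and password strings given to the connector. The sketch below builds such a request with HTTP Basic authentication. It assumes the server accepts a standard SPARQL Protocol POST at <endpoint>/<database>/query; that URL layout and the helper name build_query_request are assumptions for illustration, not taken from this thread.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(endpoint, database, username, password,
                        query="ASK { ?s ?p ?o }"):
    """Build a SPARQL Protocol POST request with a Basic auth header (assumed URL layout)."""
    url = f"{endpoint.rstrip('/')}/{urllib.parse.quote(database)}/query"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to urllib.request.urlopen(...) raises urllib.error.HTTPError with status 401 if the server rejects the credentials. If this call succeeds while the Spark job still reports Unauthorized, the difference more likely lies in how the options reach the connector than in the account itself.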
@Jauwad_Mazhari, apologies for the confusion regarding the package name. I wanted to convey that there is no spark subpackage within com.complexible.
The authentication issue should not have occurred.
Can you please share the logs of the Spark application and Stardog for investigation?
~Tapan
Hi @Tapan_Sharma
Thanks for clarifying the subpackage; I understand that now.
I am attaching spark_log.txt for your reference. Please check it and let me know how to resolve the issue.
Regards
Jauwad
spark_log.txt (21.0 KB)
@Jauwad_Mazhari, can you please download the Spark connector release 3.2.0 from here? This JAR is compatible with Spark 3.5.0, so you should be able to connect to Stardog.
Please let me know if you face any issues.
Regards
Tapan
Hey @Tapan_Sharma, I am using the Stardog Cloud free endpoint. Could that be causing the issue? I have tried Spark connector 3.2.0 as well, and I still get the same exception:
com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
Please let me know how to fix this exception.
@Jauwad_Mazhari, this should work with a Stardog free account; however, I need to verify it and will get back to you. Meanwhile, can you share the Spark and Stardog logs where you are getting this exception? I could not find the AuthenticationException in the logs that you shared earlier.
Thanks!
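Before sharing a log, it may be worth confirming that the file really contains the exception in question. A small sketch that scans a log file for a substring and returns each hit with a few lines of surrounding context; the helper name find_exception_lines is hypothetical, and spark_log.txt is the file name from the attachment above.

```python
def find_exception_lines(log_path, needle, context=2):
    """Return (line_number, surrounding_lines) for every log line containing needle."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        lines = fh.read().splitlines()
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            # Line numbers are 1-based to match what an editor shows
            hits.append((i + 1, lines[start:end]))
    return hits
```

For instance, find_exception_lines("spark_log.txt", "StardogAuthenticationException") returns an empty list when the attached log does not include the failing run.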