SparkClassNotFoundException: Unable to Find Stardog Spark Connector in PySpark

I'm trying to integrate Stardog with PySpark using the Stardog Spark Connector. However, when I run my script, I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.complexible.stardog.spark.datasource.

Caused by: java.lang.ClassNotFoundException: com.complexible.stardog.spark.datasource.DefaultSource

My environment:

  • OS: Windows
  • Spark version: 3.5.4
  • Python version: 3.9.7
  • Java version: jdk-11
  • Stardog Spark Connector JAR: stardog-spark-connector-3.1.0.jar
  • Running inside a virtual environment (.venv) in VS Code

Code Snippet:
from pyspark.sql import SparkSession

# Initialize SparkSession with Stardog Connector
spark = (
    SparkSession.builder
    .appName("StardogIntegration")
    .config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
    .getOrCreate()
)

print("Spark session created successfully!")

# Load algorithm configuration from the properties file
config_file = "PageRankConfigFilePath"

# Read from Stardog using Spark
df = (
    spark.read
    .format("com.complexible.stardog.spark.datasource")
    .option("stardog.endpoint", "endpoint")
    .option("stardog.database", "spark")
    .option("stardog.username", "userName")
    .option("stardog.password", "Password")
    .option("config", config_file)
    .load()
)

# Show results (if applicable)
df.show()

# Stop Spark
spark.stop()

What I Have Tried So Far:

  1. Verified spark-submit works (spark-submit --version)
  2. Checked if the JAR exists at path/to/stardog-spark-connector-3.1.0.jar
  3. Attempted running with --jars option instead of spark.jars in code:

spark-submit --jars path/to/stardog-spark-connector-3.1.0.jar my_spark_job.py

Questions:

  1. Is com.complexible.stardog.spark.datasource the correct format for Stardog Spark Connector in version 3.1.0?
  2. Do I need a different JAR file or an additional dependency for Stardog integration with PySpark?
  3. Any suggestions on debugging or alternative ways to load the Stardog Spark Connector?

Would really appreciate any help! Thanks in advance.

Hi @Jauwad_Mazhari ,

You need to use the com.stardog.spark.datasource.StardogSource class, similar to this example.

Also, please note that there is no such package with com.complexible.

~Tapan
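One way to settle which package the data source class actually lives in is to look inside the JAR itself. When `.format()` gets a name that is neither a registered short name nor a class, Spark also tries `<name>.DefaultSource`, which is why the error mentions `com.complexible.stardog.spark.datasource.DefaultSource`. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of any Stardog tool) that lists classes under any `datasource` package in a JAR, plus any providers the JAR registers in Spark's `DataSourceRegister` service file. It only uses Python's standard `zipfile` module.

```python
import zipfile

# Spark discovers short format names through this ServiceLoader file inside a JAR.
SERVICES_ENTRY = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"


def datasource_classes(jar_path):
    """Return fully qualified top-level class names under any '.../datasource/' package."""
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    classes = set()
    for name in names:
        # JAR entries use '/' separators; '$' marks inner/anonymous classes, which we skip.
        if name.endswith(".class") and "/datasource/" in name and "$" not in name:
            classes.add(name[: -len(".class")].replace("/", "."))
    return sorted(classes)


def registered_providers(jar_path):
    """Return provider class names listed in the DataSourceRegister service file, if any."""
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICES_ENTRY not in jar.namelist():
            return []
        text = jar.read(SERVICES_ENTRY).decode("utf-8")
    # One class name per line; '#' starts a comment in ServiceLoader files.
    lines = (line.split("#", 1)[0].strip() for line in text.splitlines())
    return [line for line in lines if line]
```

For example, `datasource_classes("path/to/stardog-spark-connector-3.1.0.jar")` would show whether the JAR contains `com.stardog.spark.datasource.StardogSource`; whatever it prints is the fully qualified name to pass to `.format()`.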

Bro, if spark.jars did the trick, then the problem was that the connector hadn't been registered with Spark properly. Now that it works, it's better to finalize the code like this so it runs correctly every time. Don't settle for one successful run; make it reliable:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("StardogIntegration")
    .config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
    .getOrCreate()
)

print("Spark session created successfully!")

config_file = "PageRankConfigFilePath"

df = (
    spark.read
    # Data source class from Tapan's reply; there is no com.complexible...spark.datasource package
    .format("com.stardog.spark.datasource.StardogSource")
    .option("stardog.endpoint", "endpoint")
    .option("stardog.database", "spark")
    .option("stardog.username", "userName")
    .option("stardog.password", "Password")
    .option("config", config_file)
    .load()
)

df.show()

spark.stop()

And if you still have doubts, you can check whether Spark picked up the JAR with this command; it's possible it was picked up incorrectly or not at all:

print(spark.sparkContext.getConf().get("spark.jars"))

If it prints the JAR path, that means it's fine and there shouldn't be any more problems.
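The `spark.jars` value is a comma-separated list, so printing it only shows what was configured, not whether each file exists or whether the class inside it resolves (in this thread the path was set, yet the format name was still wrong). A minimal sketch, assuming a hypothetical helper name and the `stardog-spark-connector` file-name prefix used in this thread, that splits the value and checks each matching entry on the driver's local filesystem:

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def connector_jar_status(spark_jars_value, name_fragment="stardog-spark-connector"):
    """Return (entry, exists_on_driver) for each spark.jars entry containing name_fragment.

    Entries may be plain paths or file: URIs. Remote URIs (hdfs:, http:, ...) cannot be
    checked locally and are reported with None instead of True/False.
    """
    if not spark_jars_value:
        return []
    results = []
    for entry in (e.strip() for e in spark_jars_value.split(",")):
        if not entry or name_fragment not in entry:
            continue
        parsed = urlparse(entry)
        if parsed.scheme.lower() == "file":
            local = url2pathname(parsed.path)
        elif len(parsed.scheme) <= 1:
            # No scheme, or a Windows drive letter such as "C:" parsed as a one-letter scheme.
            local = entry
        else:
            results.append((entry, None))
            continue
        results.append((entry, os.path.isfile(local)))
    return results
```

Usage with the session from the code above: `connector_jar_status(spark.sparkContext.getConf().get("spark.jars"))`. An empty list means no connector JAR was configured at all; a `False` means the path is wrong relative to where the driver runs.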

On Wednesday, March 5, 2025, 2:53 PM, Iman Khorshidi (IMAN KHORSHIDI) <iman.khorshidi.alikordi1985@gmail.com> wrote:

Hi @Tapan_Sharma
Thank you for your response. There is a package under com.complexible; please check the official Stardog docs.
After switching to the class com.stardog.spark.datasource.StardogSource, I no longer get the ClassNotFoundException, but now I get this error:

com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
I have run a custom SPARQL query with the same credentials and it works fine, but in the Spark environment it does not.
Please give some input on how to resolve this exception.
**Note: I am using Python here, not Java.**
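To compare "works in a SPARQL query" with "fails in Spark" on equal terms, it can help to send the exact endpoint, database, username and password strings that the Spark job uses in a plain HTTP request. The sketch below assumes the server exposes the standard SPARQL protocol at `<endpoint>/<database>/query` and accepts HTTP Basic auth; both are assumptions to check against the Stardog HTTP API docs for your deployment, and the function names are hypothetical. It uses only the standard library.

```python
import base64
import urllib.error
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, database, username, password, query="ASK { ?s ?p ?o }"):
    """Build (but do not send) a SPARQL GET request with HTTP Basic auth.

    Assumes the SPARQL query endpoint is <endpoint>/<database>/query.
    """
    url = (
        f"{endpoint.rstrip('/')}/{urllib.parse.quote(database)}/query?"
        + urllib.parse.urlencode({"query": query})
    )
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/sparql-results+json"},
    )


def credentials_accepted(request, timeout=30):
    """Send the request; True on success, False on HTTP 401. Other errors are re-raised."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise
```

If `credentials_accepted(build_sparql_request(...))` returns `True` with the same strings passed to `.option(...)`, the credentials themselves are fine and the difference lies in how the Spark job passes them (for example, values coming from the `config` properties file instead of the options).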

@Jauwad_Mazhari, apologies for the confusion regarding the package name. I meant that there is no spark subpackage within com.complexible.

The authentication issue shouldn't occur.
Can you please share the logs of spark application and Stardog for investigation?

~Tapan

Hi @Tapan_Sharma
Thanks for clarifying the subpackage; I understand now.
I am attaching the spark_log.txt file for reference. Please check it and let me know how to resolve the issue.
Regards
Jauwad
spark_log.txt (21.0 KB)

@Jauwad_Mazhari, can you please download Spark connector release 3.2.0 from here? This JAR is compatible with Spark 3.5.0, so you should be able to connect to Stardog.

Please let me know if you face any issue.

Regards
Tapan
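The compatibility note above can be turned into a quick sanity check before launching a job. This is a hypothetical helper: the only mapping it encodes is the one stated in this thread (connector 3.2.0 for the Spark 3.5 line); other Spark lines are deliberately left unknown rather than guessed, and treating compatibility by Spark major.minor is an assumption. As the follow-up below shows, upgrading alone did not clear the Unauthorized error, so this addresses only the version question.

```python
import re

# Only entry taken from this thread: connector 3.2.0 was recommended for Spark 3.5.x.
MIN_CONNECTOR_FOR_SPARK = {(3, 5): (3, 2, 0)}


def parse_version(text):
    """Extract the first x.y.z version from a version string or JAR file name."""
    name = re.split(r"[\\/]", text)[-1]  # ignore directories, which may contain versions too
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"no x.y.z version in {text!r}")
    return tuple(int(part) for part in match.groups())


def connector_ok(spark_version, jar_name):
    """True/False when this thread gives guidance for the Spark line, otherwise None."""
    minimum = MIN_CONNECTOR_FOR_SPARK.get(parse_version(spark_version)[:2])
    if minimum is None:
        return None
    return parse_version(jar_name) >= minimum
```

For example, `connector_ok(spark.version, "stardog-spark-connector-3.1.0.jar")` on Spark 3.5.4 returns `False`, matching the advice to move to 3.2.0.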

Hey @Tapan_Sharma, I am using a free Stardog Cloud endpoint. Could that be causing the issue? I have tried Spark connector 3.2.0 as well, but I still get the same exception:
com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
Please let me know how to fix this exception.

@Jauwad_Mazhari, this should work with a free Stardog account, but I need to verify it. I will get back to you on this. Meanwhile, can you share the Spark and Stardog logs where you are getting this exception? I could not find the relevant AuthenticationException in the logs you shared earlier.

Thanks!