SparkClassNotFoundException: Unable to Find Stardog Spark Connector in PySpark

I'm trying to integrate Stardog with PySpark using the Stardog Spark Connector. However, when I run my script, I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.complexible.stardog.spark.datasource.

Caused by: java.lang.ClassNotFoundException: com.complexible.stardog.spark.datasource.DefaultSource

My environment:

  • OS: Windows
  • Spark version: 3.5.4
  • Python version: 3.9.7
  • Java version: jdk-11
  • Stardog Spark Connector JAR: stardog-spark-connector-3.1.0.jar
  • Running inside a virtual environment (.venv) in VS Code

Code Snippet:
from pyspark.sql import SparkSession

Initialize SparkSession with Stardog Connector

spark = SparkSession.builder
.appName("StardogIntegration")
.config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
.getOrCreate()

print("Spark session created successfully!")

Load algorithm configuration from the properties file

config_file = "PageRankConfigFilePath"

Read from Stardog using Spark

df = spark.read
.format("com.complexible.stardog.spark.datasource")
.option("stardog.endpoint", "endpoint")
.option("stardog.database", "spark")
.option("stardog.username", "userName")
.option("stardog.password", "Password")
.option("config", config_file)
.load()

Show results (if applicable)

df.show()

Stop Spark

spark.stop()

What I Have Tried So Far:

  1. Verified spark-submit works (spark-submit --version)
  2. Checked if the JAR exists at path/to/stardog-spark-connector-3.1.0.jar
  3. Attempted running with --jars option instead of spark.jars in code:

spark-submit --jars path/to/stardog-spark-connector-3.1.0.jar my_spark_job.py

Questions:

  1. Is com.complexible.stardog.spark.datasource the correct format for Stardog Spark Connector in version 3.1.0?
  2. Do I need a different JAR file or an additional dependency for Stardog integration with PySpark?
  3. Any suggestions on debugging or alternative ways to load the Stardog Spark Connector?

Would really appreciate any help! Thanks in advance.

Hi @Jauwad_Mazhari ,

You need to use com.stardog.spark.datasource.StardogSource classs similar to this example.

Also, please note that there is no such package with com.complexible.

~Tapan

داداش اگه spark.jars جواب داده، پس مشکل این بوده که کانکتور رو درست به اسپارک معرفی نکرده بوده. حالا اگه کار می‌کنه، بهتره کد رو به این شکل نهایی کنی که همیشه درست اجرا بشه: ببین به یه بار اجرا دلخوش نکن ، حتمیش کن

from pyspark.sql import SparkSession

spark = SparkSession.builder
.appName("StardogIntegration")
.config("spark.jars", "path/to/stardog-spark-connector-3.1.0.jar")
.getOrCreate()

print("Spark session created successfully!")

config_file = "PageRankConfigFilePath"

df = spark.read
.format("com.complexible.stardog.spark.datasource")
.option("stardog.endpoint", "endpoint")
.option("stardog.database", "spark")
.option("stardog.username", "userName")
.option("stardog.password", "Password")
.option("config", config_file)
.load()

df.show()

spark.stop()

حالا اگه هنوز شک داره، می‌تونه با این دستور چک کن که اسپارک JAR رو شناسایی کرده یا نه: امکان داره شناسایی اشتباه شده یا انجام نداده

print(spark.sparkContext.getConf().get("spark.jars"))

اگه مسیر JAR رو نشون داد، ببین یعنی اوکیه و دیگه نباید مشکلی باشه.

در تاریخ چهارشنبه ۵ مارس ۲۰۲۵،‏ ۲:۵۳ بعدازظهر ایمان خورشیدی IMAN KHORSHIDI <iman.khorshidi.alikordi1985@gmail.com> نوشت:

Hi @Tapan_Sharma
Thankyou for your response. there is a package with com.complexible please check with stardog official docs.
After using this class com.stardog.spark.datasource.StardogSource
now I am not getting ClassNotFoundException but I am getting error as:

com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
I have run the custom SPARQL query with the same credentials it is working fine but in spark environment it is not working.
please give some input on it how to resolve this exception.
**Note: i am using Python here not java. **

@Jauwad_Mazhari , Apologies for the confusion with respect to package name. Wanted to convey that there is no spark subpackage within com.complexible.

Authentication issue shouldn't have come.
Can you please share the logs of spark application and Stardog for investigation?

~Tapan

Hi @Tapan_Sharma
Thanks for your clarity on subpackage I can understand that now.
I am attaching the spark_log.txt file for your reference please check and let me know how to resolve that issue.
Regards
Jauwad
spark_log.txt (21.0 KB)

@Jauwad_Mazhari , Can you please download the spark connector release 3.2.0 from here? This jar is compatible with Spark 3.5.0. You should be able to connect to Stardog.

Please let me know if you face any issue.

Regards
Tapan

Hey @Tapan_Sharma I am using stardog cloud free endpoint does it cause the issue, I have checked with spark connector 3.2.0 as well, still I am getting the same exception
com.complexible.stardog.security.StardogAuthenticationException: Unauthorized
please let me know how to fix this exception.

@Jauwad_Mazhari , This should work in Stardog free account however I need to verify it. I will get back to you on this. Meanwhile can you share the Spark and Stardog logs where you are getting this exception as I could not find the concerned AuthenticationException in the logs that you shared earlier?

Thanks!

Hey @Tapan_Sharma I have check that spark log file which I share earlier is not the correct that is already solved.
Now tapan I am using both the endpoint free cloud endpoint and paid cloud endpoint but there is no change in exception as I tried many things but still not able to solve.
All the credentials which I am passing is correct, manually verified by running SPARQL query it is working fine but in spark it is giving the Unauthorized exception.
I am attaching you the correct spark log file but for stardog log file I am not able to find on stardog cloud and I can't run on localhost due to some reason so for stardog log file please tell me how to get from cloud if it is available.
spark_output.log (86.5 KB)

Hi @Jauwad_Mazhari , Stardog Cloud accounts support sso authentication, hence you will need to follow some extra steps.

You need to generate the auth token from stardog cloud host, as mentioned here:

You need to set stardog.token.authentication option as true and stardog.password needs to be set as base64 encoded auth token that was generated in previous step.

Make these changes in your spark.read statement and that should resolve your issue.
Let me know in case you face any further issues.

Thanks!

Hi @Tapan_Sharma Generated a Basic Auth Token by encoding :` using the following command:
echo -n "username:password" | base64
and tested it on curl via this command:
curl -v -H "Authorization: Basic <base64_token>" "https://sd-.stardog.cloud:5820/admin/databases" which is working properly.

I updated the spark.read statement as well:
df = spark.read
.format("com.stardog.spark.datasource.StardogSource")
.option("stardog.endpoint", "https://sd-.stardog.cloud:5820")
.option("stardog.database", "GraphAnalytics")
.option("stardog.token.authentication", "true")
.option("stardog.password", "<base64_token>")
.option("query", "SELECT * WHERE { ?s ?p ?o }")
.load()
Despite using the correct Basic Auth token, I am still receiving the same "Unauthorized" exception I am attaching the new log file as well for your reference. Basic Auth work via curl, but fails in the Spark job with the same token.

spark_log1.txt (43.3 KB)
Thanks
Jauwad

@Jauwad_Mazhari , did you create the token using the command given on this page?

Command should be something like this:

curl -u username:password https://express.stardog.cloud:5820/admin/token

Thanks!

@Tapan_Sharma I used the above command which you are referring it is giving me an access token after that to verify I called an API request that was also working now you told me to generate Base64 encoded token so for that I used this echo -n "username:password" | base64 command and the token which is given by this command I used in sdardog.password so does that correct?

Please use the below command against your Stardog server with your credentials to fetch the required token.

curl -u username:password https://express.stardog.cloud:5820/admin/token

After that, generate base64 encoded string using the command you are using.

echo -n <token_generated_in_previous_step> | base64**

Thanks!

After using echo -n <token_gen_in_prev_step> | base64
I am getting exception as:
java.lang.IllegalArgumentException: Illegal base64 character a

I am not sure why this would happen. Can you encode it using python or java code?

By using this code:
import base64

username = "your_username"
password = "your_password"

credentials = f"{username}:{password}"

token = base64.b64encode(credentials.encode()).decode()

print(f"Base64 Encoded Token: {token}")

This code is giving the token same as this command was giving:
echo -n "username:password" | base64 please check.
If I use this token it will give unauthorized exception.

by using this code:
import base64

Replace with your JWT or any string

jwt_token = "your_jwt_or_plain_text_token"

Encode the token

base64_token = base64.b64encode(jwt_token.encode("utf-8")).decode("utf-8")

print("Base64 Encoded JWT Token:", base64_token)
still same unauthorized exception

we are re-encoding may be bacause of that error is comming jwt token is already in base64 encoded format
does it work if we re-encode a token which is already in JWT/base64 format.

Hi @Jauwad_Mazhari ,
please confirm you generated the token with the help of below command using your own credentials and stardog server url:

curl -u username:password https://express.stardog.cloud:5820/admin/token

Step 2: Encode the above token.

As I can see in your examples, you are using both encode and decode in the same statement.

Can you please re-visit and confirm the steps?

Thanks
Tapan

Hi @Tapan_Sharma yes I have generated the token from this command only:

curl -u username:password https://express.stardog.cloud:5820/admin/token

After that I am using this command to encode into base64:

echo -n "<your_token>" | base64

after this I am getting this exception:
Caused by: java.lang.IllegalArgumentException: Illegal base64 character a

and using python code:

import base64
jwt_token = "jwt_token"

base64_token = base64.b64encode(jwt_token.encode("utf-8")).decode("utf-8")

  • token.encode("utf-8") — Converts token to bytes.
  • base64.b64encode() — Encodes the byte token to base64.
  • .decode("utf-8") — Converts the base64 output back to a string.
    print("Base64 Encoded JWT Token:", base64_token)

If I use token after gerating by this code it is giving unauthorized exception and if I remove decode then I am getting IllegalArgumentException.