How to bulk load data into Stardog using Python?

Fay · August 7, 2018, 1:44am

I am using Python and want to bulk load data into Stardog. Could you advise how to do it? I tried sparqlstore, but it’s discontinued and I couldn’t get it to work with basic authentication. I am able to use SPARQLWrapper with basic authentication to SELECT data, but couldn’t figure out how to insert data into Stardog.

If you can send me some sample code for a bulk load or point me in the right direction, that would be great. Thanks!

zachary.whitley · August 7, 2018, 1:00pm

You can try using a LOAD instead of a SELECT. See the SPARQL 1.1 Update specification.

Fay · August 8, 2018, 12:15am

With the code below, the graph is updated (one record is inserted into the graph), but nothing is actually loaded into Stardog. My question is: how can I load data into the Stardog database? I am trying to make INSERT work first before troubleshooting LOAD…

import rdflib

# Parse whatever this URL returns into a local, in-memory graph
g = rdflib.Graph()
g.load('http://localhost:5820/VirtualTesting')
print("graph has %s statements." % len(g))

# Run the update against the in-memory graph g
g.update(
    """
PREFIX core: <http://ontologies.com/core#>
INSERT DATA {
  core:Account2 core:semanticMatch core:_datasetElement_900005 }
""")
print("graph has %s statements." % len(g))

zachary.whitley · August 8, 2018, 1:45am

I’m not too familiar with RDFLib, but I don’t think g.load() is doing what you think it is. I think load is going to try to load whatever is returned from the URL into the in-memory store. (I can’t remember if that URL would return the SPARQL service description. If it does, it might load that. I’m not sure if that’s what you intended.) I’m not sure why the insert isn’t loading, although the object URL having a local name that starts with an underscore is a little strange. I’d have to check if that’s valid. (I think it’s OK in Turtle but might be problematic in RDF/XML.)

I think what you’re looking for is SPARQLWrapper (RDFLib/sparqlwrapper on GitHub), a wrapper for a remote SPARQL endpoint.

Fay · August 8, 2018, 4:32am

I have already tried SPARQLWrapper. As far as I can tell, it only supports SELECT (which works very well); it doesn’t support UPDATE or INSERT. The only workaround I found is to use the Stardog HTTP API, and with that I am able to insert one triple into the database per API call.

Please let me know if there is any other way to do a bulk load with Python. I am looking for a way to bulk load 2000 triples into the Stardog database. Alternatively, I can create a file that stores all the triples, but how do I load the file with Python, or is there an HTTP API for loading a file?
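For the "create a file to store all the triples" option, one possibility is to write the triples as N-Triples, one statement per line. This is a minimal sketch using only the standard library. The function name, the output file name and the second sample row are illustrative, and it assumes every term is an IRI (no literals or blank nodes).

```python
import tempfile
from pathlib import Path


def write_ntriples(triples, path):
    """Write (subject, predicate, object) IRI tuples as N-Triples lines."""
    lines = [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


CORE = "http://ontologies.com/core#"  # namespace used in the snippet earlier in the thread
triples = [
    (CORE + "Account2", CORE + "semanticMatch", CORE + "_datasetElement_900005"),
    # Made-up second row, only to show that several triples end up in one file
    (CORE + "Account3", CORE + "semanticMatch", CORE + "_datasetElement_900006"),
]

out_path = Path(tempfile.gettempdir()) / "triples.nt"  # illustrative location
count = write_ntriples(triples, out_path)
```

The resulting file can then be handed to any loader that accepts N-Triples.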

zachary.whitley · August 8, 2018, 10:40am

You can use the SPARQL 1.1 Graph Store HTTP Protocol.
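As a rough sketch of what a Graph Store Protocol upload could look like from Python: the protocol lets a client POST an RDF payload to a graph store URL, using `?default` for the default graph or `?graph=<iri>` for a named graph. The assumption here, which should be checked against the Stardog documentation for your version, is that the graph store is exposed at `{server}/{db}`. The function name and the credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def graph_store_request(base_url, db, turtle, user, password, graph=None):
    """Build a POST that merges Turtle data into the default or a named graph."""
    if graph is None:
        target = "default"
    else:
        target = "graph=" + urllib.parse.quote(graph, safe="")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/{db}?{target}",  # assumed graph store URL shape
        data=turtle.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/turtle", "Authorization": f"Basic {token}"},
    )


data = "<http://ontologies.com/core#Account2> <http://ontologies.com/core#semanticMatch> <http://ontologies.com/core#_datasetElement_900005> ."
req = graph_store_request("http://localhost:5820", "VirtualTesting", data, "admin", "admin")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Because the whole payload travels in one request body, 2000 triples would need a single call rather than 2000.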

Or the Stardog REST API: https://stardog.docs.apiary.io/#reference/core-api/executing-an-update-query

See this Stack Overflow post for doing inserts with SPARQLWrapper: "INSERT/DELETE/UPDATE query using SPARQLWrapper".
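For reference, an update through SPARQLWrapper with basic authentication might look roughly like the sketch below. It uses SPARQLWrapper's `setHTTPAuth`, `setCredentials`, `setMethod` and `setQuery` calls. The `/update` endpoint path, the database name and the credentials are assumptions to adapt to your server, and the import sits inside the function only so the snippet loads where the library is not installed.

```python
def run_update(update_url, update, user, password):
    """Send one SPARQL Update request via SPARQLWrapper with basic auth."""
    # Local import: requires the third-party SPARQLWrapper package
    from SPARQLWrapper import SPARQLWrapper, BASIC, POST

    sparql = SPARQLWrapper(update_url)
    sparql.setHTTPAuth(BASIC)
    sparql.setCredentials(user, password)
    sparql.setMethod(POST)  # updates are sent with POST, not GET
    sparql.setQuery(update)
    return sparql.query()


UPDATE_URL = "http://localhost:5820/VirtualTesting/update"  # assumed endpoint path
UPDATE = """
PREFIX core: <http://ontologies.com/core#>
INSERT DATA { core:Account2 core:semanticMatch core:_datasetElement_900005 }
"""
# run_update(UPDATE_URL, UPDATE, "admin", "admin")  # placeholder credentials
```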

Fay · August 8, 2018, 6:00pm

Thank you for the link. I am able to do the insert with either SPARQLWrapper or the HTTP update query call. Both methods send one INSERT command per triple. Please correct me if I am wrong.

What's the recommended (more efficient) way to load a large number of triples? Instead of sending INSERT 2000 times, is there a way to INSERT or LOAD all the triples at once?

stephen · August 8, 2018, 6:19pm

You can always use the HTTP API to add data in a transaction: https://stardog.docs.apiary.io/#reference/transactions/adding-data-to-a-transaction/post
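A sketch of that begin/add/commit flow in Python, using only the standard library. The three paths (`/{db}/transaction/begin`, `/{db}/{txid}/add`, `/{db}/transaction/commit/{txid}`) are my reading of the HTTP API and should be verified against the linked reference. The helper names are made up, and a real version would also roll back on failure.

```python
import base64
import urllib.request


def http_post(url, user, password, body=b"", content_type=None):
    """POST with basic auth and return the response body as text."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    if content_type:
        headers["Content-Type"] = content_type
    req = urllib.request.Request(url, data=body, method="POST", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def add_in_transaction(base_url, db, turtle, user, password, post=http_post):
    """Begin a transaction, add Turtle data to it, then commit."""
    # The begin call is assumed to return the transaction id as plain text
    tx = post(f"{base_url}/{db}/transaction/begin", user, password).strip()
    post(f"{base_url}/{db}/{tx}/add", user, password,
         body=turtle.encode("utf-8"), content_type="text/turtle")
    post(f"{base_url}/{db}/transaction/commit/{tx}", user, password)
    return tx


# add_in_transaction("http://localhost:5820", "VirtualTesting",
#                    open("triples.ttl").read(), "admin", "admin")
```

All the triples go into the single add call, so the cost is three requests regardless of how many statements the payload contains.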

zachary.whitley · August 8, 2018, 6:32pm

The most efficient way would be to insert the data at database creation time with Stardog bulk loading.

stardog-admin db create -n myDb -- myData.ttl

The INSERT DATA query can be given multiple triples.

INSERT DATA {
<http://ontologies.com/core/Account6#> <http://ontologies.com/core/semanticMatch> <http://ontologies.com/core/_datasetElement_900006#> .
<http://ontologies.com/core/Account7#> <http://ontologies.com/core/semanticMatch> <http://ontologies.com/core/_datasetElement_900007#> . }
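From Python, a multi-triple INSERT DATA string like this can be generated from a list of tuples. This is a small sketch: the function name is made up, the IRIs mirror the ones in this post, and it assumes every term is an IRI (literals would need proper quoting and escaping).

```python
def build_insert_data(triples):
    """Return one INSERT DATA update covering every (s, p, o) IRI tuple."""
    body = "\n".join(f"  <{s}> <{p}> <{o}> ." for s, p, o in triples)
    return "INSERT DATA {\n" + body + "\n}"


BASE = "http://ontologies.com/core/"
triples = [
    (f"{BASE}Account{n}#", f"{BASE}semanticMatch", f"{BASE}_datasetElement_90000{n}#")
    for n in (6, 7)
]
update = build_insert_data(triples)
print(update)
```

The resulting string can be sent in one request with whichever client already works for single inserts, so 2000 triples become one update instead of 2000.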

Or, as @stephen suggested, you can use a transaction, although I don’t think that would save you all the network round trips if you were using the REST API. I think it would if you were using the Java API.

gatemezing · August 8, 2018, 7:18pm

If you have a large dataset, it’s better to load it first before using your Python code to query.
You can use a command like this one:
nohup ./stardog-admin db create -n <myDB> </repo/of/my/large/dataset>
Before that, create a stardog.properties file with at least this setting:
memory.mode=write_optimized
I used this method to create a DB and load almost 1 billion triples in less than 3 hours.
HTH

Fay · August 9, 2018, 10:08pm

Thanks for your suggestion! One more question: stardog.properties doesn’t exist yet. Is it correct to create it under stardog-5.3.3/bin and add only the one line below? Do I need to add anything else to the file?

memory.mode=write_optimized

jess · August 10, 2018, 2:59am

Have you considered writing the data to a file and using db create as Ghislain suggests? data add will work equally well for existing databases.

Your stardog.properties file should be in your Stardog home directory; the same place as your data files.

Unless you’re very sensitive to load performance, I wouldn’t worry too much. 3 million triples is not very large and should load fast enough under any memory configuration.
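If the load should still be driven from a Python script, the `db create` and `data add` commands mentioned above can be invoked through `subprocess`. This is a sketch under a few assumptions: the Stardog `bin` directory is on `PATH`, the helper name is made up, and the database and file names are the placeholders used earlier in the thread.

```python
import shlex
import subprocess


def stardog_command(db, files, create=False):
    """Build the argument list for `stardog data add` or `stardog-admin db create`."""
    if create:
        # Mirrors the command shown earlier: stardog-admin db create -n myDb -- myData.ttl
        return ["stardog-admin", "db", "create", "-n", db, "--", *files]
    return ["stardog", "data", "add", db, *files]


cmd = stardog_command("myDb", ["myData.ttl"])
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run against a live server
```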

