How to bulk load data into Stardog using Python?


(Fay Wang) #1

I am using Python and want to bulk load data into Stardog. Could you advise how to do it? I tried sparqlstore, but it’s discontinued and I could not get it to work with basic authentication. I am able to use SPARQLWrapper with basic authentication to SELECT data, but couldn’t figure out how to insert data into Stardog.

If you can send me some sample code of bulk load or give me some directions, that’ll be great. Thanks!


(zachary.whitley) #2

You can try using a LOAD instead of a SELECT. https://www.w3.org/TR/sparql11-update/#load
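Since LOAD is an ordinary SPARQL Update operation, it can be sent from Python with just the standard library. Here is a minimal sketch, not a definitive implementation. It assumes a server at http://localhost:5820, an update endpoint at /{db}/update, admin/admin credentials, and a hypothetical data URL; check all of these against your own setup. It only builds the request and does not send it.

```python
import base64
import urllib.request


def build_load_request(server, database, data_url, user, password):
    """Build (but do not send) a SPARQL Update request containing a LOAD."""
    update = f"LOAD <{data_url}>"
    # HTTP basic authentication header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{server}/{database}/update",  # assumed endpoint layout
        data=update.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-update",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_load_request(
    "http://localhost:5820", "VirtualTesting",
    "http://example.org/data.ttl", "admin", "admin",
)
# urllib.request.urlopen(req) would send it to a running server.
```

Note that the server itself fetches the URL named in LOAD, so the file has to be reachable from the server, not just from the machine running the script.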


(Fay Wang) #3

With the code below, the graph is updated (one record is inserted into the graph), but the data is not actually loaded into Stardog. My question is: how can I load data into the Stardog database? I am trying to make INSERT work first, before troubleshooting LOAD…

import rdflib

g = rdflib.Graph()
g.load('http://localhost:5820/VirtualTesting')
print("graph has %s statements." % len(g))
qres = g.update(
    """
PREFIX core: <http://ontologies.com/core#>
INSERT DATA {
  core:Account2 core:semanticMatch core:_datasetElement_900005 }
""")
print("graph has %s statements." % len(g))

(zachary.whitley) #4

I’m not too familiar with RDFLib, but I don’t think g.load() is doing what you think it is. I think load is going to try to load whatever is returned from the URL into the in-memory store. (I can’t remember if that URL would return the SPARQL service description. If it does, it might load that. I’m not sure if that’s what you were intending to do.) I’m not sure why the insert isn’t loading, although the object URL starting with an underscore is a little strange. I’d have to check if that’s valid. (I think it’s OK in Turtle but might be problematic in RDF/XML.)

I think what you’re looking for is sparqlwrapper. https://github.com/RDFLib/sparqlwrapper


(Fay Wang) #5

I have already tried SPARQLWrapper. However, it only supports SELECT (which works very well); it doesn’t support UPDATE or INSERT. The only workaround I found is to use the Stardog HTTP API, and I am able to insert one triple into the database with one API call.

Please let me know if there is any other way to do a bulk load with Python. I am looking for a way to bulk load 2000 triples into the Stardog database. Alternatively, I can create a file to store all the triples, but how do I load the file with Python, or is there an HTTP API to load a file?


(zachary.whitley) #6

You can use the SPARQL graph store protocol https://www.w3.org/TR/2013/REC-sparql11-http-rdf-update-20130321/#http-post

Or the Stardog rest api https://stardog.docs.apiary.io/#reference/core-api/executing-an-update-query

See this post for doing inserts with SPARQLWrapper https://stackoverflow.com/questions/14160437/insert-delete-update-query-using-sparqlwrapper#15322461
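For the graph store protocol option, here is a rough sketch of what the POST could look like from Python. The graph store URL is an assumption (I am guessing at http://localhost:5820/VirtualTesting, so check the Stardog docs for the real one), the graph IRI is hypothetical, and authentication is left out to keep it short. It builds the request only and does not send anything.

```python
import urllib.parse
import urllib.request


def build_graph_store_post(graph_store_url, turtle_text, graph_iri=None):
    """Build a Graph Store Protocol POST that adds Turtle data to a graph.

    With graph_iri=None the data targets the default graph (?default);
    otherwise it targets the named graph (?graph=<encoded IRI>).
    """
    if graph_iri is None:
        query = "default"
    else:
        query = urllib.parse.urlencode({"graph": graph_iri})
    return urllib.request.Request(
        url=f"{graph_store_url}?{query}",
        data=turtle_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


STORE = "http://localhost:5820/VirtualTesting"  # assumed graph store URL
turtle = "<urn:example:s> <urn:example:p> <urn:example:o> .\n"

default_req = build_graph_store_post(STORE, turtle)
named_req = build_graph_store_post(STORE, turtle, "http://example.org/g")
# urllib.request.urlopen(default_req) would send it to a running server.
```

The whole file goes in one request body, so 2000 triples cost a single round trip.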


(Fay Wang) #7

Thank you for the link. I am able to either use SPARQLWrapper or HTTP execute update query to do the insert. Both methods are sending one INSERT command for each triple. Please correct me if I am wrong.

What’s the recommended (more efficient) way to load a large number of triples? Instead of sending INSERT 2000 times, is there a way to INSERT or LOAD all the triples at once?

For example, the HTTP API sends an INSERT for one triple:
update_query = '''
INSERT DATA { <http://ontologies..com/core/Account4> <http://ontologies..com/core/semanticMatch> <http://ontologies..com/core/_datasetElement_900004> . }
'''
Or SPARQLWrapper sends one INSERT per triple:
sparql.setQuery("""
INSERT DATA { <http://ontologies..com/core/Account6#> <http://ontologies..com/core/semanticMatch> <http://ontologies..com/core/_datasetElement_900006#> . }
;
INSERT DATA { <http://ontologies..com/core/Account7#> <http://ontologies..com/core/semanticMatch> <http://ontologies..com/core/_datasetElement_900007#> . }
;
""")


(stephen) #8

You can always use the HTTP API to add data in a transaction: https://stardog.docs.apiary.io/#reference/transactions/adding-data-to-a-transaction/post
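Roughly, the flow is begin, add, commit. The sketch below only builds the three requests. The URL paths are my reading of the linked API docs and should be treated as assumptions to verify, the transaction id is a hypothetical placeholder (in a real run it would come from the response to the begin call), and authentication is omitted.

```python
import urllib.request


def begin_request(server, db):
    """Request that opens a transaction (assumed path)."""
    return urllib.request.Request(
        f"{server}/{db}/transaction/begin", data=b"", method="POST"
    )


def add_request(server, db, tx_id, turtle_text):
    """Request that adds Turtle data inside an open transaction (assumed path)."""
    return urllib.request.Request(
        f"{server}/{db}/{tx_id}/add",
        data=turtle_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def commit_request(server, db, tx_id):
    """Request that commits the transaction (assumed path)."""
    return urllib.request.Request(
        f"{server}/{db}/transaction/commit/{tx_id}", data=b"", method="POST"
    )


SERVER = "http://localhost:5820"
TX_ID = "hypothetical-tx-id"  # placeholder; the server issues the real id

begin = begin_request(SERVER, "myDb")
add = add_request(SERVER, "myDb", TX_ID, "<urn:a> <urn:b> <urn:c> .\n")
commit = commit_request(SERVER, "myDb", TX_ID)
# Each request would be sent in order with urllib.request.urlopen(...).
```

The add step can carry as many triples as you like in one body, and can be repeated before the commit.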


(zachary.whitley) #9

The most efficient way would be to insert the data at database creation time with Stardog bulk loading.

stardog-admin db create -n myDb -- myData.ttl

The INSERT DATA query can be given multiple triples.

INSERT DATA {
<http://ontologies..com/core/Account6#> <http://ontologies.com/core/semanticMatch> <http://ontologies.com/core/_datasetElement_900006#> .
<http://ontologies..com/core/Account7#> <http://ontologies.com/core/semanticMatch> <http://ontologies.com/core/_datasetElement_900007#> . }

Or, as @stephen suggested, you can use a transaction, although I don’t think that would save you all the network round trips if you were using the REST API. I think it would if you were using the Java API.
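To generate a multi-triple INSERT DATA from Python, something like the sketch below should do. It assumes every subject, predicate and object is a plain IRI (literals and blank nodes would need their own formatting), and the example.org namespace and the batch size of 1000 are made up for illustration.

```python
def build_insert_data(triples, batch_size=1000):
    """Yield INSERT DATA updates, each holding up to batch_size triples.

    `triples` is a list of (subject, predicate, object) IRI strings.
    """
    for start in range(0, len(triples), batch_size):
        batch = triples[start:start + batch_size]
        body = "\n".join(f"  <{s}> <{p}> <{o}> ." for s, p, o in batch)
        yield "INSERT DATA {\n" + body + "\n}"


BASE = "http://example.org/core/"  # hypothetical namespace
triples = [
    (f"{BASE}Account{i}", f"{BASE}semanticMatch", f"{BASE}_datasetElement_{i}")
    for i in range(2000)
]

# 2000 triples become two update strings instead of 2000 separate INSERTs.
updates = list(build_insert_data(triples, batch_size=1000))
```

Each string in updates can then be sent with SPARQLWrapper or the HTTP update endpoint, exactly like the single-triple INSERT.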


(Ghislain ) #10

If you have a large dataset, it’s better to load it first, before using your Python code to query it.
So, you can use a command like this one:
nohup ./stardog-admin db create -n <myDB> </repo/of/my/large/dataset>
Before running it, create a stardog.properties file with at least this setting:
memory.mode=write_optimized
I used this method to create a DB and load almost 1 billion triples in less than 3 hours.
HTH


(Fay Wang) #11

Thanks for your suggestion!! One more question: stardog.properties doesn’t exist yet. Is it correct to create stardog.properties under stardog-5.3.3/bin and add only the one line below? Do I need to add anything else to the file?

memory.mode=write_optimized


(Jess Balint) #12

Have you considered writing the data to a file and using db create as Ghislain suggests? data add will work equally well for existing databases.
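As a sketch of the file-based route from Python: write the triples out as N-Triples, then hand the file to the CLI. This assumes all terms are plain IRIs and that the stardog CLI is on the PATH; the database name myDb and the example.org IRIs are placeholders. The command is only built here, not executed.

```python
import os
import tempfile


def write_ntriples(triples, path):
    """Write (subject, predicate, object) IRI tuples as N-Triples, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for s, p, o in triples:
            f.write(f"<{s}> <{p}> <{o}> .\n")


def data_add_command(database, path):
    """Argument list for adding a file to an existing database via the CLI."""
    return ["stardog", "data", "add", database, path]


BASE = "http://example.org/core/"  # hypothetical namespace
triples = [
    (f"{BASE}Account1", f"{BASE}semanticMatch", f"{BASE}_datasetElement_1"),
    (f"{BASE}Account2", f"{BASE}semanticMatch", f"{BASE}_datasetElement_2"),
]

path = os.path.join(tempfile.mkdtemp(), "triples.nt")
write_ntriples(triples, path)
cmd = data_add_command("myDb", path)
# subprocess.run(cmd, check=True) would run the load on a machine with Stardog installed.
```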

Your stardog.properties file should be in your Stardog home directory; the same place as your data files.

Unless you’re very sensitive to load performance, I wouldn’t worry too much. 3 million triples is not very large and should load fast enough under any memory configuration.

