Sample code for calling admin/virtual_graphs/import

I was experimenting with various ETL methods, and when I went to tinker with POST admin/virtual_graphs/import as an option, I was unable to find any documentation.

Ironically, while typing this post, the "topic similarity" window did turn up a post that I would have liked to have found sooner :wink:

In any case, here is a simple Typescript program that I wrote to test my understanding of the endpoint:

import {Axios} from 'axios';
import FormData from 'form-data';
import * as fs from 'fs';

// load the SMS mapping that we want to use into a variable
// alternatively, it could be a stream ..
const movies_sms = fs.readFileSync('/private/var/opt/stardog/movies.sms');

// load the JSON that we want to ingest
// alternatively, it could be a stream ..
const movies_json = fs.readFileSync('/private/var/opt/stardog/movies.json');

// Build up the form
const data = new FormData();
data.append('database', 'etl_local');
data.append('mappings', movies_sms);
data.append('named_graph', 'urn:ingest_movie_using_axios');
data.append('input_file_type', 'JSON');
data.append('input_file', movies_json);

// build up a configuration for the Axios request that we'll be making
const config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'http://localhost:5820/admin/virtual_graphs/import',
  headers: { 
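    // Basic auth: base64-encoded "username:password" (credentials redacted here)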
    'Authorization': 'Basic bm90X215X3VzZXJfbmFtZTpub3RfbXlfcGFzc3dvcmQ=', 
    ...data.getHeaders()
  },
  data : data
};

// create a new Axios object, 
// and then use it to post to /admin/virtual_graphs/import
const axios = new Axios({});
axios.request(config)
.then((response) => {
  console.log(JSON.stringify(response.data));
})
.catch((error) => {
  console.log(error);
});
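
To run this, you just need the axios and form-data packages installed (e.g. npm install axios form-data) and a TypeScript runner such as ts-node; swap in your own file paths, database name, and credentials.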

Hi Mike,

I was unable to find documentation on this as well, but fortunately we have some smart people at Stardog who pointed it out to me. You can find admin/virtual_graphs/import here.

Does this give you what you need?

Best,
Steve

Not exactly .. which is why I published an example.

The way I figured out how to use the endpoint was trial and error: in the end it was simple, but it took a few guesses and false starts to stumble my way to success.

Since this seems like a particularly handy ETL method, I figured it was worth publishing a simple example to save fellow travelers some time.

Our use case involves ingesting event records from Veeva Vault QMS, so this method seems almost perfect for building a "Quality Event KG" on an event-driven basis .. i.e. our agent subscribes to the event system and stuffs mapped records into the KG upon receipt .. basically a "one-hop" ingestion pipeline with declarative transformation of content (via SMS) .. pretty much as elegant as it gets.
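
For concreteness, here is a minimal sketch of what that one-hop pipeline could look like. The subscribeToQualityEvents callback, the quality_events.sms mapping, the quality_kg database, and the named graph are all placeholders I made up (not Veeva or Stardog names); the HTTP call itself is just the same multipart request as in the example above.

import {Axios} from 'axios';
import FormData from 'form-data';
import * as fs from 'fs';

const axios = new Axios({});

// the SMS mapping that turns an event record into triples (placeholder path)
const mapping = fs.readFileSync('/private/var/opt/stardog/quality_events.sms');

// hypothetical handler: called once per quality-event record received from
// the event system; posts the record through the SMS mapping into the KG
async function ingestQualityEvent(record: object): Promise<void> {
  const data = new FormData();
  data.append('database', 'quality_kg');             // placeholder database name
  data.append('mappings', mapping);
  data.append('named_graph', 'urn:quality_events');  // placeholder named graph
  data.append('input_file_type', 'JSON');
  data.append('input_file', JSON.stringify(record));

  const response = await axios.request({
    method: 'post',
    maxBodyLength: Infinity,
    url: 'http://localhost:5820/admin/virtual_graphs/import',
    headers: {
      'Authorization': 'Basic <base64 of username:password>',
      ...data.getHeaders()
    },
    data: data
  });
  console.log(`ingested quality event, status ${response.status}`);
}

// wiring it to the event system is left abstract here, e.g.:
// subscribeToQualityEvents((record) => ingestQualityEvent(record));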


We put together our Spark job for materialization of virtual graphs to provide this one-stop shopping for data ingestion, using the SMS language as the language of ETL. While the external compute docs [1] highlight the ease of use of integrating directly with Databricks or Amazon EMR, it also works on a vanilla Apache Spark distribution, even on a single developer workstation submitting the Spark job. There are docs at the end there on how to set that up programmatically.

The nice thing about the Spark job is that it does the materialization in parallel, using a batched, streaming, compressed method (think transducer) for efficiently loading data into Stardog. So the translation to RDF happens completely off-board from the Stardog server, and we get to take advantage of Stardog's MVCC transactions to use multiple streaming writers to materialize the data into the server. If a simple "virtual import" of a billion triples was taking an hour or so, the Spark job should finish that in about 15 minutes. For sub-billion datasets, it should be super fast.

Thanks! This is super-interesting .. and I think that this will raise some eyebrows .. one reason being that it seems like it might be a nice complement to pipelines built using Nextflow.

Yes, our intent is for this to be used for pipeline building in whatever ETL or workflow environment you may have. While it was clearly built to leverage Spark (it can do the Spark partitioning in addition to the parallelism into Stardog), the customer base uses this in things like Apache Airflow, DevOps hooks, and other custom pipeline or workflow configurations.

I think you're already speaking with Tim; we're happy to have more conversations on this topic.
