Sample code for calling admin/virtual_graphs/import

mike_res · August 16, 2023, 5:35am

I was experimenting with various ETL methods, and when I went to tinker with POST admin/virtual_graphs/import as an option, I was unable to find any documentation.

Ironically, while typing this post, the "topic similarity" window did turn up a post that I would have liked to have found sooner.

In any case, here is a simple TypeScript program that I wrote to test my understanding of the endpoint:

import {Axios} from 'axios';
import FormData from 'form-data';
import * as fs from 'fs';

// load the SMS mapping that we want to use (readFileSync returns a Buffer)
// alternatively, it could be a stream ..
const movies_sms = fs.readFileSync('/private/var/opt/stardog/movies.sms');

// load the JSON that we want to ingest (also a Buffer)
// alternatively, it could be a stream ..
const movies_json = fs.readFileSync('/private/var/opt/stardog/movies.json');

// Build up the form
const data = new FormData();
data.append('database', 'etl_local');
data.append('mappings', movies_sms);
data.append('named_graph', 'urn:ingest_movie_using_axios');
data.append('input_file_type', 'JSON');
data.append('input_file', movies_json);

// build up a configuration for the Axios request that we'll be making
const config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'http://localhost:5820/admin/virtual_graphs/import',
  headers: {
    'Authorization': 'Basic bm90X215X3VzZXJfbmFtZTpub3RfbXlfcGFzc3dvcmQ=',
    ...data.getHeaders()
  },
  data: data
};

// create a new Axios object, 
// and then use it to post to /admin/virtual_graphs/import
const axios = new Axios({});
axios.request(config)
.then((response) => {
  console.log(JSON.stringify(response.data));
})
.catch((error) => {
  console.log(error);
});

StevePlace · August 16, 2023, 2:49pm

Hi Mike,

I was unable to find documentation on this as well, but fortunately we have some smart people at Stardog who pointed it out to me. You can find admin/virtual_graphs/import here.

Does this give you what you need?

Best,
Steve

mike_res · August 16, 2023, 3:52pm

Not exactly .. which is why I published an example.

The way that I figured out how to use the endpoint was:

1. I downloaded the OpenAPI metadata: https://stardog-union.github.io/http-docs/openapi.json
2. I modified it to accept security credentials, as described here: Creating Swagger UI Wrapper for API Access to Stardog Cloud Server
3. When I went to call the function, I (eventually) noticed that it used a "multipart/form-data" MIME type (first image)
4. So I used Postman to set up the call (2nd image)
5. Once that succeeded, I used Postman's code-generation capabilities to generate a Node script .. and then adapted that to the TypeScript example above.
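
For anyone who would rather avoid the axios and form-data dependencies, the same multipart/form-data call can probably be made with the fetch, FormData and Blob globals built into Node 18+. This is only a sketch under assumptions: the five field names are copied from the example above, but the helper names (buildImportForm, basicAuthHeader, importVirtualGraph) and the 'input.json' file name are made up for illustration, and I have not confirmed that the server treats a file part with a file name the same way as the Buffer field sent above.

```typescript
// Sketch only: assumes Node 18+ (global fetch, FormData, Blob, btoa).
// Field names come from the example above; all helper names are hypothetical.

interface ImportRequest {
  database: string;       // target database, e.g. 'etl_local'
  mappings: string;       // SMS mapping text
  namedGraph: string;     // named graph to load into
  inputFileType: string;  // 'JSON' in the example above
  inputFile: string;      // content of the file to ingest
}

// Build the multipart/form-data body with the same five fields.
function buildImportForm(req: ImportRequest): FormData {
  const form = new FormData();
  form.append('database', req.database);
  form.append('mappings', req.mappings);
  form.append('named_graph', req.namedGraph);
  form.append('input_file_type', req.inputFileType);
  // Assumption: the server accepts the input as a file part named 'input_file'.
  form.append('input_file', new Blob([req.inputFile]), 'input.json');
  return form;
}

// Encode credentials for HTTP Basic auth (ASCII user names and passwords only).
function basicAuthHeader(user: string, password: string): string {
  return 'Basic ' + btoa(`${user}:${password}`);
}

// Send the request; fetch sets the multipart Content-Type (with boundary) itself.
async function importVirtualGraph(
  baseUrl: string,
  auth: string,
  req: ImportRequest
): Promise<string> {
  const response = await fetch(`${baseUrl}/admin/virtual_graphs/import`, {
    method: 'POST',
    headers: { Authorization: auth },
    body: buildImportForm(req),
  });
  if (!response.ok) {
    throw new Error(`import failed: HTTP ${response.status}`);
  }
  return response.text();
}
```

A call would then look something like importVirtualGraph('http://localhost:5820', basicAuthHeader(user, password), {...}), with the mapping and JSON content read from disk as in the original program.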

In the end it was simple, but it took a few guesses and false starts to stumble my way to success.

Since this seems like a particularly handy ETL method, I figured it was worth publishing a simple example to save fellow travelers some time.

Our use case involves ingesting event records from Veeva Vault QMS, so this method seems almost perfect for building a "Quality Event KG" on an event driven basis .. i.e. our agent subscribes to the event system, and stuffs mapped records into the KG upon receipt .. basically a "one-hop" ingestion pipeline with declarative transformation of content (via SMS) .. pretty much as elegant as it gets.
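
To make the "one-hop" idea a bit more concrete, here is a rough sketch of the shape such an agent could take: each incoming event record is serialized to JSON and handed, together with a fixed SMS mapping, to whatever function performs the POST. Everything here is hypothetical (ImportFields, PostImport, PipelineConfig, makeEventHandler); none of it is a Stardog or Veeva API, and the event subscription itself is left out.

```typescript
// Sketch only: all names here are hypothetical, not part of any Stardog or Veeva API.

// The five form fields used by the example at the top of the thread.
interface ImportFields {
  database: string;
  mappings: string;
  named_graph: string;
  input_file_type: string;
  input_file: string;
}

// Whatever actually POSTs the fields to admin/virtual_graphs/import.
type PostImport = (fields: ImportFields) => Promise<void>;

interface PipelineConfig {
  database: string;
  mappings: string;    // SMS mapping text, loaded once at startup
  namedGraph: string;
}

// Returns a handler that the event subscription can call once per record.
function makeEventHandler(config: PipelineConfig, post: PostImport) {
  return (record: unknown): Promise<void> =>
    post({
      database: config.database,
      mappings: config.mappings,
      named_graph: config.namedGraph,
      input_file_type: 'JSON',
      // One event record becomes one small JSON document to map.
      input_file: JSON.stringify(record),
    });
}
```

Passing the POST function in as a parameter keeps the handler easy to test with a fake, and leaves retry or batching policy to the caller.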

albaker · August 17, 2023, 1:14pm

We put together our Spark job for materialization of virtual graphs to provide this one-stop shopping for data ingestion, using SMS as the language of ETL. While the external compute docs [1] highlight how easily it integrates directly with Databricks or Amazon EMR, it also works on a vanilla Apache Spark distribution, even when submitting the Spark job from a single developer workstation. The end of those docs explains how to set that up programmatically.

The nice thing about the Spark job is that it does the materialization in parallel, using a batched, streaming, compressed method (think transducer) to load data into Stardog efficiently. So the translation to RDF happens completely off-board from the Stardog server, and we get to take advantage of Stardog's MVCC transactions to use multiple streaming writers to materialize the data into the server. If a simple "virtual import" of a billion triples was taking an hour or so, the Spark job should finish that in 15 minutes. For sub-billion datasets, it should be super fast.

mike_res · August 17, 2023, 4:16pm

Thanks! This is super-interesting .. and I think that this will raise some eyebrows .. one reason being that it seems like it might be a nice complement for pipelines built using Nextflow.

albaker · August 17, 2023, 6:14pm

Yes, our intent is for this to be used for pipeline building in whatever ETL or workflow environment you may have. While it was clearly built to leverage Spark (it can do the Spark partitioning in addition to the parallelism to Stardog), our customers use it in things like Apache Airflow, DevOps hooks, and other custom pipeline or workflow configurations.

I think you're already speaking with Tim; we're happy to have more conversations on this topic.
