Stardog and corona

I'm not a huge fan of companies jumping on a crisis, or of swinging the IT hammer at any and all problems, but I'm going to suggest this because I don't know what else to do. Is there any chance that Stardog could open the sandbox to COVID-19 datasets and maybe start a GitHub repo for mappings? I thought it might be something to do while we're all stuck inside, and there seems to be lots of data out there: https://github.com/search?q=covid&type=

Interestingly, I haven't been able to find much raw data. There are plenty of apps, but it's difficult to tell where they're pulling their information from. Given the global nature of this virus, RDF's multilingual support (language-tagged literals) might be helpful with these datasets. This seems to be the best resource I've found so far: https://github.com/soroushchehresa/awesome-coronavirus#statistics-and-data

Here's a nice dataset of Korean cases that highlights the multilingual aspect. Maybe we can get translations for Italian. https://github.com/jihoo-kim/Coronavirus-Dataset

I've started a GitHub repo at https://github.com/semantalytics/coronavirus

The dataset I posted has lat/long and geographic region information, which we can use to pull in GeoNames data for population density. It also has contact chaining, although I haven't had a chance to see how that's done yet.

I did a really quick mapping just to get the contact chaining. You can see the now infamous patient 31 on the right.
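
For anyone who wants to see the contact-chaining idea outside Stardog, here's a minimal Python sketch of what such a mapping does: each patient row with a known source case becomes one `infectedBy` triple, and following those links backwards gives the chain. It assumes the CSV has `patient_id` and `infected_by` columns; the `http://example.org/covid/` IRIs and the toy rows in `SAMPLE` are made up for illustration, and this is not the mapping in the repo.

```python
import csv
import io

# Hypothetical IRIs for illustration only; not the repo's vocabulary.
PATIENT_BASE = "http://example.org/covid/patient/"
INFECTED_BY = "<http://example.org/covid/infectedBy>"

# Made-up toy rows; assumes `patient_id` and `infected_by` columns.
SAMPLE = "patient_id,infected_by\np1,\np2,p1\np3,p2\np4,\n"


def source_links(csv_text):
    """Map each patient to the patient who infected them, when known."""
    links = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get("infected_by") or "").strip()
        if source:  # a blank infected_by means the source is unknown
            links[row["patient_id"].strip()] = source
    return links


def contact_chain_triples(csv_text):
    """Emit one N-Triples line per known infection link."""
    return [
        f"<{PATIENT_BASE}{patient}> {INFECTED_BY} <{PATIENT_BASE}{source}> ."
        for patient, source in source_links(csv_text).items()
    ]


def chain_to_origin(patient_id, csv_text):
    """Follow infected_by links back to the earliest known case."""
    links = source_links(csv_text)
    chain = [patient_id]
    # Stop at an unknown source, or if the data ever contains a cycle.
    while chain[-1] in links and links[chain[-1]] not in chain:
        chain.append(links[chain[-1]])
    return chain


for line in contact_chain_triples(SAMPLE):
    print(line)
print(chain_to_origin("p3", SAMPLE))
```

With the toy rows this prints two `infectedBy` triples (p2 to p1, p3 to p2) and the chain `['p3', 'p2', 'p1']`.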

Zach,

Incidentally, last night I had the same idea: put COVID-19 CSV data into Stardog and then visualize it in Linkurious to learn more about its spread. How many features can be gathered from the data?

Hey Zach,

Are you planning on publishing the mappings to that github repo you created? I also have an interest in assembling a knowledge graph here.

Cheers,
Al

I was planning on it, but I'd be happy to have Stardog take the lead and submit PRs to your repo. I also thought it would be a good exercise to take notes on the pain points of rapidly producing mappings.

One thing I struggle with, especially with CSV files, is whether to just quickly import the file and fix it up afterwards with SPARQL Update queries, or to load it into a relational database where I have more control over the mappings. If it's even a moderately sized dataset, I go the relational database route, since it's really painful to reimport every time you update the mappings.
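
Here's a minimal sketch of the relational-staging route, using Python's built-in `sqlite3` purely as a stand-in for whichever database you'd actually map from: the raw CSV goes into an all-TEXT table, and the fix-ups (trimming, turning blanks into NULLs) live in a view that the mappings read from, so changing the cleanup means redefining a view rather than reimporting. The table, view and column names (`patient_raw`, `patient`, `patient_id`, `infected_by`) and the toy rows are hypothetical.

```python
import csv
import io
import sqlite3

# Made-up toy rows with stray whitespace and a blank source case.
SAMPLE = "patient_id,infected_by\n p1 ,\np2, p1\n"


def stage_csv(conn, table, csv_text):
    """(Re)load a CSV into a staging table, keeping every column as TEXT."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


def define_cleanup_view(conn):
    """Cleanup lives in a view; redefining it doesn't touch the raw table."""
    conn.execute("DROP VIEW IF EXISTS patient")
    conn.execute(
        "CREATE VIEW patient AS "
        "SELECT trim(patient_id) AS patient_id, "
        "NULLIF(trim(infected_by), '') AS infected_by "
        "FROM patient_raw"
    )


conn = sqlite3.connect(":memory:")
stage_csv(conn, "patient_raw", SAMPLE)
define_cleanup_view(conn)
print(conn.execute("SELECT * FROM patient ORDER BY patient_id").fetchall())
```

With the toy rows this prints `[('p1', None), ('p2', 'p1')]`. Mappings would then target the `patient` view instead of the raw file.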

Hi Zach,

My 2c: I've previously solved similar complex mapping challenges by using Python to transform the source into JSON-LD and loading that into Stardog. My source was plain enterprise-attack.json, a simple JSON file, but I suppose the same challenge exists when dealing with CSVs.

I'd be glad to provide guidance, with a Jupyter notebook, on doing this custom mapping and loading.
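
A minimal sketch of that Python-to-JSON-LD approach applied to a CSV, using only the standard library: each row becomes a node in an `@graph`, and a small `@context` declares which values are IRIs. The vocabulary, base IRI, column names (`patient_id`, `infected_by`, `province`) and toy rows are assumptions for illustration, not the approach used for enterprise-attack.json.

```python
import csv
import io
import json

# Hypothetical vocabulary; swap in whatever ontology the mappings settle on.
BASE = "http://example.org/covid/patient/"
CONTEXT = {
    "@vocab": "http://example.org/covid/",
    "infectedBy": {"@type": "@id"},  # values of infectedBy are IRIs, not strings
}

# Made-up toy rows; assumes patient_id, infected_by and province columns.
SAMPLE = "patient_id,infected_by,province\np1,,Seoul\np2,p1,Seoul\n"


def rows_to_jsonld(csv_text):
    """Turn CSV rows into a JSON-LD document with one node per patient."""
    graph = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = {"@id": BASE + row["patient_id"].strip(), "@type": "Patient"}
        source = (row.get("infected_by") or "").strip()
        if source:
            node["infectedBy"] = BASE + source
        province = (row.get("province") or "").strip()
        if province:
            node["province"] = province
        graph.append(node)
    return {"@context": CONTEXT, "@graph": graph}


# ensure_ascii=False keeps non-Latin text (e.g. Korean names) readable in the file.
print(json.dumps(rows_to_jsonld(SAMPLE), ensure_ascii=False, indent=2))
```

Because `@type` values are resolved against `@vocab`, `"Patient"` expands to `http://example.org/covid/Patient`. The printed document can be saved as a `.jsonld` file and loaded into Stardog like any other RDF file.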

Hope this helps,
Regards

I found a great general resource for contributing to COVID response efforts: https://helpwithcovid.com/

I posted initial mappings of the Korean dataset to https://github.com/semantalytics/stardog-covid-19-south-korea-kcdc

Where can one find a description of the fields in those files?

The original dataset is at https://www.kaggle.com/kimjihoo/coronavirusdataset and has some short descriptions. It's hard to tell exactly where all the data is coming from. People are copying it and adding additional stuff. This dataset appears to build on it but seems to have some additional data: https://github.com/parksw3/COVID19-Korea

This might be a good dataset to add. It has the locations of all US hospitals: https://catalog.data.gov/dataset/hospitals-dcdfc . There's also this one from HIFLD, although I'm not sure whether it's any different: https://hifld-geoplatform.opendata.arcgis.com/datasets/6ac5e325468c4cb9b905f1728d6fbf0f_0/data

This is a global thing, so I'll see if I can find hospital locations for other countries. This was just an easy one to find.

I found this https://healthsites.io/map

...and this, this morning: https://pages.semanticscholar.org/coronavirus-research . It's mostly published papers. There might be some interesting stuff in the metadata, maybe using NLP/BITES, etc., and there are some good links to other data sources at the bottom.

Italian dataset; we'll need to use Google Translate: https://github.com/pcm-dpc/COVID-19

Tableau has started a COVID-19 data hub: https://www.tableau.com/covid-19-coronavirus-data-resources

It might have additional data sources.

They just released a new 2.0 version of the dataset: https://www.kaggle.com/kimjihoo/coronavirusdataset