Creating virtual graph when CSV contains legitimate URIs

(Mark Miller) #1

I have a CSV file with a few columns containing fully-qualified URIs for entities that are already in my graph. I have been able to import a virtual graph, but it seems like the URI columns are being treated as if they contain plain strings.

I'm using the first generation SMS. I could theoretically strip out "http.*/" form each URI, but I don't think I can do a mapping like below because the URIs in a given column don't all come from the same namespace.

prefix:{"somekey"} a prefix:Entity ;
    prefix:term staticns:{termfragment};
    sm:map [
      sm:table "csv_with_uris" ;
    ] .
(Paul Jackson) #2

Hi Mark,

Welcome to the community. It seems like your idea should work given a mappings file like:

{somekey} a prefix:Entity ;
    prefix:term staticns:{termfragment} .

or

<{somekey}> a prefix:Entity ;
    prefix:term staticns:{termfragment} .

but it's running into problems. I created an issue for that.

The work-around that I was able to find was to import the CSV into an RDBMS (I used Derby - comes with Java), and use R2RML for the mappings:

ij> CONNECT 'jdbc:derby://localhost:1527/iridb;create=true';
ij> create table "iritable" ( "somekey" varchar(50), "termfragment" varchar(50));
0 rows inserted/updated/deleted
ij> INSERT INTO "iritable" ("somekey","termfragment") VALUES ('http://stardog.com/example/pjacks', 'Paul Jackson');
1 row inserted/updated/deleted

mappings:

@prefix prefix: <http://example.com/prefix/> .
@prefix staticns: <http://example.com/staticns/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

[] a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "\"iritable\"" ] ;
  rr:subjectMap [ rr:column "\"somekey\"" ;
                  rr:termType rr:IRI ;
                  rr:class prefix:Entity ] ;
  rr:predicateObjectMap [ 
    rr:predicate prefix:term ;
    rr:objectMap [ rr:template "staticns:{\"termfragment\"}" ]
  ] .

properties file:

jdbc.url=jdbc:derby://localhost:1527/iridb;create=true
jdbc.driver=org.apache.derby.jdbc.ClientDriver
jdbc.username=APP
# ignored
jdbc.password=APP

command-line:

stardog-admin virtual import -f r2rml mydb mydb.properties mycsv3.ttl

Hope that helps.
-Paul

(Mark Miller) #3

Thanks, Paul. I'll have some time to revisit this later this week. Maybe I made a mistake or missed a feature... do you know if the SMS is documented thoroughly anywhere?

Using one of the other syntaxes or staging the data in a RDBMS aren't off the table, either.

-Mark

(Paul Jackson) #4

Hi Mark,

The Stardog Mapping Syntax (SMS) is fairly simple. It's just turtle with templates. The documentation is here. Let us know if you notice something missing.

Thanks,
-Paul

(zachary.whitley) #5

...or if you're super curious or looking for a more formal description you can always look at the parser in the stardog-union millan repo that powers the stardog studio language server protocol at https://github.com/stardog-union/millan/tree/master/src/sms

(Mark Miller) #6

Here's a complete worked example of what I'm observing. Does it clarify whether I'm doing something wrong, or the importer isn't behaving as expected?

mixedupdrugs_noquote.csv

serial,label,term
1,acetaminophen,http://purl.bioontology.org/ontology/DRUGBANK/DB00316
2,morphine,http://purl.bioontology.org/ontology/RXNORM/7052
3,analgesic,http://purl.obolibrary.org/obo/CHEBI_35480

mixedupdrugs_noquote.ttl

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sm: <tag:stardog:api:mapping:> .
@prefix mydata: <http://example.com/resource/> .

mydata:{serial} a mydata:Row ;
        mydata:label "{label}" ;
        mydata:term <{term}> ;

        sm:map [
                sm:table "mixedupdrugs_noquote.csv" ;
        ] .

insert command

stardog-6.1.2/bin/stardog-admin virtual import mixedupdrugs_noquote mixedupdrugs_noquote.ttl mixedupdrugs_noquote.csv

export command

stardog data export -f PRETTY_TURTLE -s -v mixedupdrugs_noquote
Exported 9 statements from mixedupdrugs_noquote to /home/ubuntu/.exports/mixedupdrugs_noquote-2019-04-21.ttl in 1.451 ms

mixedupdrugs_noquote-2019-04-21.ttl

@prefix : <http://api.stardog.com/> .
@prefix stardog: <tag:stardog:api:> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.com/resource/2> a <http://example.com/resource/Row> ;
    <http://example.com/resource/label> "morphine" ;
    <http://example.com/resource/term> <urn:/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FRXNORM%2F7052> .

<http://example.com/resource/1> a <http://example.com/resource/Row> ;
    <http://example.com/resource/label> "acetaminophen" ;
    <http://example.com/resource/term> <urn:/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FDRUGBANK%2FDB00316> .

<http://example.com/resource/3> a <http://example.com/resource/Row> ;
    <http://example.com/resource/label> "analgesic" ;
    <http://example.com/resource/term> <urn:/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHEBI_35480> .

I expected this instead (without URL encoding of the term)

<http://example.com/resource/3> a <http://example.com/resource/Row> ;
    <http://example.com/resource/label> "analgesic" ;
    <http://example.com/resource/term> <http://purl.obolibrary.org/obo/CHEBI_35480> .
(Lorenz B.) #7

I can confirm that using <{term}> always creates IRIs of form
<urn:/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHEBI_35480>
and it doesn't mater whether it is in subject or object position.

The even smaller example mapping shows this behaviour:

<{term}> a <{term}> ;
       
        sm:map [
                sm:table "mixedupdrugs_noquote.csv" ;
        ] .
1 Like
(stephen) #8

Hi All,

We have confirmed that this is an issue with our client-side CSV parsing and are working on a fix.

In the meantime, there are a couple of options to work around this:

  1. Import the CSV into an RDBMS and then use the mappings to virtual import from that table. This work is performed server-side where this issue does not exist.
  2. Use Excel formulas in the CSV to generate valid RDF and add it manually
1 Like
(system) closed #9

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.