MongoDB virtual graph mapping - facing issues while picking only a substring part of a value

I'm trying to write a virtual graph mapping for a mongodb datasource in which the values of the json keys are having different pattern and I want to pick a substring from it as that substring would be used to create an IRI in BIND template. But, I'm facing issue while retrieving the mapped data.

I had attached the sample data demo, expected RDF, the virtual graph mapping file, the SPARQL Query, and the error.

Sample Mongodb json data from "mongo" collection:

{
    "_id": "***************",
	"boxID": "urn:abc:def:id:123456.98765432",  # wanted to pick only 123456.98765432
	"equipmentID": "http://www.equipment.com/id/12349876", # wanted to pick only 12349876
}

Wanted to create a Mapping that represents the data as the below RDF:

:12345698765432 a :Box ; :ID "12345698765432" .
:12349876 a :Equipment ; :ID "12349876" .
:12345698765432 :contains :12349876 .

Here is the mapping syntax which I created:

MAPPING
FROM JSON {
"mongo" : {
  "_id" : "?id:objectId",
  "$substr: ['$boxID', 15, -1]": "?boxID",
  "$substr: ['$equipmentID', 27, -1]": "?equipmentID",
}
}
TO {
  ?boxNumber a :Box ;
                :ID ?boxID .
  ?equipmentNumber a :Equipment ;
             :ID ?equipmentID .
  ?boxNumber :contains ?equipmentNumber .

}        
WHERE {
   BIND (template("{boxID}") AS ?boxNumber)
   BIND (template("{equipmentID}") AS ?equipmentNumber)
}

But, when I run the following SPARQL query to find the list of all Box IDs,

SELECT ?box
Where {
    GRAPH <virtual://mongo_virtual_graph> 
{
?box a :Box .
}
}

I face an Error for MongoDB query translation:

Failed to run query: com.complexible.stardog.plan.eval.operator.OperatorException: While running MongoDB query [{ $match : {"$substr: ['$boxID', 16, -1]" : {$exists : true, $ne : null}} },
{$project: {boxNumber: '$$substr: ['$boxID', 16, -1]'} }]

Here in the Error we can see that the MongoDB query for Project part is weird because it's having 2times $ before substr function $$substr.

Need help in how to resolve this mongodb use-case.

Thanks!

Hi Ashish,

Welcome to the community forum. The FROM clause does not support functions. Normally, we would put the function in the WHERE clause but we do not support substring there either. At the moment, the closest you could come to achieving what you're after would be to map those fields, boxID and equipmentID as strings. Something like:

MAPPING
FROM JSON {
  "mongo" : {
    "_id" : "?id:objectId",
    "boxID" : "?boxID",
    "equipmentID" : "?equipmentID",
  }
}
TO {
  ?thing a :Thing ;
      :boxId ?boxID ;
      :contains ?equipmentID .
}        
WHERE {
   BIND (template("thing:{id}") AS ?thing)
}

and then extract the string in SPARQL:

SELECT * {
    GRAPH <virtual://vgname> {
        ?thing a :Thing ;
            :boxId ?boxID ;
            :contains ?equipmentID .
        BIND(REPLACE(?boxID, ".*:", "") AS ?boxNum)
        BIND(REPLACE(?equipmentID, ".*/", "") AS ?equipmentNum)
    }
}

The problem with this, however, is we currently do not translate the REPLACE function to a MongoDB function, so this query is not going to scale well.

-Paul

Hello Paul,

Thanks for the reply with the solution,
but the approach you shared would create a generic Thing Class and I won't be able to use the Box and Component class separately and that would be a challenge for us.

TO {
  ?thing a :Thing ;
      :boxId ?boxID ;
      :contains ?equipmentID .
}        

Also, the BIND template here uses the MongoDB ID only which can't be used to link with other individuals as this is the unique key for every Document auto-generated by MongoDB.

WHERE {
   BIND (template("thing:{id}") AS ?thing)
}

That's why our solution wanted to use the BoxID and EquipmentID in the BIND template as they act as foreign keys in the Document and can be used to link one Document with others. And those keys are having a different pattern of value from which we only wanted to pick a substring of it as mentioned previously, and there we wanted help in resolving that use-case.

{
    "_id": "***************",
	"boxID": "urn:abc:def:id:123456.98765432",  # wanted to pick only 123456.98765432
	"equipmentID": "http://www.equipment.com/id/12349876", # wanted to pick only 12349876
}

I hope things are a bit clear now and looking forward to if any solutions are available to resolve this challenge.

Thanks!
-Ashish