Mysterious Error in JSON Import

Hi All,

I'm currently importing JSON records into an empty database, following the Stardog docs.

I created my own SMS2 mapping; please see the test record and mapping below.

I checked stardog.log and starrocks.log and found no error details in either.

Here's the error written in the terminal.

Thanks for your help in analyzing my error.

Best,

Stéphane

=====================================================================

Error importing file 'test4.json'. Encountered " "." ". "" at line 0, column 0.
Was expecting one of:
"(" ...
"{" ...
"[" ...
"<<" ...
...
...
"true" ...
"false" ...
<Q_IRI_REF> ...
<PNAME_NS> ...
<PNAME_LN> ...
<BLANK_NODE_LABEL> ...
...
...
...
<INTEGER_POSITIVE> ...
<INTEGER_NEGATIVE> ...
...
<DECIMAL_POSITIVE> ...
<DECIMAL_NEGATIVE> ...
...
<DOUBLE_POSITIVE> ...
<DOUBLE_NEGATIVE> ...
<STRING_LITERAL1> ...
<STRING_LITERAL2> ...
<STRING_LITERAL_LONG1> ...
<STRING_LITERAL_LONG2> ...

=====================================================================

{
"id": "2uy3g24ku3y42gku3y42g3",
"title": "This is a title",
"paperAbstract": "",
"authors": [
{
"name": "B Bougie",
"ids": [
"23423424"
]
},
{
"name": "W Parker",
"ids": [
"34534534"
]
}
],
"inCitations": [],
"outCitations": [],
"year": 2019,
"s2Url": "",
"sources": [],
"pdfUrls": [],
"venue": "",
"journalName": "",
"journalVolume": "19",
"journalPages": "45-54",
"doi": "",
"doiUrl": "",
"pmid": "",
"fieldsOfStudy": [
"Computer Science"
],
"magId": "3000193832",
"s2PdfUrl": "",
"entities": []
}

=====================================================================

PREFIX : <https://localhost/>
MAPPING urn:articles
FROM JSON {
"id": "?id",
"title": "?title",
"paperAbstract": "?paperAbstract",
"authors": [ "?authors", {"name": "?name","ids": ["?ids"]},],
"inCitations": [?inCitations],
"outCitations": [?outCitations],
"year": "?year",
"s2Url": "?s2Url",
"sources": [?sources],
"pdfUrls": [?pdfUrls],
"venue": "?venue",
"journalName": "?journalName",
"journalVolume": "?journalVolume",
"journalPages": "?journalPages",
"doi": "?doi",
"doiUrl": "?doiUrl",
"pmid": "?pmid",
"fieldsOfStudy": ["?fieldsOfStudy"],
"magId": "?magId",
"s2PdfUrl": "?s2PdfUrl",
"entities": [?entities]
}
TO {
?article a :Article ;
:id ?id;
:idIndex ?idIndex;
:title ?title;
:paperAbstract ?paperAbstract;
:authoredBy ?authoredBy;
:papers ?papers;
:inCitations ?inCitations;
:outCitations ?outCitations;
:year ?year;
:s2Url ?s2Url;
:sources ?sources;
:pdfUrls ?pdfUrls;
:venueAt ?venueAt;
:journalIn ?journalIn;
:journalInVol ?journalInVol;
:journalPages ?journalPages;
:doi ?doi;
:doiUrl ?doiUrl;
:pmid ?pmid;
:fieldsOfStudyIn ?fieldsOfStudyIn;
:magId ?magId;
:s2PdfUrl ?s2PdfUrl;
:entities ?entities.

?idIndex a :IdIndex ;
:index ?id .

?authoredBy a :AuthoredBy;
:name ?name;
:ids ?ids.

?papers a :papers;
:ids ?ids;
:id ?id.

?ids a :Person .

?fieldsOfStudyIn a :FieldsOfStudyIn;
:fieldsOfStudy.

?venueAt a :VenueAt;
:venue ?venue.

?journalIn a :JournalIn;
:journalName ?journalName.

?journalInVol a :JournalInVol;
:journalName ?journalName;
:journalVolume ?journalVolume.
}
WHERE {
BIND (xsd:date(?year) AS ?xsdYear)
BIND (template("localhost/articles/article_{id}") AS ?article)
BIND (template("localhost/articles/author_{ids}") AS ?authoredBy)
BIND (template("localhost/articles/author_article_{ids}{id}") AS ?papers)
BIND (template("localhost/articles/venue_{venue}") AS ?venueAt)
BIND (template("localhost/articles/journal_{journalName}") AS ?journalIn)
BIND (template("localhost/articles/volume_{journalName}{journalVol}") AS ?journalInVol)
BIND (template("localhost/articles/field_{fieldsOfStudy}") AS ?fieldsOfStudyIn)
}

=====================================================================

Can you share the command you ran?

Yes, sorry for forgetting this. Here's the command:

sudo stardog-admin virtual import mydb --format sms2 mapping.sms test4.json

OK, I see what's happening; the poor error message confused me at first. The contents of the FROM JSON { ... } block must be valid JSON, and your mapping breaks this in two ways. First, the contents of the block begin directly with "id". You need to enclose the whole template in its own {} to indicate that it's a JSON object; the extra braces are required because the template can also be an array. The mapping should look something like:

FROM JSON { // beginning of mapping source
 { // beginning of JSON object
    "id" : "?id",
   // rest of JSON template
  } // end of JSON object
} // end of mapping source
TO {
  # Mapping target
}
...

Additionally, you need to quote the array elements, e.g. "pdfUrls": ["?pdfUrls"]. Hope this helps.
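For anyone who wants a quick sanity check before running the import, here is a minimal sketch in Python. It only tests whether the inner template parses as JSON, using Python's standard json module as a stand-in; Stardog's own parser may accept or reject slightly different input, and the function name check_json_template is made up for this example.

```python
import json

def check_json_template(template: str):
    """Return None if the template parses as JSON, else a short error description."""
    try:
        json.loads(template)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

# Unquoted variable inside an array: not valid JSON.
unquoted = '{ "id": "?id", "pdfUrls": [?pdfUrls] }'
# Same template with the array element quoted: valid JSON.
quoted = '{ "id": "?id", "pdfUrls": ["?pdfUrls"] }'

print(check_json_template(unquoted))  # an error description
print(check_json_template(quoted))    # None
```

Pasting only the inner { ... } object of the FROM JSON block into such a check should point at the first offending line and column.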

Jess

Thanks Jess. Unfortunately, after applying the fixes, the error remains exactly the same.

Please see the modified mapping below.

PREFIX :
MAPPING urn:articles
FROM JSON {
{
"id": "?id",
"title": "?title",
"paperAbstract": "?paperAbstract",
"authors": [ "?authors", {"name": "?name","ids": ["?ids"]},],
"inCitations": ["?inCitations"],
"outCitations": ["?outCitations"],
"year": "?year",
"s2Url": "?s2Url",
"sources": ["?sources"],
"pdfUrls": ["?pdfUrls"],
"venue": "?venue",
"journalName": "?journalName",
"journalVolume": "?journalVolume",
"journalPages": "?journalPages",
"doi": "?doi",
"doiUrl": "?doiUrl",
"pmid": "?pmid",
"fieldsOfStudy": ["?fieldsOfStudy"],
"magId": "?magId",
"s2PdfUrl": "?s2PdfUrl",
"entities": ["?entities"]
}
}
TO {
?article a :Article ;
:id ?id;
:idIndex ?idIndex;
:title ?title;
:paperAbstract ?paperAbstract;
:authoredBy ?authoredBy;
:papers ?papers;
:inCitations ?inCitations;
:outCitations ?outCitations;
:year ?year;
:s2Url ?s2Url;
:sources ?sources;
:pdfUrls ?pdfUrls;
:venueAt ?venueAt;
:journalIn ?journalIn;
:journalInVol ?journalInVol;
:journalPages ?journalPages;
:doi ?doi;
:doiUrl ?doiUrl;
:pmid ?pmid;
:fieldsOfStudyIn ?fieldsOfStudyIn;
:magId ?magId;
:s2PdfUrl ?s2PdfUrl;
:entities ?entities.

?idIndex a :IdIndex ;
:index ?id .

?authoredBy a :AuthoredBy;
:name ?name;
:ids ?ids.

?papers a :papers;
:ids ?ids;
:id ?id.

?ids a :Person .

?fieldsOfStudyIn a :FieldsOfStudyIn;
:fieldsOfStudy.

?venueAt a :VenueAt;
:venue ?venue.

?journalIn a :JournalIn;
:journalName ?journalName.

?journalInVol a :JournalInVol;
:journalName ?journalName;
:journalVolume ?journalVolume.
}
WHERE {
BIND (xsd:date(?year) AS ?xsdYear)
BIND (template("localhost/articles/article_{id}") AS ?article)
BIND (template("localhost/articles/author_{ids}") AS ?authoredBy)
BIND (template("localhost/articles/author_article_{ids}{id}") AS ?papers)
BIND (template("localhost/articles/venue_{venue}") AS ?venueAt)
BIND (template("localhost/articles/journal_{journalName}") AS ?journalIn)
BIND (template("localhost/articles/volume_{journalName}{journalVol}") AS ?journalInVol)
BIND (template("localhost/articles/field_{fieldsOfStudy}") AS ?fieldsOfStudyIn)
}

Is the first line of your mappings file missing the prefix URI? E.g. PREFIX :

No, it's not missing; it reads like the example:

PREFIX : <http://example.com/>

Remove "urn:articles" after MAPPING.

You also have two other issues:

"authors": [ "?authors", {"name": "?name","ids": ["?ids"]},],

The array in the mapping must have exactly one element. In other words, every element of the array in the data must conform to the structure given by that single element.
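If it helps, the one-element rule can be checked mechanically. This is only a sketch in Python, assuming the inner template has already been made valid JSON; the helper name arrays_without_single_element is hypothetical and not part of any Stardog tooling.

```python
import json

def arrays_without_single_element(node, path="$"):
    """Yield the path of every array in a parsed template that does not have exactly one element."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from arrays_without_single_element(value, f"{path}.{key}")
    elif isinstance(node, list):
        if len(node) != 1:
            yield path
        for index, item in enumerate(node):
            yield from arrays_without_single_element(item, f"{path}[{index}]")

# The "authors" array has two elements, so it is reported; "ids" and "sources" are fine.
template = json.loads(
    '{"authors": ["?authors", {"name": "?name", "ids": ["?ids"]}], "sources": ["?sources"]}'
)
problems = list(arrays_without_single_element(template))
print(problems)  # ['$.authors']
```

Note that this sketch also flags empty arrays, since they do not have exactly one element either.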

I also believe one of your BINDs must be changed from
BIND (template("localhost/articles/volume_{journalName}{journalVol}") AS ?journalInVol)
to
BIND (template("localhost/articles/volume_{journalName}{journalVolume}") AS ?journalInVol)
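A mismatch like journalVol versus journalVolume is easy to miss by eye. As a rough sketch (Python, regular expressions only, not a real SMS2 parser), one could compare the placeholders used in template(...) calls against the quoted "?variables" in the mapping. The function name unknown_placeholders is made up for illustration, and variables introduced by other BINDs would not be recognized by this simple check.

```python
import re

def unknown_placeholders(mapping: str):
    """Return template placeholders that have no matching quoted "?name" variable in the mapping."""
    variables = set(re.findall(r'"\?(\w+)"', mapping))
    unknown = []
    for template in re.findall(r'template\("([^"]*)"\)', mapping):
        for name in re.findall(r'\{(\w+)\}', template):
            if name not in variables and name not in unknown:
                unknown.append(name)
    return unknown

mapping = '''
FROM JSON { { "journalName": "?journalName", "journalVolume": "?journalVolume" } }
WHERE {
  BIND (template("localhost/articles/volume_{journalName}{journalVol}") AS ?journalInVol)
}
'''
print(unknown_placeholders(mapping))  # ['journalVol']
```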

Hi, actually, while there was an error with the MAPPING line, it was not that line that caused the error you mentioned. In your TO section you have an invalid triple definition:

?ids a :Person .

?fieldsOfStudyIn a :FieldsOfStudyIn ;
   :fieldsOfStudy .      #####Missing value 

?venueAt a :VenueAt;
   :venue ?venue.

Granted, the error was not very informative. May I recommend a more iterative approach when building SMS2 files? Essentially, start with:

PREFIX : <https://localhost/>
MAPPING :articles
FROM JSON {
   {
      "id": "?id"
   }
}
TO {
   ?article a :Article ;
     :id ?id;
}
WHERE {
   BIND (template("http://localhost/articles/article_{id}") AS ?article)
}

And then keep adding to it.

Merci beaucoup Serge this is most appreciated. Your advice allowed me to resolve most issues.

My remaining problem is how to map the multidimensional array that holds the list of authors; please see my JSON below.

I tried to follow the "actedInMovie" example from the docs:

FROM

    "actor":[ {
        "actor":"?actorId",
        "name":"?actorName"
      }
    ]

TO

  ?actedInMovie a :ActedInMovie ;
    :actor ?actor ;
    :name ?actorName .

  ?actor a :Person .

This mapping doesn't break anything, except that the authors are not imported, so two classes are neither created nor populated:

FROM:

"authors": [ {"name": "?name","ids": "?ids"}],

TO:

:authoredBy ?authoredBy;

and the class:

?authoredBy a :AuthoredBy;
:name ?name;
:ids ?ids.

and the template:

BIND (template("https://localhost/articles/author_{ids}") AS ?authoredBy)

Same problem with papers, combination of author ids and article id:

FROM

"id": "?id",
"authors": [ {"name": "?name","ids": "?ids"}],

TO

:papers ?papers;

And the class:

?papers a :Papers;
:ids ?ids;
:id ?id.

And the template:

BIND (template("https://localhost/articles/author_article_{ids}_{id}") AS ?papers)

Thanks again and hoping I may find a way to help in return!

Best,

Stéphane

Sometimes understanding the data helps in answering this type of question. Here is a part I do not fully understand.

"authors": [{
   "name": "B Bougie",
   "ids": [
      "23423424"
   ]},{
    "name": "W Parker",
    "ids": [
       "34534534"
   ]}
]

What is ids? Since it is an array, I first assumed it was a list of ids of articles written by this author, since an author should in principle have only one ?authorId. But when I look at your mapping, you are using it as the author id?

Can you confirm that it is indeed the author id, and that an author can have more than one?

Bonjour Serge - Here are a few details on this, actually it is the reverse, each JSON record is 1 article, for which we have 1 to many authors:

  1. id = article unique index key

  2. ids = each author unique index key

  3. authors = array of several authors

So to search for "all papers by 1 author" I created a mapping called "papers" that matches each "id and ids", hence making it possible to search all the "id" (articles) that match an "ids" (author).

Thanks again for all your help, truly appreciated to learn iteratively, also finding SMS2 an easy syntax apart from the present issue with multidim. arrays.

Best,

Stéphane

In that case the JSON should look something like:

"authors": [
   {
      "name": "B Bougie",
      "id": "23423424" 
   },
   {
       "name": "W Parker",
       "id": "34534534"
   }
]

which represents a list of authors, where each element contains the author's name and id. In essence, if you wanted a list of author ids:

ids = authors.map((item) => item.id)
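If the source files cannot be changed upstream, one option is to reshape each record before importing it. The following is only a sketch in Python, assuming the record layout from the sample earlier in this thread (each author carries an "ids" list); the function name flatten_author_ids is hypothetical. It emits one author entry per id, so an author with an empty "ids" list is dropped.

```python
import json

def flatten_author_ids(record: dict) -> dict:
    """Return a copy of the record in which each author has a single "id" instead of an "ids" list."""
    authors = []
    for author in record.get("authors", []):
        for author_id in author.get("ids", []):
            authors.append({"name": author["name"], "id": author_id})
    return {**record, "authors": authors}

record = json.loads('''{
  "id": "2uy3g24ku3y42gku3y42g3",
  "authors": [
    {"name": "B Bougie", "ids": ["23423424"]},
    {"name": "W Parker", "ids": ["34534534"]}
  ]
}''')
flat = flatten_author_ids(record)
print(json.dumps(flat["authors"], indent=2))
```

The original record is left untouched; only the returned copy has the flattened "authors" list.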

Let's assume we have the following JSON:

{
  "id": "2uy3g24ku3y42gku3y42g3",
  "title": "This is a title",
  "authors": [
    {
      "name": "B Bougie",
      "id": "23423424"
    },
    {
      "name": "W Parker",
      "id": "34534534"
    }
  ]
}

with the following mapping

PREFIX : <https://localhost/>
MAPPING :article
FROM JSON {
   {
      "id": "?articleId",
      "title": "?title",
      "authors": [ { "name": "?authorName", "id": "?authorId"} ]
   }
}
TO {
   ?articleIRI a :Article ;
     :id ?articleId;
     :title ?title ;
     :authoredBy ?authorIRI .

   ?authorIRI a :Person;
        a :Author;
        :name ?authorName;
        :id ?authorId .
}
WHERE {
   BIND (template("https://localhost/articles/article_{articleId}") AS ?articleIRI)
   BIND (template("https://localhost/person/person_{authorId}") AS ?authorIRI)
}

Then, after it's loaded, if you want the authors of an article:

PREFIX : <https://localhost/>

select * { 
    
    BIND("This is a title" as ?articleTitle)

    _:s :title ?articleTitle . 
    _:s :authoredBy _:authoredBy .
    
    _:authoredBy :id ?authorId;
 } 

The result would be one row per author, with ?authorId bound to 23423424 and 34534534.

Furthermore, since an Author is always a Person, you could use an RDFS axiom, :Author rdfs:subClassOf :Person, to infer the Person type rather than load it.

Merci beaucoup Serge! This is working perfectly!

Hoping to help the same way with others as I progress!

Bonjour Serge and Team.

Sorry to bother you guys again, but here's another mysterious error.

I'm presently executing the JSON import. I have 112M+ JSON records split across 6003 files. I use a parallel command to execute the 6003 imports across my 24 dual Xeon threads. RAM is 256G, max heap 80, max mem 160, swap avail is 260G.

FYI here's my bash command:

find /mnt/data/db/ -name "*json" | parallel sudo stardog-admin virtual import db --format sms2 /home/me/mapping.sms "{}" &

As I let it work all night, it is presently at 28M+ records, after 10 hours of work. I have so far 946 of 6003 files processed, indicated by number of times the terminal reads "Import completed successfully" (just did a wc -l command to get 946).

What is most bizarre: stardog.log and starrocks.log show that the Stardog server is shutting down and restarting every 10 seconds. Yet my records keep being added, but very slowly.

I'm considering perhaps that stardog is not the best approach for me to handle this dataset, that triples won't add much value to my search, and I'd be better off separating this JSON dataset into another db, keeping stardog strictly for traditional ntriples.

Please let me know if I made a mistake somewhere.

Best,

Stéphane

Can you elaborate on the 28M+ records? What is a record (an article, an author)? How many triples is that?

Each record is 1 article including the authors, formatted exactly as the sample I gave in my first message in this thread.

So far after 13h I have 1081 of 6003 files processed, and passed the mark of 1B triples as I look through the studio interface.

Each file is about 31900 records, one per line, so for now I would have about 34,483,900 of 112M+ JSON records uploaded.
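For reference, the arithmetic behind these figures can be written out as below. This is just a back-of-the-envelope sketch in Python; it uses the numbers quoted above and assumes the import keeps running at the same average rate, which may well not hold.

```python
files_done = 1081
total_files = 6003
records_per_file = 31_900   # "about 31900", as stated above
hours_elapsed = 13

# Records imported so far.
records_done = files_done * records_per_file
print(records_done)  # 34483900

# Average throughput so far, and the total run time if that rate stayed constant.
files_per_hour = files_done / hours_elapsed
estimated_total_hours = total_files / files_per_hour
print(round(files_per_hour, 1))
print(round(estimated_total_hours, 1))
```

At that pace the full set of 6003 files would take roughly three days in total.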

Again error messages continue, shutdown-restart every 10 seconds, and all 24 CPU threads are maxed out for several minutes, then lots of fluctuations for another few minutes, then maxed out again, and so on.

Can you share an excerpt of the actual log?

Can you please share which version of Stardog you're running?

Here are the stardog.log messages from several shutdown-restart cycles.

Address already in use
Waiting for running tasks to complete....done. Executor service has been shut down.
Stardog server 7.6.2 shutdown on Fri May 07 11:40:53 EDT 2021.

OpenJDK 64-Bit Server VM warning: Max heap size too large for Compressed Oops
WARN 2021-05-07 11:41:07,609 [main] com.stardog.starrocks.StarrocksUtils:getLibraryName(83): No explicitly supported Linux distribution was found. Proceeding with the default native library.
WARN 2021-05-07 11:41:08,456 [main] com.stardog.starrocks.StarrocksUtils:getLibraryName(83): No explicitly supported Linux distribution was found. Proceeding with the default native library.
INFO 2021-05-07 11:41:09,072 [main] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:syncCache(494): Initializing virtual graph registry
INFO 2021-05-07 11:41:09,107 [main] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:syncCache(518): Loaded virtual graph registry with 0 entries
INFO 2021-05-07 11:41:16,786 [main] com.complexible.stardog.StardogKernel:start(2404): Initializing Stardog


license message


Address already in use
Waiting for running tasks to complete....done. Executor service has been shut down.
Stardog server 7.6.2 shutdown on Fri May 07 11:41:18 EDT 2021.

OpenJDK 64-Bit Server VM warning: Max heap size too large for Compressed Oops
WARN 2021-05-07 11:41:32,516 [main] com.stardog.starrocks.StarrocksUtils:getLibraryName(83): No explicitly supported Linux distribution was found. Proceeding with the default native library.
WARN 2021-05-07 11:41:33,345 [main] com.stardog.starrocks.StarrocksUtils:getLibraryName(83): No explicitly supported Linux distribution was found. Proceeding with the default native library.
INFO 2021-05-07 11:41:34,624 [main] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:syncCache(494): Initializing virtual graph registry
INFO 2021-05-07 11:41:34,659 [main] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:syncCache(518): Loaded virtual graph registry with 0 entries
INFO 2021-05-07 11:41:43,059 [main] com.complexible.stardog.StardogKernel:start(2404): Initializing Stardog


license message


Address already in use
Waiting for running tasks to complete....done. Executor service has been shut down.
Stardog server 7.6.2 shutdown on Fri May 07 11:41:45 EDT 2021.

OpenJDK 64-Bit Server VM warning: Max heap size too large for Compressed Oops
WARN 2021-05-07 11:41:59,012 [main] com.stardog.starrocks.StarrocksUtils:getLibraryName(83): No explicitly supported Linux distribution was found. Proceeding with the default native library.
WARN 2021-05-07 11:41:59,749 [main] com.stardog.starrocks.StarrocksUtils:getLibraryName(83): No explicitly supported Linux distribution was found. Proceeding with the default native library.
INFO 2021-05-07 11:42:00,293 [main] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:syncCache(494): Initializing virtual graph registry
INFO 2021-05-07 11:42:00,335 [main] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:syncCache(518): Loaded virtual graph registry with 0 entries
INFO 2021-05-07 11:42:07,975 [main] com.complexible.stardog.StardogKernel:start(2404): Initializing Stardog


license message


Address already in use
Waiting for running tasks to complete....done. Executor service has been shut down.
Stardog server 7.6.2 shutdown on Fri May 07 11:42:10 EDT 2021.