How to handle nulls in SNARL

Hi,

I am pretty new to Stardog. I am using SNARL to create and load triples into Stardog. The triples are generated from a Java data structure which is the end result of a UIMA pipeline. I am having two specific issues: handling nulls that crop up in the data, and logging.

First of all, how do you enable logging when inserting data from SNARL? My stardog.log file only has records of inserts through the admin interface, but has nothing for inserts that happen from SNARL. That has meant that it took much longer to find the source of the following error.

What was happening was that Stardog was throwing a null pointer exception intermittently. I traced it down to a place in the underlying data that contained a null value. It is hard to do anything about that because there is a lot of data. So I am wondering if there is a standard SNARL way of handling null values before they get into the Statement structure. Here is the code - the null pointer exception is thrown at the line “Model ctgraph = Models2.newModel(stmts);”

private void writeToStardog(ClinicalTrialInfo trial)
{
    System.err.println("about to write to stardog");
    System.err.println("Trial is " + trial.getNctId());

    // establish a connection to the ctkr database
    Connection aConn = ConnectionConfiguration
            .to("ctkr")
            .server("http://localhost:5820")
            .credentials("admin", "admin")
            .connect();

    // put in data
    aConn.begin();

    ArrayList<Statement> stmts = new ArrayList<Statement>();
    stmts.add(Values.statement(Values.iri(CT, trial.getNctId()),
            Values.iri(RDF, "type"),
            Values.iri(CT, "Trial")));
    stmts.add(Values.statement(Values.iri(CT, trial.getNctId()),
            Values.iri(CT, "hasNCT"),
            literal(trial.getNctId())));
    ArrayList<Statement> interventionStmts = makeInterventionRep(trial);
    ArrayList<Statement> criteriaStmts = makeCriteriaRep(trial);
    stmts.addAll(interventionStmts);
    stmts.addAll(criteriaStmts);

    System.err.println("dumping stmts");
    for (Statement curr : stmts)
        System.err.println(curr);

    Model ctgraph = Models2.newModel(stmts);
    aConn.add().graph(ctgraph);
    aConn.commit();
    System.err.println("wrote triples for trial " + trial.getNctId() + " to Stardog");

    aConn.close();
}

But the root cause is elsewhere - in this code snippet

Literal val = literal(criterion.getCriterionType());
System.err.println("literal for criterion type is " + val);
if (val != null)
{
    System.err.println("criterion type is valid it is " + criterion.getCriterionType());
    cStmts.add(Values.statement(criteriaIRI,
            Values.iri(CT, "criteriaType"),
            literal(criterion.getCriterionType())));
}
else
    System.err.println("criteria type is not valid");

You can see that I have wrapped this with a test to make sure the criterion type is not null, since that happens every so often in the data. But this is ugly, so I am wondering if there is a better way to handle this. And why does Stardog fail so abjectly when this happens? This is the actual exception

org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
	at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
	at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:150)
	at ecproj.krgen.PipelineSystem.<init>(PipelineSystem.java:84)
	at ecproj.krgen.PipelineSystem.main(PipelineSystem.java:99)
Caused by: java.lang.NullPointerException
	at com.complexible.common.rdf.model.AbstractStardogLiteral.hashCode(AbstractStardogLiteral.java:53)
	at com.complexible.common.rdf.model.StardogStringLiteral.hashCode(StardogStringLiteral.java:14)
	at java.util.HashMap.hash(HashMap.java:338)
	at java.util.HashMap.get(HashMap.java:556)
	at org.openrdf.model.impl.LinkedHashModel.asNode(LinkedHashModel.java:543)
	at org.openrdf.model.impl.LinkedHashModel.add(LinkedHashModel.java:171)
	at org.openrdf.model.impl.AbstractModel.add(AbstractModel.java:49)
	at org.openrdf.model.impl.AbstractModel.addAll(AbstractModel.java:139)
	at com.google.common.collect.Iterables.addAll(Iterables.java:352)
	at com.complexible.common.openrdf.model.Models2.newModel(Models2.java:90)
	at ecproj.krgen.KRWriter.writeToStardog(KRWriter.java:328)
	at ecproj.krgen.KRWriter.process(KRWriter.java:184)
	at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
	... 9 more

It took a couple of hours to find the culprit - with logging, it might have been much quicker, so I would really like to know if it is possible to log inserts that come from SNARL. And again, what is best practice for handling nulls?

Hi and welcome!

You can implement a transaction listener to perform a task (such as logging) on transactions from the SNARL protocol. We have an example of this in our examples repo.

null doesn't make sense as the object of a triple (or as any element of a triple, for that matter). We are actively reworking some of our RDF parsing so that this kind of error is caught at compile time; until then, what you're doing is my recommendation.

If it's too "ugly," perhaps consider tweaking the literal() method to return an empty String when passed a null value? This is assuming you would want that data loaded at all.
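
For what it's worth, here is a minimal sketch of both options in plain Java, so it runs without Stardog on the classpath. The class and method names (NullSafeStatements, addIfPresent, orEmpty) are made up for illustration and are not part of SNARL. In your code the builder lambda would wrap whatever you already call to build the statement, and the plain strings below are only stand-ins for statements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of the Stardog/SNARL API.
class NullSafeStatements {

    // Option 1: build and add a statement only when the raw value is present.
    // Returns true if something was added, false if the value was null.
    static <V, T> boolean addIfPresent(List<T> out, V value, Function<V, T> build) {
        if (value == null) {
            return false; // nothing is built, so no null reaches the model
        }
        out.add(build.apply(value));
        return true;
    }

    // Option 2: map null to the empty string so the triple is still loaded.
    static String orEmpty(String value) {
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        List<String> stmts = new ArrayList<>(); // strings stand in for statements

        addIfPresent(stmts, "inclusion", v -> "criteriaType=" + v);   // added
        addIfPresent(stmts, (String) null, v -> "criteriaType=" + v); // skipped
        stmts.add("criteriaType=" + orEmpty(null));                   // added, empty value

        System.out.println(stmts.size() + " statements built");
    }
}
```

Which option fits depends on whether an empty value is meaningful in your data; the first keeps the null check in one place without changing what gets loaded.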

Hi,

Thanks. Your final question “would you want that data loaded at all?” is a good one. Right now, I simply do not create a triple if a null is found, and that may well make the most sense. But if a null ended up in that particular part of the data, the semantics may actually be “unknown” rather than “not there at all”. It is the good old ambiguous null situation. We are discussing what we want to do with that.

But the more serious issue is the logging. I took a look at the link you sent. I am trying to understand where the logging is. Do you mean this snippet in ExampleConnectableConnection.processIndexChange?

try {
    StatementSources.write(aStatements, RDFFormat.NQUADS, System.out);
}
catch (IOException e) {
    LOGGER.warn("Error while writing transaction data", e);
}

If so, I don’t see how it helps - I still wouldn’t be able to identify the specific triple statement that is causing the problem. Am I missing something?

Yes, processIndexChange is where you have access to each Statement being added. In the case of what you pasted, it should output the RDF directly to your stardog.log. You can also do something like aStatements.statements().forEachRemaining(System.out::println); to accomplish a similar result.

I guess I am still not understanding how this would help, because the listener just appears to be constructing a bunch of statements and then logging them. What I want is a log message for each individual statement if an exception is thrown. Maybe I can only get this if I add each statement individually? But would that even work?

for (Statement curr : stmts)
{
    try
    {
        Model ctgraph = Models2.newModel(curr);
        aConn.add().graph(ctgraph);
    }
    catch (Exception e)
    {
        // do some logging
    }
}

The issue is that the exception is coming out of Models2.newModel and I want to know which particular statement is causing it. Right now, as you can see, I dump all the statements before this point, but wading through hundreds and hundreds of statements, trying to guess the one that offended, is not fun.

I think you can safely add each statement one at a time. You won’t trigger a network round trip until the commit, and you can avoid building the list of statements. I’m not positive, but I’d guess that with a lot of data it will buffer some set amount and trigger a network round trip sometime before the commit.
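
A rough sketch of the per-statement approach, written in plain Java so it runs on its own. PerItemAdder, addEach, and ThrowingConsumer are hypothetical names, not Stardog classes. The action you pass in would be the per-statement add from the loop above, and the strings here only stand in for statements. Calling hashCode() on a null mimics the NullPointerException in the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Stardog/SNARL API.
class PerItemAdder {

    // Like java.util.function.Consumer, but allowed to throw checked exceptions.
    @FunctionalInterface
    interface ThrowingConsumer<T> {
        void accept(T item) throws Exception;
    }

    // Runs the action on each item and returns the indices of the items that
    // threw, logging each failure so the offending statement is easy to find.
    static <T> List<Integer> addEach(List<T> items, ThrowingConsumer<T> action) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            try {
                action.accept(items.get(i));
            } catch (Exception e) {
                failed.add(i);
                System.err.println("statement #" + i + " failed: " + e);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Strings stand in for statements; the null entry is the bad one.
        List<String> stmts = Arrays.asList("a", null, "c");
        List<Integer> failed = addEach(stmts, s -> s.hashCode());
        System.out.println("failed indices: " + failed);
    }
}
```

Whether to roll back or commit the statements that did succeed is a separate decision; the sketch only identifies which ones failed.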
