IllegalArgumentException in entityExtractor SPARQL Service

Hi,

I'm trying to use the entityExtractor SPARQL service, but I'm running into an out-of-bounds error. Here is the query I'm running:

select * {
  ?iri dct:description ?text
  service docs:entityExtractor {
    []  docs:text ?text ;
        docs:mention ?mention .
  }
} 

Here is part of the error from stardog.log:

Caused by: java.lang.IllegalArgumentException: The span [227..243) is outside the given text which has length 197!
	at opennlp.tools.util.Span.getCoveredText(Span.java:231) ~[opennlp-tools-1.9.0.jar:1.9.0]
	at opennlp.tools.util.Span.spansToStrings(Span.java:351) ~[opennlp-tools-1.9.0.jar:1.9.0]
	at opennlp.tools.tokenize.AbstractTokenizer.tokenize(AbstractTokenizer.java:25) ~[opennlp-tools-1.9.0.jar:1.9.0]
	at opennlp.tools.tokenize.TokenizerME.tokenize(TokenizerME.java:76) ~[opennlp-tools-1.9.0.jar:1.9.0]
	at com.complexible.stardog.docs.nlp.impl.OpenNLPDocumentParser.apply(OpenNLPDocumentParser.java:120) ~[stardog-bites-core-6.0.0.jar:?]

If I limit the query to a single result, it occasionally succeeds. Any help understanding what's going on would be much appreciated.

Hey Nolan,

Are you able to share one of the ?text values that causes this exception? If necessary you can email it to me at jess@stardog.com. Thanks.

Jess

Thanks for following up with me on this via email. As we discussed, I was able to submit each of the individual ?text values via a script without error.

The issue seems to occur only when using ?text as a bound variable.
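
For anyone reading along, here is a rough Python sketch of what the message in the log is complaining about. It is not Stardog or OpenNLP code: `covered_text` is a hypothetical helper that only mimics the kind of bounds check the exception message implies, and the two strings are made-up stand-ins. It shows that a half-open span [227..243) is fine for a text of at least 243 characters but fails for one of length 197, which is what you would see if a span computed for one text were applied to a different, shorter one. Whether that is what actually happens inside the service is an assumption, not something confirmed in this thread.

```python
def covered_text(text: str, start: int, end: int) -> str:
    """Return the characters covered by the half-open span [start, end).

    Hypothetical helper: it imitates the bounds check suggested by the
    error message, not the actual OpenNLP implementation.
    """
    if end > len(text):
        raise ValueError(
            f"The span [{start}..{end}) is outside the given text "
            f"which has length {len(text)}!"
        )
    return text[start:end]


# Made-up stand-ins for two different ?text values.
long_text = "x" * 243   # long enough to contain the span
short_text = "y" * 197  # same length as the text reported in the log

# The span fits the longer text and covers 16 characters.
covered = covered_text(long_text, 227, 243)

# Applying the same span to the shorter text reproduces the message.
try:
    covered_text(short_text, 227, 243)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

If that reading is right, it would fit with each value working when submitted on its own and failing only when many values flow through the same query.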
