Stardog's NLP extractors

Hi, I am trying to use BITES, one of Stardog's features, for NLP.

I downloaded the latest jar (Releases · stardog-union/bites-corenlp · GitHub) and put it in STARDOG_EXT.

I got the following error while restarting the Stardog server:

An unexpected error occurred.
java.lang.NoClassDefFoundError: org/openrdf/model/Value

Is any special setting needed?

Stardog has transitioned from openrdf to its own Stark API. The extractors have been updated to use the new API, but you'll have to check out the repo and build the latest code from master.

I'm building it now. What version of Stardog are you running? It looks like it's building against 6.2.0, so if you're running 7.x it may need some minor updates to run against 7.

The only changes required for 7.0.2 were setting java.library.path so that the tests would run. Other than that it looks good.

I've included a jar from that build (bites-corenlp-1.1.zip, 8.6 KB). You'll just need to change the extension from .zip back to .jar.

Hi, we are running 7.0.1. Thanks for the details.

However, is it possible to attach a jar file instead? The downloaded file always shows as a zip file even when I try to change it to jar. Thanks.

I can only upload files of type .zip.

You can download the file as is and then just change the name from .zip to .jar after it has downloaded.

Thanks a lot! It works on Linux. I was downloading on Windows, and changing the extension was not working there.

I am able to restart the Stardog server after that.

Do you have documentation on how to use Stardog for NLP? I created a document with the following sentence: "The Orioles are a professional baseball team based in Baltimore."

I ran the extractor as
stardog doc put --rdf-extractors CoreNLPMentionRDFExtractor documents test.doc

It returned: edu/stanford/nlp/pipeline/CoreDocument

I tried to query for any triple in the database, but nothing was returned.

@zachary.whitley you just built the "normal" jar file, but I'm pretty sure you'll need the fat jar. BITES uses NLP based on Stanford CoreNLP, which relies on very large pre-trained models. Given that BITES is an extension, those models are not shipped with the standard Stardog distribution.

I built it now for Stardog 7.0.2; the whole jar file is ~1.8 GB on my machine:

./gradlew fatJar

@lorenz_b thanks. My mistake. I ran ./gradlew jar, not ./gradlew fatJar.
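
For anyone hitting the same NoClassDefFoundError, one way to tell the two builds apart is to look inside the jar for the CoreNLP class named in the error. This is only a sketch: `jar_has_class` is a hypothetical helper, not part of Stardog or BITES, and the default file name is just the renamed attachment from earlier in this thread.

```shell
# Hypothetical helper: succeeds if the jar at $1 lists an entry matching $2.
jar_has_class() {
  unzip -l "$1" 2>/dev/null | grep -qF "$2"
}

# Assumed file name (the renamed attachment above); pass another path as $1.
JAR="${1:-bites-corenlp-1.1.jar}"

if jar_has_class "$JAR" "edu/stanford/nlp/pipeline/CoreDocument.class"; then
  echo "CoreNLP classes are bundled in $JAR (fat jar)"
else
  echo "CoreNLP classes not found in $JAR (thin jar, or file missing)"
fi
```

A jar of a few KB will not contain the CoreNLP classes or models, so the file size alone is usually a good first hint.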

Hi, is it possible for us to get the fat jar? Could you put it in the Git repository so we can download it? Thanks!

Here you go https://drive.google.com/file/d/1b7pm_KW169UUJAVQkOnv8xWuw_H0ecdi/view?usp=sharing

EDIT: I have removed this link; please see the following post for the official build.

By the way, it works as expected for me with Stardog 7.0.2.

And don't forget, the extracted triples are contained in a separate graph for each document.
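
As a rough sketch of how to find those graphs, the query below lists every named graph with its triple count. The database name `documents` is taken from the `stardog doc put` command earlier in the thread; the query itself is plain SPARQL, and the guard around the CLI call is only there so the snippet does nothing harmful on a machine without Stardog installed.

```shell
# Database name as used in the `stardog doc put` command above.
DB="documents"

# Standard SPARQL: one row per named graph, with the number of triples in it.
QUERY='SELECT ?g (COUNT(*) AS ?triples) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g'

# Run the query if the Stardog CLI is on the PATH; otherwise just print it.
if command -v stardog >/dev/null 2>&1; then
  stardog query execute "$DB" "$QUERY"
else
  echo "$QUERY"
fi
```

Depending on how the database is configured, a query without a GRAPH clause may only look at the default graph, which would explain an empty result even when extraction succeeded.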

@s5uybw I have compiled the newest fatJar against 7.0.2 and created a release on GitHub: Release v1.2 · stardog-union/bites-corenlp · GitHub. Please let me know if you have problems with it!

Thanks!
It works for CoreNLPMentionRDFExtractor. The extracted triples (entities) are loaded into the named graph.

However, I get a Java heap space error if I run
stardog doc put --rdf-extractors CoreNLPRelationRDFExtractor documents test.doc

Stardog.log:
ERROR 2019-10-18 19:57:27,165 [stardog-user-2] com.stardog.http.server.undertow.ErrorHandling:writeError(138): Unexpected error on the server
java.lang.OutOfMemoryError: Java heap space
at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:661) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:643) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.DependencyParser.initialize(DependencyParser.java:1186) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:630) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.DependencyParser.loadFromModelFile(DependencyParser.java:499) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.DependencyParseAnnotator.<init>(DependencyParseAnnotator.java:57) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.AnnotatorImplementations.dependencies(AnnotatorImplementations.java:240) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$57(StanfordCoreNLP.java:559) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$1305/836250417.apply(Unknown Source) ~[?:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$69(StanfordCoreNLP.java:625) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$1314/285508813.get(Unknown Source) ~[?:?]
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181) ~[bites-corenlp-all-1.2.jar:?]
at com.complexible.stardog.docs.corenlp.CoreNLPRelationRDFExtractor.getPipeline(CoreNLPRelationRDFExtractor.java:77) ~[bites-corenlp-all-1.2.jar:?]
at com.complexible.stardog.docs.corenlp.CoreNLPRelationRDFExtractor.extractFromText(CoreNLPRelationRDFExtractor.java:87) ~[bites-corenlp-all-1.2.jar:?]
at com.complexible.stardog.docs.extraction.tika.TextProvidingRDFExtractor.extract(TextProvidingRDFExtractor.java:46) ~[stardog-bites-core-7.0.1.jar:?]
at com.complexible.stardog.docs.extraction.tika.TextProvidingRDFExtractor.extract(TextProvidingRDFExtractor.java:28) ~[stardog-bites-core-7.0.1.jar:?]
at com.complexible.stardog.docs.db.ConnectableBitesConnectionImpl.lambda$extract$2(ConnectableBitesConnectionImpl.java:162) ~[stardog-bites-core-7.0.1.jar:?]
at com.complexible.stardog.docs.db.ConnectableBitesConnectionImpl$$Lambda$1260/369201145.apply(Unknown Source) ~[?:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_222]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_222]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_222]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_222]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_222]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_222]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_222]
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:541) ~[?:1.8.0_222]

The text in the document is from the sample: "The Orioles are a professional baseball team based in Baltimore."

Are you able to run CoreNLPRelationRDFExtractor for entities and links?
Is there anything I need to configure to avoid the Java heap space error for this simple text?

Thanks a lot.

I'm not quite sure what you're asking here.

The NLP models require additional heap space, so you'll need to allocate more by adjusting your STARDOG_SERVER_JAVA_ARGS environment variable.
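
A minimal sketch of what that could look like follows. The sizes are assumptions picked for illustration, not recommended values; choose numbers that fit the RAM on your machine. Setting the variable replaces Stardog's default JVM arguments, so the direct memory setting is included as well.

```shell
# Assumed sizes for illustration only; adjust to the memory available.
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=2g"

# The server reads the variable at startup, so restart it if the CLI is present.
if command -v stardog-admin >/dev/null 2>&1; then
  stardog-admin server stop
  stardog-admin server start
else
  echo "STARDOG_SERVER_JAVA_ARGS=$STARDOG_SERVER_JAVA_ARGS"
fi
```

If the relation extractor still runs out of heap, raising -Xmx further is the first thing to try, since the stack trace shows the failure while the CoreNLP dependency parser model is being loaded.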