Stardog's NLP extractors

s5uybw · October 9, 2019, 4:39pm

Hi, I am trying to use BITES, one of Stardog's features, for NLP.

I downloaded the latest jar (Releases · stardog-union/bites-corenlp · GitHub) and put it in the STARDOG_EXT directory.

I got the following error while restarting the Stardog server:

An unexpected error occurred.
java.lang.NoClassDefFoundError: org/openrdf/model/Value

Is any special setting needed?

zachary.whitley · October 9, 2019, 4:46pm

Stardog has transitioned from using openrdf to its own Stark API. The extractors have been updated to use the new API, but you'll have to check out the repo and build the latest code from master.

I'm building it now. What version of Stardog are you running? It looks like it's building against 6.2.0, so if you're running 7.x it may need some minor updates to run against 7.

zachary.whitley · October 9, 2019, 5:15pm

The only changes required for 7.0.2 were setting java.library.path so that the tests would run. Other than that it looks good.

I've included a jar from that build. You'll just need to change the extension from .zip back to .jar: bites-corenlp-1.1.zip (8.6 KB)

s5uybw · October 9, 2019, 8:37pm

Hi, we are running 7.0.1. Thanks for the details.

However, is it possible to attach a jar file instead? The downloaded file always shows up as a zip file, even when I try to change it to jar. Thanks.

zachary.whitley · October 9, 2019, 9:00pm

I can only upload files of type .zip.

You can download the file as is and then just change the name from .zip to .jar after it has downloaded.
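On Linux or macOS the rename can be done from a shell. The following is only a minimal sketch: the helper name zip_to_jar is made up for illustration, and the only detail taken from this thread is the attachment name bites-corenlp-1.1.zip.

```shell
#!/bin/sh
# Rename a downloaded .zip attachment back to .jar.
# zip_to_jar is a hypothetical helper, not part of Stardog or BITES.
zip_to_jar() {
    src=$1
    case "$src" in
        *.zip) ;;
        *) echo "expected a .zip file, got: $src" >&2; return 1 ;;
    esac
    # ${src%.zip} strips the trailing .zip so the new suffix can be appended.
    mv -- "$src" "${src%.zip}.jar"
}
```

For example, `zip_to_jar bites-corenlp-1.1.zip` should leave bites-corenlp-1.1.jar in the same directory, ready to be copied into the Stardog extension directory.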

s5uybw · October 9, 2019, 9:16pm

Thanks a lot! It works on Linux. I was downloading on Windows, and changing the extension did not work there.

I am able to restart the Stardog server after that.

Do you have documentation on how to use Stardog for NLP? I created a document with the following sentence: "The Orioles are a professional baseball team based in Baltimore."

I ran the extractor as:
stardog doc put --rdf-extractors CoreNLPMentionRDFExtractor documents test.doc

It returns edu/stanford/nlp/pipeline/CoreDocument

I tried to query for any triple in the database, but nothing is returned.

lorenz_b · October 10, 2019, 7:46am

@zachary.whitley you just built the "normal" jar file, but I'm pretty sure you'll need the fat jar. BITES does its NLP with Stanford CoreNLP, which relies on very large pre-trained models. Because BITES is an extension, those models are not shipped with the standard Stardog distribution.

I have now built it for Stardog 7.0.2; the whole jar file is ~1.8GB on my machine:

./gradlew fatJar

zachary.whitley · October 10, 2019, 12:17pm

@lorenz_b thanks. My mistake. I ran ./gradlew jar, not ./gradlew fatJar.
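For anyone building locally, the build-and-install step might look roughly like the sketch below. Several things here are assumptions rather than facts from this thread: install_bites_fatjar is a hypothetical helper name, the jar is assumed to land in Gradle's default build/libs/ directory, and the *-all-*.jar pattern is inferred from the name bites-corenlp-all-1.2.jar that shows up later in this thread.

```shell
#!/bin/sh
# Build the fat jar in a bites-corenlp checkout and copy it into the
# Stardog extension directory.
# install_bites_fatjar is a hypothetical helper; output path and jar
# naming are assumptions (Gradle default build/libs, "-all-" in the name).
install_bites_fatjar() {
    repo_dir=$1   # path to a checkout of stardog-union/bites-corenlp
    ext_dir=$2    # Stardog extension directory, e.g. "$STARDOG_EXT"

    # Run the build in a subshell so the caller's working directory is unchanged.
    ( cd "$repo_dir" && ./gradlew fatJar ) || return 1

    mkdir -p "$ext_dir" || return 1
    # Copy only the fat jar, not the small "normal" jar.
    cp "$repo_dir"/build/libs/*-all-*.jar "$ext_dir"/
}
```

A call such as `install_bites_fatjar ./bites-corenlp "$STARDOG_EXT"` would then be followed by a server restart so the extension is picked up.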

s5uybw · October 10, 2019, 3:49pm

Hi, is it possible for us to get the fat jar? Could you put the fat jar in the Git repository so we can download it? Thanks!

zachary.whitley · October 10, 2019, 4:19pm

Here you go https://drive.google.com/file/d/1b7pm_KW169UUJAVQkOnv8xWuw_H0ecdi/view?usp=sharing

EDIT: I have removed this link; please see the following post for the official build.

lorenz_b · October 11, 2019, 6:09am

By the way, it works as expected for me with Stardog 7.0.2.

And don't forget: the extracted triples are stored in a separate named graph for each document.
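One way to check this is to query across named graphs instead of the default graph. The sketch below assumes the database is called documents, as in the commands earlier in this thread, and that the stardog CLI is on the PATH; the variable name GRAPH_COUNT_QUERY is made up for illustration. Depending on how the database is configured, a plain `{ ?s ?p ?o }` pattern may only match the default graph, which could explain an empty result.

```shell
#!/bin/sh
# SPARQL that lists every named graph together with its triple count.
# GRAPH_COUNT_QUERY is an illustrative variable name.
GRAPH_COUNT_QUERY='SELECT ?g (COUNT(*) AS ?triples) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g'

# Only run the query if the stardog CLI is available.
# "documents" is the database name used earlier in this thread.
if command -v stardog >/dev/null 2>&1; then
    stardog query execute documents "$GRAPH_COUNT_QUERY"
fi
```

If extraction succeeded, each stored document should appear as its own graph with a non-zero count.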

stephen · October 16, 2019, 5:28pm

@s5uybw I have compiled the newest fatJar against 7.0.2 and created a release on GitHub: Release v1.2 · stardog-union/bites-corenlp · GitHub. Please let me know if you have problems with it!

s5uybw · October 18, 2019, 8:18pm

Thanks!
It works for CoreNLPMentionRDFExtractor. The extracted triples (entities) are loaded into the named graph.

However, I get a Java heap space error if I run:
stardog doc put --rdf-extractors CoreNLPRelationRDFExtractor documents test.doc

Stardog.log:
ERROR 2019-10-18 19:57:27,165 [stardog-user-2] com.stardog.http.server.undertow.ErrorHandling:writeError(138): Unexpected error on the server
java.lang.OutOfMemoryError: Java heap space
at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:661) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:643) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.DependencyParser.initialize(DependencyParser.java:1186) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:630) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.parser.nndep.DependencyParser.loadFromModelFile(DependencyParser.java:499) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.DependencyParseAnnotator.<init>(DependencyParseAnnotator.java:57) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.AnnotatorImplementations.dependencies(AnnotatorImplementations.java:240) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$57(StanfordCoreNLP.java:559) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$1305/836250417.apply(Unknown Source) ~[?:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$69(StanfordCoreNLP.java:625) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$1314/285508813.get(Unknown Source) ~[?:?]
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194) ~[bites-corenlp-all-1.2.jar:?]
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181) ~[bites-corenlp-all-1.2.jar:?]
at com.complexible.stardog.docs.corenlp.CoreNLPRelationRDFExtractor.getPipeline(CoreNLPRelationRDFExtractor.java:77) ~[bites-corenlp-all-1.2.jar:?]
at com.complexible.stardog.docs.corenlp.CoreNLPRelationRDFExtractor.extractFromText(CoreNLPRelationRDFExtractor.java:87) ~[bites-corenlp-all-1.2.jar:?]
at com.complexible.stardog.docs.extraction.tika.TextProvidingRDFExtractor.extract(TextProvidingRDFExtractor.java:46) ~[stardog-bites-core-7.0.1.jar:?]
at com.complexible.stardog.docs.extraction.tika.TextProvidingRDFExtractor.extract(TextProvidingRDFExtractor.java:28) ~[stardog-bites-core-7.0.1.jar:?]
at com.complexible.stardog.docs.db.ConnectableBitesConnectionImpl.lambda$extract$2(ConnectableBitesConnectionImpl.java:162) ~[stardog-bites-core-7.0.1.jar:?]
at com.complexible.stardog.docs.db.ConnectableBitesConnectionImpl$$Lambda$1260/369201145.apply(Unknown Source) ~[?:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_222]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_222]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_222]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_222]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_222]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_222]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_222]
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:541) ~[?:1.8.0_222]

s5uybw · October 21, 2019, 7:22pm

The text in the document is from the sample: "The Orioles are a professional baseball team based in Baltimore."

Are you able to run CoreNLPRelationRDFExtractor for entities and links?
Is there anything I need to configure to avoid the Java heap memory issue for this simple text?

Thanks a lot.

zachary.whitley · October 22, 2019, 12:07pm

I'm not quite sure what you're asking here.

The NLP models require additional heap space, so you'll need to allocate more by adjusting your STARDOG_SERVER_JAVA_ARGS environment variable.
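As a rough sketch, the variable can be set in the shell that starts the server. The sizes below are illustrative assumptions only, not values recommended anywhere in this thread; pick them according to the memory available on your machine.

```shell
#!/bin/sh
# Give the Stardog server JVM a larger heap before (re)starting it.
# The 4g/8g sizes are illustrative assumptions, not tested recommendations.
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"

# Restart the server so the new settings take effect (skipped when the
# Stardog CLI is not installed on this machine).
if command -v stardog-admin >/dev/null 2>&1; then
    stardog-admin server stop
    stardog-admin server start
fi
```

The variable has to be visible to the process that launches the server, so setting it in a different terminal session will have no effect.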

system · November 5, 2019, 12:08pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
