New language detection function

zachary.whitley · February 23, 2021, 9:22pm

I just posted a new release of the Stardog-extension functions that includes a new language detection function. RDF has nice support for language tags, but most data doesn't include this information, and it can be impossible to add for anything other than a trivial data set.

The functions come in two types. The first detects the most likely language of a string and returns the string with the appropriate language tag:


select ?result where { bind(lang:detect("Stardog graph database") AS ?result) }

should return

"Stardog graph database"@en

The second version returns a score for each language:

prefix array: <http://semantalytics.com/2017/09/ns/stardog/kibble/array/>

select ?result where { bind(array:toString(lang:score("Stardog graph database")) AS ?result) }

should return

[ [ "en"^^<http://www.w3.org/2001/XMLSchema#string> "1.0"^^<http://www.w3.org/2001/XMLSchema#double> ] [ "xh"^^<http://www.w3.org/2001/XMLSchema#string> "0.9931761961991691"^^<http://www.w3.org/2001/XMLSchema#double> ] [ "zu"^^<http://www.w3.org/2001/XMLSchema#string> 
.....

The library used detects 74 languages, and you can get a list of them with the function lang:detectbleLanguages. A word of caution: the complete model is approximately 3.4Gb and takes about 10 seconds to load. I've done what I can to cache the model, so given sufficient memory it should only take the 10 seconds the first time you run it. There are also separate functions like lang:detectFrom that take a list of languages to detect and load much faster.
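The extension's own caching code isn't shown in this post. As a rough illustration of the load-once idea only, here is a minimal Python sketch. The `load_model` function, its `languages` parameter, and the stub return value are all hypothetical stand-ins, not part of the extension:

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=None)
def load_model(languages=None):
    """Hypothetical stand-in for an expensive model load.

    Results are cached per distinct `languages` argument, so repeated
    calls with the same argument reuse the object already in memory.
    """
    global LOAD_COUNT
    LOAD_COUNT += 1
    # A real loader would read model files here; this stub returns a dict.
    return {"languages": languages or "all"}


first = load_model()               # pays the load cost
second = load_model()              # served from the cache, no second load
subset = load_model(("en", "de"))  # different key, so it loads separately
```

In this sketch the full model and a restricted language list are cached as separate entries, so the cost is paid once for each.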

There are also new functions for computing n-grams in both the array and strings packages, returning arrays of tokens and strings respectively.
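To make the two return shapes concrete, here is a small Python sketch of n-gram computation. It assumes word-level n-grams split on whitespace, which may not match how the extension tokenizes. The names `token_ngrams` and `string_ngrams` are illustrative, not the extension's function names:

```python
def token_ngrams(text, n):
    """Return each n-gram as a list of tokens (whitespace tokenization assumed)."""
    tokens = text.split()
    # Slide a window of size n over the tokens; empty if there are fewer than n.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def string_ngrams(text, n):
    """Return each n-gram as a single space-joined string."""
    return [" ".join(gram) for gram in token_ngrams(text, n)]


bigram_tokens = token_ngrams("Stardog graph database", 2)
bigram_strings = string_ngrams("Stardog graph database", 2)
```

Here `bigram_tokens` holds two token lists and `bigram_strings` holds the same two bigrams as plain strings.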

