I currently use hash encoding on date and character source data to create IRI-safe values. This seems like overkill. Is there a percent encoding in SMS, like there is in R2RML, to convert values like "12 Dec 2016" to:
12%20Dec%202016
Similarly, for IRIs that come from character fields containing spaces, such as “Hispanic or Latino”,
to:
You can make it part of your SQL query by calling the URLENCODE() function (or something similar, depending on your database). I think there is a property setting for specifying non-standard functions you’d like to call. I can’t remember it off the top of my head, but give me a second and I’ll find it.
The parameter is sql.functions:
“A comma-separated list of SQL function names to register with the parser. If an R2RML view (using rr:sqlQuery) fails to parse, this option can be set to allow use of non-standard functions.”
I’m a little confused by the ‘#’ in there. Was that a typo? Can you elaborate a little bit more on what you mean by hashing? I believe that if the value is included as part of a URL template it will automatically be percent encoded. Is that not happening?
This is something we don’t currently support. R2RML does define an IRI-safe form for values substituted into IRI templates (each character outside the RFC 3987 iunreserved set is percent-encoded), and we have a ticket in place to implement that on the SMS side.
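For reference, here is a rough Python sketch of what R2RML’s IRI-safe rule does to a value before it is substituted into an IRI template. Every character outside RFC 3987’s iunreserved set (ASCII letters, digits, -._~ plus the ucschar ranges) is converted to UTF-8 and each octet is percent-encoded. The template, the ETHNIC column name and the helper names are hypothetical. The placeholder parsing is simplified and ignores R2RML’s backslash-escaped braces.

```python
import re

# ucschar ranges from RFC 3987, section 2.2; these count as iunreserved.
_UCSCHAR = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]

# Simplified placeholder syntax: {column}, no escaped braces.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def is_iunreserved(ch: str) -> bool:
    """True if ch may appear unescaped in an IRI-safe string."""
    if ch.isascii():
        return ch.isalnum() or ch in "-._~"
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _UCSCHAR)


def iri_safe(value: str) -> str:
    """Percent-encode the UTF-8 octets of every non-iunreserved character."""
    out = []
    for ch in value:
        if is_iunreserved(ch):
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def expand_iri_template(template: str, row: dict) -> str:
    """Replace each {column} with the IRI-safe form of row[column]."""
    return _PLACEHOLDER.sub(lambda m: iri_safe(str(row[m.group(1)])), template)


print(iri_safe("12 Dec 2016"))  # 12%20Dec%202016
print(expand_iri_template("http://example.com/code/Ethnicity_{ETHNIC}",
                          {"ETHNIC": "Hispanic or Latino"}))
```

Non-ASCII letters such as "é" fall inside iunreserved and pass through unchanged, while "/" and "%" are escaped.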
“I am disappoint” but glad it is on the radar. Do you have an ETA for implementation? We are defining data upload/conversion scripts and documentation for an industry project and:
It’s slightly outside your current workflow, but you might want to try importing your CSV files into an RDBMS and using the URLENCODE function I suggested earlier. Or you could get all fancy with a bash script and awk and do the URL encoding there.
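If an RDBMS round trip is too heavy, the same step can be done on the CSV itself. Below is a minimal Python sketch that assumes hypothetical column names: USUBJID, ETHNIC and a new ETHNIC_ENC. It copies the file and appends a percent-encoded copy of one column, leaving the original values intact. urllib.parse.quote with safe="" escapes everything except ASCII letters, digits and -._~. For non-ASCII characters this is slightly stricter than R2RML’s IRI-safe rule, but the result is still a valid IRI.

```python
import csv
import io
from urllib.parse import quote


def add_encoded_column(src, dst, column: str, new_column: str) -> None:
    """Copy CSV rows from src to dst, adding a percent-encoded copy of `column`."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [new_column]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # safe="" so that "/" and other reserved characters are escaped too.
        row[new_column] = quote(row[column], safe="")
        writer.writerow(row)


# Usage with in-memory data (column names are hypothetical).
src = io.StringIO("USUBJID,ETHNIC\n001,Hispanic or Latino\n002,Not Hispanic or Latino\n")
dst = io.StringIO()
add_encoded_column(src, dst, "ETHNIC", "ETHNIC_ENC")
print(dst.getvalue())
```

With real files, open both with newline="", as the csv module documentation recommends. The mapping template would then reference the encoded column instead of the raw one.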
No ETA at the moment. However, I would make the point that something like rdfs:label is more appropriate for the “human-readable” name of something. So if you had, elsewhere in your mapping…
code:Ethnicity_gnja37oohiiipittns2ro9rma4k5q8i5 rdfs:label "Hispanic or Latino" .
You could extend existing queries to include it without much work:
I’m already processing the data in R to convert the source SAS Transport Format (XPT) files to CSV for the SMS process, so I could URL-encode at the same time. I had hoped to leave the source data as pristine as possible, but this is likely the best kludge for me until SMS supports URL encoding.
R code mock-up:
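library(utils)  # URLencode() lives in utils, which R attaches by default
ethnicity <- "Hispanic or Latino"
# reserved = TRUE also escapes characters such as "/" and "#", which would
# otherwise change the structure of the generated IRI
en_ethnicity <- URLencode(ethnicity, reserved = TRUE)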
Then, in the SMS mapping:
study:ethnicity code:Ethnicity_{en_ethnicity} ;
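There is one caveat with this workaround. R2RML’s IRI-safe rule escapes "%" as well, because "%" is not in the iunreserved set. If SMS later adopts that encoding for templates, a column that was already encoded in R would be encoded a second time. A quick Python illustration, with urllib.parse.quote standing in for the template engine’s encoding:

```python
from urllib.parse import quote, unquote

raw = "Hispanic or Latino"

# Encoded once upstream (e.g. in the R preprocessing step).
pre_encoded = quote(raw, safe="")
# Encoded again, as a template engine applying IRI-safe encoding would do.
double_encoded = quote(pre_encoded, safe="")

print(pre_encoded)     # Hispanic%20or%20Latino
print(double_encoded)  # Hispanic%2520or%2520Latino
```

Keeping the raw column next to the encoded one in the CSV makes it easy to point the template back at the raw value once native support lands.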
Thanks for your time walking me through this. Case closed, and have a great day!
Stephen, you make a good point regarding rdfs:label for human readability. The hashed IRI is fine from the machine’s perspective. The “decode” of our terms is in another graph. We’re trying to strike a compromise between machine processing and human readability, mainly because we need to sell this approach to entry-level RDF folks. “See - that value here in the instance graph is linked to the terminology graph over here using this IRI…”
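The trade-off described here can be shown in a few lines of Python. Percent-encoding keeps the term recognizable and can be reversed from the IRI alone. A hashed local name is one-way and only becomes readable through a lookup, such as the rdfs:label triples in the terminology graph. SHA-256 is used purely as a hypothetical stand-in, because the thread doesn’t say which hash produced the existing IRIs.

```python
import hashlib
from urllib.parse import quote, unquote

label = "Hispanic or Latino"

# Percent-encoded local name: recognizable and reversible on its own.
encoded_local = quote(label, safe="")
print(encoded_local)           # Hispanic%20or%20Latino
print(unquote(encoded_local))  # Hispanic or Latino

# Hashed local name (SHA-256 is a hypothetical stand-in for the real hash):
# one-way, so the label has to come from a lookup, such as rdfs:label
# triples held in the terminology graph.
hashed_local = hashlib.sha256(label.encode("utf-8")).hexdigest()
labels_by_local_name = {hashed_local: label}
print(labels_by_local_name[hashed_local])  # Hispanic or Latino
```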
Thanks, but I have to agree with Stephen that something queryable would be good. I’m not sure what your exact use case is, so I’m just throwin’ options out there.