Convert String Variable to URI using SPARQL

Using a string as a URI works like this:

SELECT DISTINCT ?db_uri ?uri ?db_uri_var ?prop ?val
{
  BIND(URI("http://dbpedia.org/resource/Delta_Air_Lines") AS ?db_uri)

  SERVICE <http://DBpedia.org/sparql>
  {
    ?db_uri ?prop ?val .
  }
}

However, since the string value comes "from somewhere else", I need to use a variable inside the URI() call. That does not seem to work:

SELECT DISTINCT ?db_uri ?uri ?db_uri_var ?prop ?val
{
  BIND("http://dbpedia.org/resource/Delta_Air_Lines" AS ?uri)
  BIND(URI(?uri) AS ?db_uri)

  SERVICE <http://DBpedia.org/sparql>
  {
    ?db_uri ?prop ?val .
  }
}

Can you share the query plans for these two? I suspect the problem has nothing to do with the IRI coming from a variable, but rather with the fact that, to execute this query, you would need to ask DBpedia for its entire contents.

Additionally, if you can share something closer to the actual query you need, we can provide some guidance on getting that to run.

First query:

Slice(offset=0, limit=100) [#100]
`─ Distinct [#500]
   `─ Projection(?db_uri, ?prop, ?val) [#500]
      `─ ServiceJoin [#500]
         +─ Service <http://DBpedia.org/sparql>  {
         │  +─ Scan[SPO](?db_uri, ?prop, ?val)
         │  }
         `─ Bind(IRI("http://dbpedia.org/resource/Delta_Air_Lines") AS ?db_uri) [#1]

Second one:

Slice(offset=0, limit=100) [#100]
`─ Distinct [#500]
   `─ Projection(?uri, ?prop, ?val) [#500]
      `─ NestedLoopJoin(_) [#500]
         +─ Bind("http://dbpedia.org/resource/Delta_Air_Lines" AS ?uri) (IRI(?uri) AS ?db_uri) [#1]
         `─ Service <http://DBpedia.org/sparql>  {
            +─ Scan[SPO](?db_uri, ?prop, ?val)
            }

A simplified version of the actual query looks like this:

PREFIX : <http://dummy.org/dummy/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>

SELECT *
WHERE
{
  ?ent :category ?cat ;
       nee:detectedAs ?literal ;
       nee:hasMatchedURI ?uri_str .
  BIND(URI(?uri_str) AS ?db_uri)

  SERVICE <http://DBpedia.org/sparql>
  {
    ?db_uri ?dbpProperty ?dbpValue .
  }
}

The service join is implemented by constraining the query sent to the SERVICE. As far as I know, DBpedia implicitly adds a LIMIT to the query, and Stardog doesn't account for this. Without the ServiceJoin, DBpedia would return only the first page of results, which is not guaranteed to join with the rest of the query. Let me look further into the issue you've observed with the given example.
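To see why constraining the remote query matters, here is a toy Python simulation, assuming (as described above) an endpoint that silently truncates every result set. `LimitedEndpoint`, `page_limit`, and the sample triples are hypothetical; neither DBpedia's real limit nor Stardog's actual ServiceJoin is modelled.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class LimitedEndpoint:
    """Hypothetical remote endpoint that caps every answer at page_limit rows."""

    def __init__(self, triples: List[Triple], page_limit: int) -> None:
        self.triples = triples
        self.page_limit = page_limit

    def query(self, subjects: Optional[List[str]] = None) -> List[Triple]:
        """Answer ?s ?p ?o, optionally restricted to the given subjects."""
        if subjects is None:
            matches = list(self.triples)
        else:
            wanted = set(subjects)
            matches = [t for t in self.triples if t[0] in wanted]
        # Like an implicit LIMIT: extra rows are silently dropped.
        return matches[: self.page_limit]


def fetch_then_join(local_subjects: List[str], endpoint: LimitedEndpoint) -> List[Triple]:
    """Send the unconstrained pattern, then join the (truncated) answer locally."""
    wanted = set(local_subjects)
    return [t for t in endpoint.query() if t[0] in wanted]


def bind_join(local_subjects: List[str], endpoint: LimitedEndpoint) -> List[Triple]:
    """Ship the local bindings so the remote side only returns joinable rows."""
    return endpoint.query(subjects=local_subjects)


# Toy data: 1,000 unrelated resources listed before the one we care about.
data = [(f"dbr:Entity_{i}", "rdfs:label", f"Entity {i}") for i in range(1000)]
data.append(("dbr:Delta_Air_Lines", "rdfs:label", "Delta Air Lines"))
endpoint = LimitedEndpoint(data, page_limit=100)

print(fetch_then_join(["dbr:Delta_Air_Lines"], endpoint))  # []
print(bind_join(["dbr:Delta_Air_Lines"], endpoint))
# [('dbr:Delta_Air_Lines', 'rdfs:label', 'Delta Air Lines')]
```

Even the bind join is only complete while the pushed bindings match fewer rows than the cap; a resource with more triples than `page_limit` would still come back truncated.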

Are you able to share the query plan for your actual query?

Jess

Nothing super fancy so far: I'm getting strings as values of nee:hasMatchedURI (coming from an external NER tool) and trying to convert them into valid URIs to use as subjects for DBpedia enrichment:

Slice(offset=0, limit=100) [#100]
`─ Projection(?t, ?ent, ?cat, ?literal, ?uri, ?db_uri, ?dbpProperty, ?dbpValue) [#1.0K]
   `─ NestedLoopJoin(_) [#1.0K]
      +─ Service <http://DBpedia.org/sparql>  {
      │  +─ Scan[SPO](?db_uri, ?dbpProperty, ?dbpValue)
      │  }
      `─ Bind(IRI(?uri) AS ?db_uri) [#131538.1M]
         `─ NestedLoopJoin(_) [#131538.1M]
            +─ NaryJoin(?ent) [#278K]
            │  +─ Scan[PSOC](?ent, nee:detectedAs, ?literal) [#711K]
            │  +─ Scan[PSOC](?ent, nee:hasMatchedURI, ?uri) [#278K]
            │  `─ Scan[PSOC](?ent, twitter:category, ?cat) [#711K]
            `─ Scan[POSC](?t, rdf:type, twitter:Tweet) [#473K]

Can you share the plan for this query:

SELECT * {
  # We hint to the optimizer that this pattern is selective relative to the DBpedia SERVICE,
  # for which we otherwise only have heuristics when choosing a join algorithm.
  { #pragma cardinality 10
    ?ent nee:detectedAs ?literal .
    ?ent nee:hasMatchedURI ?uri .
    ?ent twitter:category ?cat .
    BIND(IRI(?uri) AS ?db_uri)
  }

  SERVICE <http://DBpedia.org/sparql>
  {
    ?db_uri ?dbpProperty ?dbpValue .
  }
}

Here is the plan. Unfortunately, the query itself times out for me.

Slice(offset=0, limit=100) [#100]
`─ Projection(?ent, ?literal, ?uri, ?cat, ?db_uri, ?dbpProperty, ?dbpValue) [#1.0K]
   `─ NestedLoopJoin(_) [#1.0K]
      +─ Bind(IRI(?uri) AS ?db_uri) [#278K]
      │  `─ NaryJoin(?ent) [#278K]
      │     +─ Scan[PSOC](?ent, twitter:category, ?cat) [#711K]
      │     +─ Scan[PSOC](?ent, nee:detectedAs, ?literal) [#711K]
      │     `─ Scan[PSOC](?ent, nee:hasMatchedURI, ?uri) [#278K]
      `─ Service <http://DBpedia.org/sparql>  {
         +─ Scan[SPO](?db_uri, ?dbpProperty, ?dbpValue)
         }

That can't be the issue for the two dummy example queries, though. The first query works because Stardog is able to push the BIND into the SERVICE clause. For the second query, for some reason, it doesn't, even though there are again only two consecutive BIND clauses, which could be simplified (evaluated first) and then pushed into the SERVICE as well.

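So the question is why the following two snippets are evaluated differently:

BIND(<http://dbpedia.org/resource/Delta_Air_Lines> AS ?y)
SERVICE <http://DBpedia.org/sparql> {
  ?y ?p ?o .
}

versus

BIND(<http://dbpedia.org/resource/Delta_Air_Lines> AS ?x)
BIND(?x AS ?y)
SERVICE <http://DBpedia.org/sparql> {
  ?y ?p ?o .
}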
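The static simplification proposed here (evaluate consecutive constant BINDs first, then treat the target variable as a constant that could be pushed into the SERVICE) can be sketched in Python. The expression model (`Const`, `Var`, `IriFn`) is a hypothetical toy, not Stardog's optimizer, and `IriFn` skips base-IRI resolution.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    """A literal value; is_iri marks it as an IRI rather than a plain string."""
    value: str
    is_iri: bool = False


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class IriFn:
    """Toy version of SPARQL's IRI()/URI(); no base-IRI resolution."""
    arg: "Expr"


Expr = Union[Const, Var, IriFn]


def fold(expr: Expr, env: Dict[str, Const]) -> Optional[Const]:
    """Return expr's value if it depends only on constants, else None."""
    if isinstance(expr, Const):
        return expr
    if isinstance(expr, Var):
        return env.get(expr.name)
    inner = fold(expr.arg, env)  # expr is an IriFn here
    return None if inner is None else Const(inner.value, is_iri=True)


def fold_binds(binds: List[Tuple[Expr, str]]) -> Dict[str, Const]:
    """Evaluate a sequence of BIND(expr AS ?var) clauses at planning time."""
    env: Dict[str, Const] = {}
    for expr, var in binds:
        value = fold(expr, env)
        if value is not None:
            env[var] = value
    return env


uri = "http://dbpedia.org/resource/Delta_Air_Lines"
# BIND(IRI("...") AS ?db_uri)
direct = fold_binds([(IriFn(Const(uri)), "db_uri")])
# BIND("..." AS ?uri) BIND(IRI(?uri) AS ?db_uri)
chained = fold_binds([(Const(uri), "uri"), (IriFn(Var("uri")), "db_uri")])
print(direct["db_uri"] == chained["db_uri"])  # True
```

Both forms fold to the same constant IRI, which is the argument being made; whether an engine actually performs this folding before choosing a join strategy is a separate design decision.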

I know that, unfortunately, the evaluation of SERVICE with external inline data via VALUES or BIND is only informative in the spec (SPARQL 1.1 Federated Query), but it's clearly more efficient to put as much data as possible into the SERVICE query.
Maybe there are some query evaluation hints for SERVICE clauses?
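Concretely, "putting the data into the SERVICE query" means the remote endpoint receives the local bindings inline, for example as a VALUES block, instead of a bare `?db_uri ?prop ?val` pattern that matches all of DBpedia. A minimal Python sketch of building such a remote query follows; the function names are hypothetical, and because the strings come from an external NER tool, it rejects values containing characters that the SPARQL 1.1 IRIREF grammar excludes (it also rejects empty strings, a choice made here for illustration).

```python
import re
from typing import List

# Characters IRIREF excludes in the SPARQL 1.1 grammar: <>"{}|^`\ and 0x00-0x20.
_FORBIDDEN_IN_IRI = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def to_iri_term(value: str) -> str:
    """Wrap a string as <...>, rejecting empty strings and forbidden characters."""
    if not value or _FORBIDDEN_IN_IRI.search(value):
        raise ValueError(f"not usable as an IRI: {value!r}")
    return f"<{value}>"


def service_query_with_values(var: str, iris: List[str], pattern: str) -> str:
    """Build the remote SELECT with the local bindings inlined as a VALUES block."""
    values = " ".join(to_iri_term(iri) for iri in iris)
    return f"SELECT * WHERE {{ VALUES ?{var} {{ {values} }} {pattern} }}"


print(service_query_with_values(
    "db_uri",
    ["http://dbpedia.org/resource/Delta_Air_Lines"],
    "?db_uri ?prop ?val .",
))
# SELECT * WHERE { VALUES ?db_uri { <http://dbpedia.org/resource/Delta_Air_Lines> } ?db_uri ?prop ?val . }
```

Without the VALUES block the remote query degenerates to `SELECT * WHERE { ?db_uri ?prop ?val . }`, which is the "entire contents of DBpedia" case raised earlier in the thread.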

Side note: Other triple stores also have "trouble" with federated queries.

Hi Lorenz!

We don't always evaluate static expressions before runtime. For better or worse, that is how it works, and it should not lead to incorrect results; in this case, it is not the root cause of the incorrect results. The problem is that the DBpedia service limits the size of a result set. If you send a ?s ?p ?o query, you will receive a limited, otherwise arbitrary set of results. Stardog doesn't account for the fact that DBpedia returns incomplete results, and it may therefore make query optimization decisions that, given this DBpedia limitation, manifest as incomplete results from Stardog.

The assertion that it's clearly more efficient to put as much data as possible into the SERVICE query is not universally true. Stardog doesn't have any knowledge of the size of the upstream data source. While a ?s ?p ?o pattern is a possible heuristic signal that we should push constraints to the service, we don't always do so.
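One way to picture this trade-off is a toy cost rule in Python. Everything in it is hypothetical: `choose_strategy`, the batch size, and the request budget are illustrative and are not Stardog's actual optimizer logic. The cardinalities 10 and 278K are taken from the `#pragma cardinality` hint and the query plans in this thread.

```python
import math
from typing import Tuple

TriplePattern = Tuple[str, str, str]


def is_fully_unbound(pattern: TriplePattern) -> bool:
    """True for patterns like ?s ?p ?o, where every position is a variable."""
    return all(term.startswith("?") for term in pattern)


def remote_requests_needed(local_cardinality: int, batch_size: int) -> int:
    """SERVICE calls needed if local bindings are shipped batch_size at a time."""
    return math.ceil(local_cardinality / batch_size)


def choose_strategy(
    pattern: TriplePattern,
    local_cardinality: int,
    batch_size: int = 100,
    max_requests: int = 1_000,
) -> str:
    """Toy rule: push bindings only for a fully unbound pattern and a small local side."""
    requests = remote_requests_needed(local_cardinality, batch_size)
    if is_fully_unbound(pattern) and requests <= max_requests:
        return "push bindings"
    return "fetch once and join locally"


service_pattern = ("?db_uri", "?dbpProperty", "?dbpValue")
print(choose_strategy(service_pattern, 10))       # push bindings
print(choose_strategy(service_pattern, 278_000))  # fetch once and join locally
```

With 278K local bindings, pushing would mean 2,780 remote calls at this batch size, so the rule declines; yet that is exactly the case where, against an endpoint that truncates results, fetching once can silently miss rows. A real optimizer has to weigh that without knowing the remote data size.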
