Is Hive 1.1 supported for Virtual Graphs?

I would like to create a virtual graph against Hive 1.1. The Stardog server accesses Hive through a secure ha_proxy that forwards to a Hive JDBC port on the remote Hadoop cluster:

stardog studio -> stardog server -> ha_proxy -> hive Hadoop cluster

  1. To get the Stardog server to access Hive, putting the recommended hive-jdbc-2.3.2.jar into an external directory did not work. Instead, I copied all the Hive jars from the Hadoop server to the Stardog external directory:

    cd /opt/stardog/ext
    scp ':/a/hive/lib/*.jar' .
    scp ':/a/hadoop/share/hadoop/common/*.jar' .
    scp ':/a/third-party/hadoop/share/hadoop/common/lib/slf4j*.jar' .

The hive jdbc jar might be:
hive-jdbc-1.1.0-cdh5.9.0.jar
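The copied jars can be sanity-checked before restarting the server. A minimal sketch (the `find_class_jar` helper is illustrative, not a Stardog tool): it lists each jar's contents with `unzip` and reports which one actually provides `org.apache.hive.jdbc.HiveDriver`, the class the Hive JDBC URL resolves to.

```shell
# Hypothetical helper: report which jar in a directory contains a
# given class file, by listing each jar's contents with unzip.
find_class_jar() {
  local dir="$1" class_path="$2" f
  for f in "$dir"/*.jar; do
    [ -e "$f" ] || continue
    if unzip -l "$f" 2>/dev/null | grep -q "$class_path"; then
      echo "$f"
    fi
  done
}

# Check the Stardog ext directory for the Hive JDBC driver class:
find_class_jar /opt/stardog/ext 'org/apache/hive/jdbc/HiveDriver.class'
```

If no jar is reported, the driver class the server needs is still missing from `/opt/stardog/ext`.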

  1. The virtual graph mapping definition loaded successfully:
    PREFIX : <http://test.akamai.com/location/entities#>
    PREFIX i: <http://test.akamai.com/location/entities#>
    PREFIX loc: <http://test.akamai.com/location#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX stardog: <tag:stardog:api:>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
     
    # Simple mapping to make Metro instances from a hive table with only 725 rows
    MAPPING :TestMetro
    FROM SQL {
        SELECT * FROM metros
    }
    TO {
        ?metro a loc:Metro .
        ?country a loc:Country .
        ?continent a loc:Continent .
        ?metro loc:LocatedIn ?country .
        ?country loc:LocatedIn ?continent .
    }
    WHERE {
        BIND(template("http://test.akamai.com/location/entities#Metro_{metro}") as ?metro)
        BIND(template("http://test.akamai.com/location/entities#Country_{country}") as ?country)
        BIND(template("http://test.akamai.com/location/entities#Continent_{continent}") as ?continent)
    }
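For reference, a virtual graph like this is registered with a connection properties file alongside the mapping above. The values below are placeholders (host, port, and credentials are assumptions; check the Stardog virtual graph docs for your version):

```properties
# Hypothetical connection settings for the gsp3 virtual graph;
# host/port point at the ha_proxy front end for HiveServer2.
jdbc.url=jdbc:hive2://haproxy-host:10000/default
jdbc.username=hive_user
jdbc.password=hive_password
jdbc.driver=org.apache.hive.jdbc.HiveDriver
```

The graph would then be added with something like `stardog-admin virtual add gsp3.properties mappings.ttl` (file names here are illustrative).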
  1. A simple query worked:
    SELECT ?metro ?country {
        GRAPH <virtual://gsp3> {
            ?metro a loc:Metro .
        }
    }
  1. But a more complicated query does NOT work, because the Stardog-generated SQL query is not valid:

Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 5:11 Table not found 'pp'

Here is the /opt/stardog/data/stardog.log output:
Stardog_hive.log (9.8 KB)

Thanks for the query plan and the log. I'm assuming you've made locatedIn a transitive property and executed this query with reasoning? The reason for the error is that Stardog uses recursive CTE in SQL to answer queries for transitive properties. Hive doesn't support recursive CTE and thus fails on this query.
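For context, this is roughly the shape of SQL that a transitive property requires (a generic sketch over a hypothetical `located_in(child, parent)` edge table, not Stardog's actual generated query). Hive rejects `WITH RECURSIVE` outright, which produces compile errors like the one above:

```sql
-- Transitive closure over direct parent/child edges; engines
-- without recursive CTE support (like Hive) cannot compile this.
WITH RECURSIVE closure (child, ancestor) AS (
    SELECT child, parent FROM located_in          -- base: direct edges
    UNION ALL
    SELECT c.child, e.parent                      -- step: follow one more edge
    FROM closure AS c JOIN located_in AS e ON c.ancestor = e.child
)
SELECT child, ancestor FROM closure;
```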

Is there a workaround, like using generic SQL instead of Hive?
Is there any documentation on the restrictions of virtual graphs?
For example, how do you support transitive properties for non-SQL databases, or for limited/read-only SQL-like systems such as ClickHouse or Presto?

I would like to show the power of the has_a relationship, which relies on the transitive property.

Also, is the Hive jar setup scheme ok?

How about a workaround where the transitive inference is done inside Stardog and not in Hive?
Is there a way to define a "dumb" binding with indirection?
For example, the Hive query would just return the rawLocatedIn relations without the transitive property, and something would then transform the returned raw relationships into the real LocatedIn relationships, which have the transitive property.

Is there a pattern for moving inference into Stardog?

We are planning to allow transparently querying transitive predicates over databases that don't support recursive SQL, but don't have a concrete timeline for implementing this feature. In the meantime, we recommend materializing the subset of the virtual graph into Stardog which supports this type of query.
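One concrete way to do the recommended materialization (database and file names here are illustrative; check the `stardog-admin virtual import` documentation for your version for the exact options):

```shell
# Illustrative: pull the mapped Hive tables into the local database
# "mydb" once, so transitive queries run entirely inside Stardog.
stardog-admin virtual import mydb gsp3.properties mappings.ttl
```

After the import, a SPARQL property path such as `?metro loc:LocatedIn+ ?continent` (or the transitive property under reasoning) walks the hierarchy entirely inside Stardog, with no recursive SQL sent to Hive.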

So the Stardog planner depends on SQL features that might not be supported by the SQL server. I am surprised, because working around constraints seems like a generic problem for a planner. The plans I have looked at so far have some really clever parts.

Is there a way to mark SQL servers with constraint properties like "supports_recursive_CTE", and then have Stardog fail with a warning when a constraint is not met?

That's certainly something we could (and should) be doing. We handle things similarly in a lot of cases for upstream platform differences, including SQL syntax deviations (LIMIT/FETCH is a big one), the ability to join (not supported in the general case in MongoDB or Cassandra), etc. When we implement this for recursive SQL / transitive properties, it will extend to evaluating the transitive property inside Stardog, which requires additional round trips to the upstream data source.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.