I would like to make a virtual graph to Hive 1.1. The Stardog server accesses Hive through a secure HAProxy to a Hive JDBC port on the remote Hadoop cluster:
Stardog Studio -> Stardog server -> HAProxy -> Hive on the Hadoop cluster
To get the Stardog server to access Hive, putting the recommended hive-jdbc-2.3.2.jar into an external directory did not work. Instead I copied all the Hive jars from the Hadoop server to the Stardog external directory.
The Hive JDBC jar might be:
hive-jdbc-1.1.0-cdh5.9.0.jar
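For reference, the virtual graph connection options might look like the following properties file (a sketch; the host, port, database, and credentials here are placeholders, not my real setup):

# Hive JDBC connection, pointing at the HAProxy front end
jdbc.url=jdbc:hive2://haproxy.example.com:10000/default
jdbc.driver=org.apache.hive.jdbc.HiveDriver
jdbc.username=hive_user
jdbc.password=hive_password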
The virtual graph mapping definition loaded:
PREFIX : <http://test.akamai.com/location/entities#>
PREFIX i: <http://test.akamai.com/location/entities#>
PREFIX loc: <http://test.akamai.com/location#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX stardog: <tag:stardog:api:>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Simple mapping to make Metro instances from a hive table with only 725 rows
MAPPING :TestMetro
FROM SQL {
  SELECT * FROM metros
}
TO {
  ?metro a loc:Metro .
  ?country a loc:Country .
  ?continent a loc:Continent .
  ?metro loc:LocatedIn ?country .
  ?country loc:LocatedIn ?continent .
}
WHERE {
  BIND(template("http://test.akamai.com/location/entities#Metro_{metro}") as ?metro)
  BIND(template("http://test.akamai.com/location/entities#Country_{country}") as ?country)
  BIND(template("http://test.akamai.com/location/entities#Continent_{continent}") as ?continent)
}
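Assuming the mapping is saved as metros.sms and the connection options as metros.properties (the file names here are mine), the virtual graph was registered along these lines:

# register the virtual graph with the Stardog server
stardog-admin virtual add metros.properties metros.sms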
But a more complicated query does NOT work, because the Stardog-generated SQL query is not valid:
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 5:11 Table not found 'pp'
Here is the /opt/stardog/data/stardog.log output: Stardog_hive.log (9.8 KB)
Thanks for the query plan and the log. I'm assuming you've made locatedIn a transitive property and executed this query with reasoning? The reason for the error is that Stardog uses recursive CTEs in SQL to answer queries over transitive properties. Hive doesn't support recursive CTEs and thus fails on this query.
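To illustrate (a sketch, not the exact SQL Stardog generates; the table name raw_located_in is made up), computing the transitive closure of a located-in relation in standard SQL requires a recursive CTE along these lines, which Hive cannot compile:

WITH RECURSIVE closure (child, ancestor) AS (
  -- base case: the direct located-in edges
  SELECT child, parent FROM raw_located_in
  UNION ALL
  -- recursive step: extend each path by one more edge
  SELECT c.child, r.parent
  FROM closure c
  JOIN raw_located_in r ON c.ancestor = r.child
)
SELECT child, ancestor FROM closure;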
Is there a workaround, like using generic SQL instead of Hive?
Is there any documentation on the restrictions of virtual graphs?
For example, how do you support transitive properties for a non-SQL database, or for limited/read-only SQL-like systems such as ClickHouse or Presto?
I would like to show the power of the has_a relationship, which relies on the transitive property.
How about a workaround where the transitive inference is done inside Stardog and not in Hive?
Is there a way to define a "dumb" binding with indirection?
For example, the Hive query would just return the "rawLocatedIn" relations without the transitive property.
Then something would transform the returned raw relations into the real LocatedIn relationships, which have the transitive property, as sketched below.
Is there a pattern for moving inference into Stardog?
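For what it's worth, the indirection I have in mind would look something like this in the ontology, using the prefixes from the mapping above (a sketch in Turtle; I don't know whether Stardog supports this over virtual graphs):

# map only the raw edges from Hive; let reasoning derive the closure
loc:rawLocatedIn rdfs:subPropertyOf loc:LocatedIn .
loc:LocatedIn a owl:TransitiveProperty .

The virtual graph would then emit only loc:rawLocatedIn triples, and the transitive loc:LocatedIn facts would be inferred inside Stardog.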
We are planning to allow transparently querying transitive predicates over databases that don't support recursive SQL, but we don't have a concrete timeline for implementing this feature. In the meantime, we recommend materializing the relevant subset of the virtual graph into Stardog, which does support this type of query.
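For example (a sketch; the database name myDb, the file names, and the Metro_Boston IRI are placeholders):

# materialize the mapped triples into a Stardog database
stardog-admin virtual import myDb metros.properties metros.sms
# the transitive query then runs locally with reasoning
stardog query --reasoning myDb "PREFIX i: <http://test.akamai.com/location/entities#> PREFIX loc: <http://test.akamai.com/location#> SELECT ?place WHERE { i:Metro_Boston loc:LocatedIn ?place }"

Once the data is local, the recursive evaluation happens inside Stardog rather than being pushed down to Hive.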
So the Stardog planner depends on SQL features that might not be supported by the SQL server. I am surprised, because working around constraints seems like a generic problem for a planner. The plans I have looked at so far have some really clever parts.
Is there a way to mark SQL servers with constraint properties like "supports_recursive_CTE", and then have Stardog fail with a warning when a constraint is not met?
That's certainly something we could (and should) be doing. We already handle things similarly in a lot of cases for upstream platform differences, including SQL syntax deviations (LIMIT/FETCH is a big one), the ability to join, which is not supported in the general case in MongoDB or Cassandra, etc. When we implement this for recursive SQL / transitive properties, it will extend to evaluating the transitive property inside Stardog, which requires additional round trips to the upstream data source.