Error connecting to Apache Iceberg table via AWS Athena

AWS recently announced support for ACID transactions in Athena using Apache Iceberg.

I can connect to an Athena data source and virtualize standard Parquet-based tables without a problem, but when I try to connect to a table backed by Apache Iceberg I get an error in the console.

Here is part of the error from the logs:

ERROR 2022-05-10 23:40:40,277 [stardog-user-2] com.complexible.stardog.virtual.DefaultVirtualGraphRegistry:createVirtualGraph(678): Cannot initialize virtual graph gnomad_variant_consequence
com.complexible.stardog.StardogException: Exception reading column metadata for 'AwsDataCatalog.gnomad.gnomad_variant_consequence'
        at com.complexible.stardog.virtual.vega.rdbms.metadata.JdbcConnectionMetadataReader.getColumns(JdbcConnectionMetadataReader.java:114) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.rdbms.metadata.JdbcConnectionMetadataReader.getTableMetadata(JdbcConnectionMetadataReader.java:72) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.rdbms.metadata.JdbcSourceMetadataCache.getTableMetadata(JdbcSourceMetadataCache.java:88) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.rdbms.JdbcDataSource.getTableMetadata(JdbcDataSource.java:374) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.calcite.VegaJdbcTable.getTableMetadata(VegaJdbcTable.java:165) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.calcite.VegaJdbcTable.getStatistic(VegaJdbcTable.java:156) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.calcite.VegaJdbcTable.getStatistic(VegaJdbcTable.java:89) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.mapping.MappingGenerator.generateTriplesMap(MappingGenerator.java:213) ~[stardog-virtual-core-7.9.0.jar:?]
        at com.complexible.stardog.virtual.vega.mapping.MappingGenerator.lambda$generateTriplesMaps$7(MappingGenerator.java:160) ~[stardog-virtual-core-7.9.0.jar:?]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]

I've tried the Athena JDBC driver listed in the Stardog docs and also the latest one from AWS (SimbaAthenaJDBC-2.0.29.1000), with the same result.

Is this a known issue, or does anyone have an idea whether it could be a Stardog or AWS driver bug?

Is there a "Caused by" after that exception in the stardog.log file?

Also, I'm curious what transaction support would bring in the context of Stardog, given that the connections are read-only. Or is there something about your larger ecosystem that calls for Iceberg?

Thanks,
-Paul

Hey @PaulJackson, there is indeed a "Caused by". Here it is:

Caused by: java.lang.NullPointerException
        at java.util.ArrayList.addAll(ArrayList.java:702) ~[?:?]
        at com.simba.athena.athena.api.AJClient.fetchColumnsMetadataWithGlue(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.api.AJClient.getColumnsMetadata(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.metadata.AJColumnsMetadataSource.initMetadata(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.metadata.AJColumnsMetadataSource.<init>(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.AJDataEngine.makeNewMetadataSource(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.dsi.dataengine.impl.DSIDataEngine.makeNewMetadataResult(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.AJDataEngine.makeNewMetadataResult(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.jdbc.jdbc42.S42DatabaseMetaData.createMetaDataResult(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.jdbc.common.SDatabaseMetaData.getColumns(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.complexible.stardog.virtual.vega.rdbms.metadata.JdbcConnectionMetadataReader.getColumns(JdbcConnectionMetadataReader.java:99) ~[stardog-virtual-core-7.9.0.jar:?]

Regarding transaction support: I have data pipelines that append to Iceberg tables instead of writing a new Parquet dataset on each pipeline run. This is all upstream from Stardog, which just needs a read-only connection.

Hi Nolan,

I don't know for sure, but I'm guessing Simba hasn't released support for this yet.

Looking at the stack trace, it seems to be failing while collecting metadata using the Glue strategy. There should be a connection setting, MetadataRetrievalMethod, that you can include in the JDBC connection string. Its possible values are ProxyAPI, Auto, Glue and Query. I'd suggest trying ProxyAPI or Query to see if that gets you around the problem.
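For illustration, here is a minimal sketch of how that setting could be appended to the connection string. It assumes the Simba driver's `jdbc:awsathena://Key=Value;Key=Value` URL form and its `AwsRegion` and `S3OutputLocation` properties; the region and bucket are placeholders, not values from this thread, and `AthenaUrlBuilder` is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class AthenaUrlBuilder {
    // Hypothetical helper: builds a URL of the form
    // jdbc:awsathena://Key1=Value1;Key2=Value2;...
    static String build(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(";", "jdbc:awsathena://", "");
        props.forEach((k, v) -> joiner.add(k + "=" + v));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("AwsRegion", "us-east-1");                          // placeholder region
        props.put("S3OutputLocation", "s3://example-bucket/athena/"); // placeholder bucket
        // Steer metadata retrieval away from the Glue path seen in the stack trace.
        props.put("MetadataRetrievalMethod", "Query");
        System.out.println(build(props));
    }
}
```

The resulting string would go wherever the data source's JDBC URL is configured; only the `MetadataRetrievalMethod` entry changes between attempts.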

-Paul

Thanks for the suggestion, but no such luck: ProxyAPI still calls fetchColumnsMetadataWithGlue, and Query calls fetchColumnsMetadataWithQuery and fails with the same error.

Caused by: java.lang.NullPointerException
        at com.simba.athena.athena.api.AJClient.fetchColumnsMetadataWithQuery(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.api.AJClient.getColumnsMetadata(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.metadata.AJColumnsMetadataSource.initMetadata(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.metadata.AJColumnsMetadataSource.<init>(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.AJDataEngine.makeNewMetadataSource(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.dsi.dataengine.impl.DSIDataEngine.makeNewMetadataResult(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.athena.dataengine.AJDataEngine.makeNewMetadataResult(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.jdbc.jdbc42.S42DatabaseMetaData.createMetaDataResult(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.simba.athena.jdbc.common.SDatabaseMetaData.getColumns(Unknown Source) ~[AthenaJDBC42_2.0.7.jar:?]
        at com.complexible.stardog.virtual.vega.rdbms.metadata.JdbcConnectionMetadataReader.getColumns(JdbcConnectionMetadataReader.java:99) ~[stardog-virtual-core-7.9.0.jar:?]
        ... 49 more

It's strange, because I can query information_schema directly using the same driver in a SQL IDE without any issue, and the IDE itself can pull the Athena schema for browsing Iceberg tables.

I'll try raising a support request with AWS to see if they have any ideas.

Any additional suggestions are welcome!

You're saying you can use the same driver and get the columns from the information schema in a SQL IDE, but in that case you're executing a statement (SELECT * FROM ...) rather than requesting metadata through the JDBC API? If so, that's a potential workaround, but it would require a software change to Stardog.
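To make the distinction concrete, here is a rough sketch of what such a workaround might look like. It is hypothetical, not Stardog's code: it tries `DatabaseMetaData.getColumns` first and, if the driver throws, falls back to a plain `SELECT` against `information_schema.columns`. It assumes Athena exposes `table_schema`, `table_name`, `column_name`, `data_type` and `ordinal_position` there, and it inlines escaped string literals rather than relying on the driver's parameter binding.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class ColumnLister {
    // Quotes a value as an SQL string literal, doubling embedded single quotes.
    static String sqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Plain statement against information_schema (assumed column names, see above).
    static String infoSchemaQuery(String schema, String table) {
        return "SELECT column_name, data_type FROM information_schema.columns"
                + " WHERE table_schema = " + sqlLiteral(schema)
                + " AND table_name = " + sqlLiteral(table)
                + " ORDER BY ordinal_position";
    }

    // Tries the JDBC metadata API first; if the driver fails there (for example
    // with the NullPointerException above), falls back to executing a SELECT.
    static List<String> listColumns(Connection conn, String catalog, String schema, String table)
            throws SQLException {
        List<String> columns = new ArrayList<>();
        try {
            DatabaseMetaData md = conn.getMetaData();
            try (ResultSet rs = md.getColumns(catalog, schema, table, "%")) {
                while (rs.next()) {
                    columns.add(rs.getString("COLUMN_NAME"));
                }
            }
            return columns;
        } catch (SQLException | RuntimeException e) {
            columns.clear(); // discard any partial metadata result
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(infoSchemaQuery(schema, table))) {
                while (rs.next()) {
                    columns.add(rs.getString("column_name"));
                }
            }
            return columns;
        }
    }

    public static void main(String[] args) {
        // Prints the fallback query for the table named in the log above.
        System.out.println(infoSchemaQuery("gnomad", "gnomad_variant_consequence"));
    }
}
```

Catching `RuntimeException` as well as `SQLException` is deliberate: the trace suggests the NPE may surface from `getColumns` unwrapped.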

I think contacting AWS is a good approach. We're calling getColumns on their implementation of java.sql.DatabaseMetaData when the NPE occurs.
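Since the failure happens inside the driver's `getColumns`, a standalone reproducer that makes the same call outside Stardog may help the support case. This is a sketch under assumptions: the driver jar is on the classpath and registers itself with `DriverManager` (older setups may need `Class.forName("com.simba.athena.jdbc.Driver")`), and the URL carries everything the driver needs, including credentials. `GetColumnsRepro` and `rootCause` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

class GetColumnsRepro {
    // Walks the cause chain and returns the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 4) {
            System.err.println("usage: GetColumnsRepro <jdbcUrl> <catalog> <schema> <table>");
            return;
        }
        // Assumes the URL includes region, output location and credentials.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            DatabaseMetaData md = conn.getMetaData();
            // Shows which driver build is actually loaded.
            System.out.println("Driver: " + md.getDriverName() + " " + md.getDriverVersion());
            // Same metadata call Stardog makes, e.g. AwsDataCatalog / gnomad / <table>.
            try (ResultSet rs = md.getColumns(args[1], args[2], args[3], "%")) {
                while (rs.next()) {
                    System.out.println(rs.getString("COLUMN_NAME") + " " + rs.getString("TYPE_NAME"));
                }
            } catch (SQLException | RuntimeException e) {
                System.err.println("getColumns failed; root cause: " + rootCause(e));
                e.printStackTrace();
            }
        }
    }
}
```

Printing the driver name and version alongside the failure makes it easy to confirm which jar the JVM picked up when comparing driver releases.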

True, I used a SELECT query to access the information schema, but I thought the IDE (PyCharm/JetBrains) might be using the JDBC API to introspect the table schemas. AWS is on the case, and I will report back on what I find.

Quick update: AWS hasn't been able to replicate the issue. Do you have access to an AWS environment where you can try to replicate it on your side?