HIVE-29578: Iceberg: add support for logical views#6449
Conversation
4fdad42 to
252c608
Compare
252c608 to
e10eba5
Compare
e10eba5 to
96fa476
Compare
96fa476 to
114412a
Compare
| result.setLastAccessTime(nowSec); | ||
| result.setRetention(Integer.MAX_VALUE); | ||
|
|
||
| boolean hiveEngineEnabled = false; |
There was a problem hiding this comment.
What is hiveEngineEnabled and why is it false?
There was a problem hiding this comment.
hiveEngineEnabled switches how HiveOperationsBase.storageDescriptor fills the Storage Desacriptor: with HiveIcebergInputFormat / HiveIcebergOutputFormat / HiveIcebergSerDe when true, or the usual placeholder FileInputFormat / FileOutputFormat / LazySimpleSerDe when false.
Why it’s false in toHiveView:
This path materializes an HMS VIRTUAL_VIEW for REST catalog that expose Iceberg view metadata through the HMS API. That row isn’t meant to drive a Hive table scan the way a real Iceberg table commit does; execution still comes from the view definition / catalog, not from wiring Iceberg MR formats on the stub. HiveViewOperations does the same thing (hiveEngineEnabled = false).
So we keep a minimal SD consistent with normal virtual views and avoid implying this HMS object is an Iceberg-backed table for the Hive engine. For tables, HiveTableOperations still turns engine integration on/off via metadata + ConfigProperties.ENGINE_HIVE_ENABLED where that actually matters.
| create view v_ice as select * from src_ice stored by iceberg; | ||
|
|
||
| select * from v_ice; | ||
|
|
There was a problem hiding this comment.
Could you please add
- logical view which does some transformation on it's base table and query from it?
- create views when the schema is specified and not specified.
There was a problem hiding this comment.
logical view which does some transformation on it's base table and query from it?
This is not supported by Hive itself:
update v_ice set last_name = last_name + 'a'
fname=iceberg_native_view.q
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
org.apache.hadoop.hive.ql.parse.SemanticException: You cannot update or delete records in a view
at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.validateTargetTable(RewriteSemanticAnalyzer.java:265)
at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:73)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:234)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:790)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:760)
at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:139)
create views when the schema is specified and not specified.
Done
There was a problem hiding this comment.
logical view which does some transformation on it's base table
Sorry I mean something like
select first_name || last_name from ... where <some filter condition>
because
select * from table;
as a view definition is a kind of edge case. It is ok for testing but not a typical use-case.
| break; | ||
| } | ||
| } | ||
| boolean icebergNativeView = validateOptionalViewStorageClause(storageClause); |
There was a problem hiding this comment.
Please do not hardcode anything like Iceberg into compiler code. The compiler is independent from the storage handler. I'm aware that we already hove lots of code which violates this principal and it already causes lots of troubles.
There was a problem hiding this comment.
Fixed - moved all Iceberg-specific code into HiveIcebergStorageHandler and kept generic interfaces in the Compiler.
| private static final long serialVersionUID = 1L; | ||
|
|
||
| /** HMS table property set when the view is declared with {@code STORED BY ICEBERG} (native Iceberg view). */ | ||
| public static final String ICEBERG_NATIVE_VIEW_PROPERTY = "hive.iceberg.native.view"; |
There was a problem hiding this comment.
Please remove this from here.
| private final boolean ifNotExists; | ||
| private final boolean replace; | ||
| private final List<FieldSchema> partitionColumns; | ||
| private final boolean icebergNativeView; |
There was a problem hiding this comment.
Please remove this from here.
| @Explain(displayName = "iceberg native view", displayOnlyOnTrue = true) | ||
| public boolean isIcebergNativeView() { | ||
| return icebergNativeView; | ||
| } |
There was a problem hiding this comment.
Please remove this from here.
| VIEW_STORAGE_HANDLER_UNSUPPORTED(10448, "CREATE VIEW only supports STORED BY ICEBERG for native " | ||
| + "Iceberg views; unsupported storage clause: {0}", true), |
There was a problem hiding this comment.
Please rephrase this error message. Remove STORED BY and let Iceberg be a parameter.
| TableName.fromString( | ||
| view.name(), MetaStoreUtils.getDefaultCatalog(conf), Warehouse.DEFAULT_DATABASE_NAME); | ||
| result.setCatName(tableName.getCat()); | ||
| result.setDbName(tableName.getDb()); |
There was a problem hiding this comment.
What happens when a custom db is specified?
create view my_db.myview as...
There was a problem hiding this comment.
this method is only called when reading from a view using a REST Catalog - it is called from HiveRESTCatalogClient.getTable(GetTableRequest tableRequest)
There was a problem hiding this comment.
Sorry, my question was inspired by the fact that the database name is hardcoded, which makes it impossible to handle views in databases other than 'default'.
this method is only called when reading from a view using a REST Catalog - it is called from
HiveRESTCatalogClient.getTable(GetTableRequest tableRequest)
Does this imply that I am unable to reference views from any database other than default ?
select * from my_db.my_view;
[difin] - there was no option to reply on this comment, see reply below.
| return conf; | ||
| } | ||
|
|
||
| private HiveCatalog verifyCatalog() { |
There was a problem hiding this comment.
Why is this called verify? Based on the implementation of this method it is more like loading a catalog.
Could you please share some background of the use case of this method.
There was a problem hiding this comment.
Didn't notice, fixed.
| conf, DB, VIEW, cols, "select 2 as id", null, null, true, false)) | ||
| .isTrue(); | ||
|
|
||
| assertThat(verifyCatalog().viewExists(TableIdentifier.of(DB, VIEW))).isTrue(); |
There was a problem hiding this comment.
I think it worth checking if the view definition is actually altered.
| ('fn7','ln7', 2); | ||
|
|
||
| ---------------------------------------------------------------- | ||
| -- Iceberg native view via TBLPROPERTIES before AS |
There was a problem hiding this comment.
Does before AS adds any value in this comment? AFAIK the grammar allows this way only.
There was a problem hiding this comment.
Removed "before as" part.
| iceberg_create_locally_zordered_table.q,\ | ||
| iceberg_merge_delete_files.q,\ | ||
| iceberg_merge_files.q,\ | ||
| iceberg_native_view.q,\ |
There was a problem hiding this comment.
Is there any LLAP specific in view DDLs ? If not; can this test run by the default Iceberg driver?
There was a problem hiding this comment.
No, this test is not LLAP-specific. Changed to run it by TestIcebergCliDriver instead of TestIcebergLLAPLocalCliDriver.
| @Test | ||
| public void testParseCreateViewTblpropertiesViewFormatIceberg() throws Exception { | ||
| ASTNode tree = parseDriver.parse( | ||
| "create view v1 tblproperties ('view-format'='iceberg') as select * from t", null).getTree(); | ||
| assertTrue(tree.dump(), tree.toStringTree().contains("tok_createview")); | ||
| assertTrue(tree.dump(), tree.toStringTree().contains("tok_tableproperties")); | ||
| assertTrue(tree.dump(), tree.toStringTree().contains("view-format")); | ||
| assertTrue(tree.dump(), tree.toStringTree().contains("iceberg")); | ||
| } | ||
|
|
There was a problem hiding this comment.
This test class is about the default keyword testing. Could you please move this test to the one which is about view creation. Please create a new one of not exists. Since this grammar is not Iceberg specific it can be a generic one.
There was a problem hiding this comment.
Done, moved to a new class for testing view parsing.
| if (explicitViewFormat) { | ||
| throw new SemanticException(ErrorMsg.VIEW_STORAGE_HANDLER_UNSUPPORTED.getMsg( | ||
| "Native view metadata is not supported for storage handler: " + handlerClass)); | ||
| } |
There was a problem hiding this comment.
We have the knowledge about explicit view format at the caller.
Could you please move this check to the caller method.
There was a problem hiding this comment.
Done, moved to the caller.
| if (desc.usesNativeViewCatalog()) { | ||
| executeNativeCatalogView(); | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
I think this decision should be moved to BaseHiveIcebergMetaHook or HiveIcebergMetaHook.preCreateTable like we do in case of Iceberg tables.
Please set the StorageHandlerClass in CreateViewOperation.createViewObject to the newly created ql.metadata.Table object because later this object is passed to HMS and the meta hook.
if (desc.usesNativeViewCatalog()) {
storageFormat.setStorageHandler(desc.getNativeViewStorageHandlerClass());
view.setProperty(
org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_STORAGE,
desc.getNativeViewStorageHandlerClass());
}
I'm not a big fan of this meta hook solution but this is whet we have in case of Iceberg tables and IMHO it is better to be consistent.
Probably create or replace and if not exists doesn't have to be handled separately in case of Iceberg views of you make this change.
Please add some tests for create or replace and if not exists to the q test iceberg_native_view.q
There was a problem hiding this comment.
Done - moved view creation to the hook and added test cases with create or replace and if not exists to iceberg_native_view.q
| } | ||
|
|
||
| /** | ||
| * Resolves {@code STORED BY identifier} for CREATE VIEW (short names such as {@code ICEBERG} or an FQCN). |
There was a problem hiding this comment.
Was STORED BY left here intentionally?
There was a problem hiding this comment.
No, it was a leftover. Fixed.
| * Keys should be removed when {@linkplain #clearNativeViewHmsTableProperties(Map)} is invoked for the same | ||
| * handler class recorded under {@link Constants.NATIVE_VIEW_STORAGE_HANDLER_CLASS_PARAM}. | ||
| */ | ||
| default Map<String, String> getNativeViewHmsTableProperties() { |
There was a problem hiding this comment.
Can we call this simply getViewProperties? I would like to understand what native and hms mean in this context.
AFAIK, a native object is one that doesn't have a storage handler. To me, the word native in these method names is misleading. Could you please elaborate on this a bit?"
getNativeViewHmsTableProperties
clearNativeViewHmsTableProperties
createOrReplaceNativeView
supportsNativeViewCatalog
There was a problem hiding this comment.
I agree that the word "native" was misleading outside of Iceberg handler.
Removed/rephrased the "iceberg native views" names that were used outside of the iceberg handler.
1886216 to
4743d79
Compare
|
|
@difin Please don't squash the commits when addressing review comments. It makes review more difficult. |
| assertThat( | ||
| NativeIcebergViewSupport.createOrReplaceNativeView( | ||
| conf, DB, VIEW, cols, "select 2 as id", null, null, false, true)) | ||
| .isFalse(); |
There was a problem hiding this comment.
Could you assert that the definition query hasn't changed?
| --! Native Iceberg catalog view with default Iceberg storage handler (REST table with catalog in props) | ||
| ----------------------------------------------------------------------------------------------- | ||
|
|
||
| create view ice_v4 as select * from ice_orc2; | ||
| select * from ice_v4; | ||
| desc formatted ice_v4; | ||
| drop view ice_v4; |
There was a problem hiding this comment.
Thanks for the clarification. Is it mean that the view ice_v4 is created in the ice01 catalog too ?
Does creating views in one catalog referencing base tables from another catalog?
[difin] - there was no option to reply on this comment, see reply below.
| TableName.fromString( | ||
| view.name(), MetaStoreUtils.getDefaultCatalog(conf), Warehouse.DEFAULT_DATABASE_NAME); | ||
| result.setCatName(tableName.getCat()); | ||
| result.setDbName(tableName.getDb()); |
There was a problem hiding this comment.
Sorry, my question was inspired by the fact that the database name is hardcoded, which makes it impossible to handle views in databases other than 'default'.
this method is only called when reading from a view using a REST Catalog - it is called from
HiveRESTCatalogClient.getTable(GetTableRequest tableRequest)
Does this imply that I am unable to reference views from any database other than default ?
select * from my_db.my_view;
[difin] - there was no option to reply on this comment, see reply below.
|
|
||
| @Override | ||
| public void commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) { | ||
| if (BaseHiveIcebergMetaHook.isNativeIcebergLogicalView(hmsTable)) { |
There was a problem hiding this comment.
nit.: this doesn't have to be qualified because isNativeIcebergLogicalView is defined in the super class.
There was a problem hiding this comment.
Thanks, didn't notice.
Fixed.
| boolean replace = | ||
| Boolean.parseBoolean( | ||
| SessionStateUtil.getProperty(conf, Constants.EXTERNAL_LOGICAL_VIEW_DDL_REPLACE).orElse("false")); | ||
| boolean ifNotExists = | ||
| Boolean.parseBoolean( | ||
| SessionStateUtil.getProperty(conf, Constants.EXTERNAL_LOGICAL_VIEW_CREATE_IF_NOT_EXISTS) | ||
| .orElse("false")); |
There was a problem hiding this comment.
nit.: .orElse("false") goes to new line or not. Both are good but please be consistent.
There was a problem hiding this comment.
I put the second orElse it on a new line because of the 120 characters limit.
Moved the 1st one on a new line too for the consistency.
| if (!desc.usesStorageHandler()) { | ||
| // External logical view is replaced with a native Hive view | ||
| clearStorageHandlerProp(oldview); | ||
| } |
There was a problem hiding this comment.
Could you please share some repro steps for this use-case?
There was a problem hiding this comment.
Added this test case to iceberg_native_logical_view.q:
v_ice before replacement is Iceberg-native logical view that is replaced with Hive-native logical view:
-----------------------------------------------------------------------------------------
-- Replace Iceberg logical view with a Hive-native logical view
-----------------------------------------------------------------------------------------
create or replace view v_ice
as select first_name from src_ice where dept_id = 2;
select * from v_ice;
desc formatted v_ice;
drop view v_ice;
| SessionStateUtil.addResource(context.getConf(), Constants.EXTERNAL_LOGICAL_VIEW_DDL_REPLACE, | ||
| Boolean.toString(replace)); | ||
| SessionStateUtil.addResource(context.getConf(), Constants.EXTERNAL_LOGICAL_VIEW_CREATE_IF_NOT_EXISTS, | ||
| Boolean.toString(ifNotExists)); | ||
| } |
There was a problem hiding this comment.
Why does these has to be added to session state?
IIUC replace and ifNotExists is handled here in CreateViewOperation
There was a problem hiding this comment.
The values of replace and ifNotExists are added to the session state because they are needed when creating a view using Iceberg API, which happens in IcebergNativeLogicalViewSupport.createOrReplaceNativeView(). This method is called from HiveIcebergMetaHook (when using HMS) or from HiveRESTCatalogClient (when using REST Catalog).
4743d79 to
2d5f0d0
Compare
Renamed Iceberg Native Views to Iceberg Logical Views in the PR description and JIRA. |
Yes,
|
No. That line does not restrict views to the default database. The same pattern exists in Iceberg logical views are tested with non-default database, see for example |
2d5f0d0 to
f3ddf97
Compare
|



What changes were proposed in this pull request?
Added support for Iceberg logical views in Hive for both HMS and REST catalogs.
Why are the changes needed?
To support Iceberg logical views. This can be especially useful for REST Catalog clients.
Does this PR introduce any user-facing change?
Yes, new HQL syntax:
create view <view_name> as select * from <src_tbl> stored by iceberg;How was this patch tested?
Created new and updated exiting unit and integration tests with Iceberg logical views test cases.