
Apache Hive
Hive provides out-of-the-box support for Apache Iceberg tables, a cloud-native, high-performance open table format, via the Hive StorageHandler.
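As a sketch of the StorageHandler integration (table and column names are illustrative; the `STORED BY ICEBERG` clause is the Hive 4.x form):

```sql
-- Create an Iceberg table through the Hive StorageHandler.
-- DROP/ALTER on this table go through Iceberg's metadata layer.
CREATE TABLE sales_iceberg (
  id        BIGINT,
  amount    DECIMAL(10,2),
  sale_date DATE
)
STORED BY ICEBERG;
```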
Apache Hive : LanguageManual DDL
Managed and External Tables. By default, Hive creates managed tables, where files, metadata, and statistics are managed by internal Hive processes. For details on the differences between managed and external tables, see Managed vs. External Tables. Storage Formats. Hive supports built-in and custom-developed file formats.
Apache Hive : Managed vs. External Tables
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables. External tables. An external table describes the metadata/schema of files stored externally. External table files can be accessed and managed by processes outside of Hive.
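The distinction can be sketched in DDL (table names and the HDFS path are illustrative):

```sql
-- Managed: Hive owns the data lifecycle; DROP TABLE also deletes the files.
CREATE TABLE staging_events (
  id      BIGINT,
  payload STRING
);

-- External: Hive tracks only the schema/metadata over files that already
-- exist at the given location; DROP TABLE leaves the files in place.
CREATE EXTERNAL TABLE raw_events (
  id      BIGINT,
  payload STRING
)
LOCATION '/data/raw/events';
```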
Apache Hive : LanguageManual DML
Hive can insert data into multiple tables by scanning the input data just once and applying different query operators to it. Starting with Hive 0.13.0, the select statement can include one or more common table expressions (CTEs), as shown in the SELECT syntax.
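Both features can be sketched as follows (table names and the partition value are illustrative):

```sql
-- Multi-table insert: one scan of source_tbl feeds two target tables.
FROM source_tbl s
INSERT OVERWRITE TABLE positive_vals SELECT s.id, s.val WHERE s.val > 0
INSERT OVERWRITE TABLE other_vals    SELECT s.id, s.val WHERE s.val <= 0;

-- CTE in a SELECT (Hive 0.13.0 and later).
WITH recent AS (
  SELECT id, val FROM source_tbl WHERE ds = '2024-01-01'
)
SELECT COUNT(*) FROM recent;
```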
Apache Hive : LanguageManual Select
To specify a database, either qualify the table names with database names ("db_name.table_name", starting in Hive 0.7) or issue the USE statement before the query statement (starting in Hive 0.6). "db_name.table_name" allows a query to access tables in different databases.
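A minimal sketch of the two styles (the database and table names are illustrative):

```sql
-- Qualify the table with its database directly:
SELECT * FROM sales_db.orders;

-- Or switch the session's current database first:
USE sales_db;
SELECT * FROM orders;
```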
Apache Hive : Hive Transactions
Hive 3 warning: any transactional tables created by a Hive version prior to Hive 3 require major compaction to be run on every partition before upgrading to 3.0.
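A sketch of the relevant DDL (the table name is illustrative; full-ACID tables must be managed and stored as ORC):

```sql
-- A transactional (full-ACID) table is flagged via TBLPROPERTIES.
CREATE TABLE txn_orders (
  id     BIGINT,
  status STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Major compaction, of the kind required before a Hive 3 upgrade.
ALTER TABLE txn_orders COMPACT 'major';
```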
Apache Hive : LanguageManual ORC
Using ORC files improves performance when Hive is reading, writing, and processing data. Compared with the RCFile format, for example, the ORC file format has many advantages, such as: a single file as the output of each task, which reduces the NameNode's load; Hive type support including datetime, decimal, and the complex types (struct, list, map, and union).
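A sketch of an ORC table declaration (the table name is illustrative; `orc.compress` is the ORC compression table property, here set to its ZLIB codec):

```sql
CREATE TABLE logs_orc (
  ts    TIMESTAMP,              -- datetime support
  level STRING,
  attrs MAP<STRING, STRING>     -- complex types are supported natively
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');
```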
Apache Hive : AdminManual Configuration
Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table’s filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table.
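The staging location can be inspected or overridden per session; a sketch (the override path is illustrative):

```sql
-- Show the current scratch directory used for the temporary write.
SET hive.exec.scratchdir;

-- Override it for this session (illustrative path).
SET hive.exec.scratchdir=/tmp/hive_staging;
```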
Apache Hive : LanguageManual Joins
Hive converts joins over multiple tables into a single map/reduce job if for every table the same column is used in the join clauses, e.g.:
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
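By contrast, when the second join uses a different column of b, the condition no longer holds and Hive needs more than one map/reduce job; a sketch with the same illustrative tables:

```sql
-- Two map/reduce jobs: the joins use different columns of b
-- (b.key1 to join a, b.key2 to join c), so the single-job
-- optimization above does not apply.
SELECT a.val, b.val, c.val
FROM a JOIN b ON (a.key = b.key1)
       JOIN c ON (c.key = b.key2);
```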
Apache Hive : Hive on Spark
A Hive table is, at bottom, a collection of files and directories on HDFS. Spark primitives are applied to RDDs, so Hive tables are naturally treated as RDDs in the Spark execution engine.
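Selecting Spark as the engine is a session-level setting; a sketch (the queried table name is illustrative):

```sql
-- Switch the execution engine for this session to Spark.
SET hive.execution.engine=spark;

-- Subsequent queries run as Spark jobs over the table's RDDs.
SELECT COUNT(*) FROM some_table;
```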