Object storage file formats
Object storage connectors support one or more file formats specified by the underlying data source.
In the case of serializable formats, only specific SerDes are allowed:
- RCText: RCFile with ColumnarSerDe
- RCBinary: RCFile with LazyBinaryColumnarSerDe
- JSON: org.apache.hive.hcatalog.data.JsonSerDe
- CSV: org.apache.hadoop.hive.serde2.OpenCSVSerde
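The format-to-SerDe restriction above can be pictured as a simple lookup. The following is a minimal sketch only: the `ALLOWED_SERDES` mapping and the `is_supported` helper are hypothetical names used for illustration and are not part of any connector API. The SerDe names are copied from the list above exactly as written there.

```python
# Hypothetical lookup table transcribed from the list of allowed SerDes.
# The names are kept exactly as the list gives them.
ALLOWED_SERDES = {
    "RCText": "ColumnarSerDe",
    "RCBinary": "LazyBinaryColumnarSerDe",
    "JSON": "org.apache.hive.hcatalog.data.JsonSerDe",
    "CSV": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
}


def is_supported(file_format: str, serde: str) -> bool:
    """Return True if `serde` is the SerDe listed for `file_format`."""
    return ALLOWED_SERDES.get(file_format) == serde


print(is_supported("CSV", "org.apache.hadoop.hive.serde2.OpenCSVSerde"))   # True
print(is_supported("JSON", "org.apache.hadoop.hive.serde2.OpenCSVSerde"))  # False
```

A format that is missing from the mapping returns `False`, because `dict.get` yields `None` for it.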
ORC format configuration properties
The following properties configure how supported object storage connectors read and write ORC files:
| Property Name | Description | Default |
|---|---|---|
| | Sets the default time zone for legacy ORC files that did not declare a time zone. | JVM default |
| | Access ORC columns by name. By default, columns in ORC files are accessed by their ordinal position in the Hive table definition. The equivalent catalog session property is | |
| | Enable bloom filters for predicate pushdown. | |
| | Allow reads on ORC files with short zone ID in the stripe footer. | |
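The difference between accessing columns by ordinal position and by name matters when the column order in a file differs from the Hive table definition. The sketch below is a conceptual illustration in plain Python, not the connector's reader. The function names and the sample schema are hypothetical.

```python
def resolve_by_position(table_columns, file_row):
    """Map each table column to the file value at the same ordinal position."""
    return {
        name: (file_row[i] if i < len(file_row) else None)
        for i, name in enumerate(table_columns)
    }


def resolve_by_name(table_columns, file_columns, file_row):
    """Map each table column to the file value stored under the same name."""
    index = {name: i for i, name in enumerate(file_columns)}
    return {
        name: (file_row[index[name]] if name in index else None)
        for name in table_columns
    }


# Hypothetical case: the file was written with its columns in a different order.
table_columns = ["id", "name"]
file_columns = ["name", "id"]
file_row = ["alice", 1]

print(resolve_by_position(table_columns, file_row))
# {'id': 'alice', 'name': 1}
print(resolve_by_name(table_columns, file_columns, file_row))
# {'id': 1, 'name': 'alice'}
```

With positional access the values end up under the wrong columns, while name-based access recovers the intended mapping. In this sketch a column that is absent from the file resolves to `None`.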
Parquet format configuration properties
The following properties configure how supported object storage connectors read and write Parquet files:
| Property Name | Description | Default |
|---|---|---|
| | Adjusts timestamp values to a specific time zone. For Hive 3.1+, set this to UTC. | JVM default |
| | Access Parquet columns by name by default. Set this property to | |
| | Percentage of Parquet files to validate after write by re-reading the whole file. The equivalent catalog session property is | |
| | Maximum page size for the Parquet writer. | |
| | Maximum row group size for the Parquet writer. | |
| | Maximum number of rows processed by the Parquet writer in a batch. | |
| | Whether bloom filters are used for predicate pushdown when reading Parquet files. Set this property to | |
| | Skip reading Parquet pages by using Parquet column indices. The equivalent catalog session property is | |
| | Sets the maximum number of rows read in a batch. The equivalent catalog session property is named | |
| | Data size below which a Parquet file is read entirely. The equivalent catalog session property is named | |
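Two of the reader settings described above, the maximum number of rows read in a batch and the data size below which a file is read entirely, can be illustrated with a small conceptual sketch. It does not reflect the connector's actual implementation. The function names, the strategy labels, and the numeric values are hypothetical and are not the properties' defaults.

```python
def read_strategy(file_size_bytes: int, small_file_threshold_bytes: int) -> str:
    """Files strictly below the threshold are read in one go."""
    if file_size_bytes < small_file_threshold_bytes:
        return "read-entire-file"
    return "read-by-range"


def batch_sizes(total_rows: int, max_batch_rows: int) -> list:
    """Split `total_rows` into batches of at most `max_batch_rows` rows."""
    if max_batch_rows <= 0:
        raise ValueError("max_batch_rows must be positive")
    return [
        min(max_batch_rows, total_rows - start)
        for start in range(0, total_rows, max_batch_rows)
    ]


# Hypothetical values chosen only for illustration.
print(read_strategy(file_size_bytes=500, small_file_threshold_bytes=1_000))    # read-entire-file
print(read_strategy(file_size_bytes=5_000, small_file_threshold_bytes=1_000))  # read-by-range
print(batch_sizes(total_rows=1_000, max_batch_rows=400))                       # [400, 400, 200]
```

In this sketch a larger batch limit means fewer, larger batches, and a higher threshold means more files are fetched whole instead of by range.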