Hive
Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, querying, and analysis. The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. A JDBC driver and a command line tool are provided to connect users to Hive.
Type Name
hive
Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale-out and fault tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware. Hive has limited support for data types: it supports integer variants, boolean, float, double, and string, but has no native support for time-based types, XML, or LOBs. These limitations are reflected in the connector capabilities. A view table can use these types, but the transformation has to specify the necessary conversions; note that in those cases, the evaluation will be done in the Data Virtuality Server engine. Another limitation is that Hive only supports EQUI joins, so using any other join type on its source tables will result in inefficient queries. To write criteria based on partitioned columns, model them on the source table, but do not include them in the selected columns.
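For illustration, here is a sketch of the join limitation; the data source name hive_src and the orders/customers tables are hypothetical:

```sql
-- Equi join: the only join type Hive supports natively, so it can be pushed down
SELECT o.id, c.name
FROM hive_src.orders o
JOIN hive_src.customers c ON o.customer_id = c.customer_id;

-- Non-equi join: evaluated in the Data Virtuality Server engine rather than in Hive,
-- which is inefficient on large tables
SELECT o.id, c.name
FROM hive_src.orders o
JOIN hive_src.customers c ON o.amount > c.credit_limit;
```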
Connection Properties
Template name: hive2
Appropriate translator name: hive
Properties:
- host (default: localhost)
- port (default: 10000)
- db (default: default)
- user-name
- password (default: empty)
- driver (hive for hive-jdbc-1.2.1, oldhive for hive-jdbc-1.1.0; default: hive)
- driver-class (default: org.apache.hive.jdbc.HiveDriver)
- jdbc-properties (arbitrary extra properties)
- new-connection-sql
- check-valid-connection-sql (default: show databases)
- min-pool-size (default: 2)
- max-pool-size (default: 70)
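A minimal sketch of creating a Hive data source with the properties above; the host, credentials, and the name hive_src are placeholders:

```sql
-- Placeholders: adjust host, port, db, and credentials to your environment
CALL SYSADMIN.createConnection(
    name => 'hive_src',
    jbossCLITemplateName => 'hive2',
    connectionOrResourceAdapterProperties => 'host=hadoop-node,port=10000,db=default,user-name=hive,password=secret'
);;

CALL SYSADMIN.createDatasource(
    name => 'hive_src',
    translator => 'hive'
);;
```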
Translator Properties
Translator Properties Shared by All JDBC Connectors
(Properties listed in alphabetical order)
Name | Description | Default value |
---|---|---|
comparisonStringConversion | Sets a template to convert WHERE clauses on strings | |
DatabaseTimeZone | Database time zone, used when fetching date, time, or timestamp values | System default time zone |
DatabaseVersion | Specific database version, used to fine-tune pushdown support | Automatically detected by the server through the data source JDBC driver, if possible |
forbidNonMatchingNames | Only considered when importer.tableNamePattern is set. When set to FALSE, allows creation of tables with names that do not match the tableNamePattern. Tables with non-matching names will only be accessible until the server restarts | TRUE |
ForceQuotedIdentifiers | If TRUE, all identifiers in the final source query are quoted | TRUE |
MaxPreparedInsertBatchSize | Maximum size of prepared insert batch | 2048 |
OrderByStringConversion | Sets a template to convert ORDER BY clauses on strings | |
supportsConvertFromClobToString | If TRUE, indicates that the translator supports the CONVERT/CAST function from clob to string | |
supportsNativeQueries | Forces a translator to issue a native query | TRUE |
supportsOrderByAlias | If TRUE, indicates that the source supports aliases in the ORDER BY clause. If not set explicitly, the value of supportsOrderBy is used | |
supportsOrderByString | If TRUE, indicates that the source supports strings in the ORDER BY clause. If not set explicitly, the value of supportsOrderBy is used | |
TrimStrings | If TRUE, trims trailing whitespace from fixed-length character strings | FALSE |
UseBindVariables | If TRUE, indicates that PreparedStatements should be used and that literal values in the source query will be replaced with bind variables | TRUE |
UseCommentsInSourceQuery | Embeds a /* comment */ leading comment with session/request id in the source SQL query for informational purposes | FALSE |
The names of the translator properties are case-sensitive.
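A sketch of how shared translator properties are applied; the property values are illustrative and hive_src is a placeholder name:

```sql
-- Trim padded CHAR values and tag source queries with the session/request id
CALL SYSADMIN.createDatasource(
    name => 'hive_src',
    translator => 'hive',
    translatorProperties => 'TrimStrings=TRUE,UseCommentsInSourceQuery=TRUE'
);;
```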
Translator Properties Specific for Hive
Name | Description | Default value |
---|---|---|
webhdfsEndpoint | WebHDFS endpoint. Please note that the hostname must be the node name of the Hadoop node which runs the WebHDFS service. It is not possible to use the IP address or an alternative DNS name. If the node name does not resolve on the Data Virtuality Server, please add the node name with the correct IP address to the hosts file on the Data Virtuality Server | |
webhdfsLocation | WebHDFS location. Example: webhdfsLocation="/user/hive" | |
tableType | Table storage format. Example: tableType=PARQUET | |
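A sketch of passing the Hive-specific translator properties when creating the data source; the endpoint URL is an assumption based on the default WebHDFS port (50070), and hive_src is a placeholder name:

```sql
-- webhdfsEndpoint value is an assumed example; use your Hadoop node name, not an IP address
CALL SYSADMIN.createDatasource(
    name => 'hive_src',
    translator => 'hive',
    translatorProperties => 'tableType=PARQUET,webhdfsEndpoint=http://hadoop-node:50070/webhdfs/v1,webhdfsLocation=/user/hive'
);;
```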
Data Source Properties
Data Source Properties Shared by All JDBC Connectors
(Properties listed in alphabetical order)
Name | Description | Default |
---|---|---|
importer.autoCorrectColumnNames | Replaces . in a column name with _ as the period character is not supported by the Data Virtuality Server in column names | TRUE |
importer.defaultSchema | Schema used by default when writing into the data source. Please note that writing into a data source is only possible if this parameter is set | Empty |
importer.enableMetadataCache | Turns on metadata cache for a single data source even when the global option is turned off. Together with importer.skipMetadataLoadOnStartup=true, it allows using materialized views after server restart when the original source is unavailable | FALSE |
importer.excludeProcedures | Case-insensitive regular expression that will exclude a matching fully qualified procedure name from import | Empty |
importer.excludeSchemas | Comma-separated list of schemas (no % or ? wildcards allowed) to exclude listed schemas from import. A schema specified in defaultSchema or schemaPattern will be imported despite being listed in excludeSchemas. Helps to speed up metadata loading | Oracle: a predefined list of system schemas. All others: empty |
importer.excludeTables | Case-insensitive regular expression that will exclude a matching fully qualified table name from import. Does not speed up metadata loading | Empty |
importer.fetchSize | Fetch size assigned to a resultset on loading metadata | No default value |
importer.importApproximateIndexes | If set to TRUE, imports approximate index information | TRUE |
importer.importIndexes | If set to TRUE, imports index/unique key/cardinality information | FALSE |
importer.importKeys | If set to TRUE, imports primary and foreign keys | FALSE |
importer.importProcedures | If set to TRUE, imports procedures and procedure columns. Please note that it is currently not possible to import procedures which use the same name for more than one parameter (e.g. the same name for an IN and an OUT parameter); such procedures can be excluded from import via importer.excludeProcedures | FALSE |
importer.loadColumnsTableByTable | Set to TRUE to force table by table metadata processing | FALSE (TRUE only for Netsuite and SAP Advantage Database Server) |
importer.loadMetadataWithJdbc | If set to TRUE, turns off all custom metadata load ways | FALSE |
importer.procedureNamePattern | Procedure(s) to import. If omitted, all procedures will be imported. % as a wildcard is allowed: for example, importer.procedureNamePattern=foo% will import foo, foobar, etc. Works only in combination with importProcedures | Empty |
importer.quoteNameInSource | If set to FALSE, directs the Data Virtuality Server to create source queries using unquoted identifiers | TRUE |
importer.renameDuplicateColumns | If set to TRUE, renames duplicate columns caused by either mixed case collisions or autoCorrectColumnNames replacing . with _. The suffix _n where n is an integer will be added to make the name unique | TRUE |
importer.renameDuplicateTables | If set to TRUE, renames duplicate tables caused by mixed case collisions. The suffix _n where n is an integer will be added to make the name unique | TRUE |
importer.replaceSpecSymbsInColNames | If set to TRUE, replaces all special symbols (any symbols not in the ^A-Za-z0-9_ sequence) with the _ symbol in column names of tables | FALSE (TRUE only for BigQuery) |
importer.schemaPattern | Schema(s) to import. If omitted or has "" value, all schemas will be imported. % as a wildcard is allowed: for example, importer.schemaPattern=foo% will import foo, foobar, etc. To specify several schema names or/and patterns, values should be comma-separated and enclosed within double quotes: importer.schemaPattern="schema1,schema2,pattern1%,pattern2%". For proper escaping of special characters depending on the type of data source, check Escaping special characters in schema names or use wildcards instead: "[schema_name]" can be rewritten as "%schema%name%". Helps to speed up metadata loading | Empty |
importer.skipMetadataLoadOnStartup | If set to TRUE, allows skipping the metadata load on server startup if cached metadata is available | FALSE |
importer.tableNamePattern | Table(s) to import. If omitted, all tables will be imported. % as a wildcard is allowed: for example, importer.tableNamePattern=foo% will import foo, foobar, etc | Empty |
importer.tableTypes | Comma-separated list (without spaces) of table types to import. Available types depend on the DBMS. Usual format: "TABLE,VIEW". Other typical types are "SYSTEM TABLE" and "GLOBAL TEMPORARY" | Empty |
importer.useCatalogName | If set to TRUE, uses any non-null/non-empty catalogue name as part of the name in source, e.g. "catalogue"."table"."column", and in the Data Virtuality Server runtime name if useFullSchemaName is TRUE. If set to FALSE, will not use the catalogue name in either the name in source or the Data Virtuality Server runtime name. Should be set to FALSE for sources that do not fully support a catalogue concept but return a non-null catalogue name in their metadata, such as HSQL | TRUE (FALSE only for Hive) |
importer.useFullSchemaName | If set to FALSE, directs the importer to drop the source catalogue/schema from the Data Virtuality Server object name so that the runtime name is formed from the table name alone. Please note that this may lead to objects with duplicate names when importing from multiple schemas, which results in an exception | TRUE |
importer.useProcedureSpecificName | If set to TRUE, allows the import of overloaded procedures (which will normally result in a duplicate procedure error) by using the unique procedure specific name in the Data Virtuality Server. This option will only work with JDBC 4.0 compatible drivers that report specific names | FALSE |
importer.widenUnsignedTypes | If set to TRUE, converts unsigned types to the next widest type. For example, SQL Server reports tinyint as an unsigned type. With this option enabled, tinyint would be imported as a short instead of a byte | TRUE |
Using a materialized view when the original source is not available at the time the server starts is possible since v2.4.29
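A sketch of setting shared importer properties through the modelProperties parameter; the schema name and the data source name hive_src are illustrative:

```sql
-- Import only the default schema, drop schema prefixes from runtime names,
-- and restrict the import to tables and views
CALL SYSADMIN.createDatasource(
    name => 'hive_src',
    translator => 'hive',
    modelProperties => 'importer.defaultSchema=default,importer.schemaPattern=default,importer.useFullSchemaName=FALSE,importer.tableTypes="TABLE,VIEW"'
);;
```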
Data Source Properties Specific for Hive
Name | Description | Default value |
---|---|---|
importer.useDatabaseMetadata | For Hive 0.13.0 and later, the normal JDBC DatabaseMetaData facilities are sufficient to perform an import. Set to TRUE to use the normal JDBC import logic. Set to FALSE to use native Hive methods to import metadata. When TRUE , trimColumnNames has no effect | FALSE |
importer.trimColumnNames | For Hive 0.11.0 and later, metadata for the DESCRIBE command could be inappropriately returned with padding. Set to TRUE to trim white spaces from column names | FALSE |
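For Hive 0.13.0 and later, a minimal sketch switching the import to the standard JDBC metadata logic; hive_src is a placeholder name:

```sql
-- Use the normal JDBC DatabaseMetaData import instead of the native Hive methods
CALL SYSADMIN.createDatasource(
    name => 'hive_src',
    translator => 'hive',
    modelProperties => 'importer.useDatabaseMetadata=TRUE'
);;
```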