MPP Data Source Configuration
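The following guide explains how to configure the Hive connector to read Parquet files from Google Cloud Storage (GCS), AWS S3, and Azure Blob Storage. The catalog steps also apply to the Delta Lake and Iceberg connectors: replace the connector name hive with delta_lake or iceberg, respectively. Table properties differ between connectors, however. For example, Delta Lake and Iceberg tables are defined with location rather than external_location.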
How to Read Parquet Files from Google Cloud Storage (GCS)
Step 1: Create a Catalog
Run the following SQL command. Replace the placeholder GCP project ID, service account key file path, and Hive metastore URI with your own values:
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE CATALOG gcs USING hive
WITH (
"gcs.project-id"=''your-project-id'',
"gcs.json-key-file-path"=''path-to-your-service-account-key.json'',
"hive.metastore.uri"=''thrift://hive-metastore:9083''
)'
)) AS s;
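As noted at the top of this guide, the same catalog can be created for the Delta Lake connector by changing only the connector name. The sketch below assumes that the Delta Lake connector in your deployment accepts the same GCS and metastore properties as the Hive example. The catalog name gcs_delta is hypothetical.

```sql
-- Hypothetical catalog gcs_delta: same properties as the Hive catalog, connector changed to delta_lake
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE CATALOG gcs_delta USING delta_lake
WITH (
"gcs.project-id"=''your-project-id'',
"gcs.json-key-file-path"=''path-to-your-service-account-key.json'',
"hive.metastore.uri"=''thrift://hive-metastore:9083''
)'
)) AS s;
```

Tables in a Delta Lake catalog are defined with the location table property rather than external_location.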
Step 2: Create a Table
Replace the bucket name and path with your own GCS bucket and the directory that holds your Parquet files. Replace (...) with column definitions that match the Parquet schema:
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE TABLE gcs.default.weather_data (...)
WITH (
"external_location" = ''gs://your-bucket-name/path/to/data/'',
"format" = ''PARQUET''
)'
)) AS s;
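To check that the table can read the Parquet files, you can query a few rows through the same wrapper. This sketch assumes that mpp.native forwards query statements the same way it forwards the CREATE statements, returning each result row in the tuple column.

```sql
-- Read a sample of rows from the external table (assumes mpp.native accepts SELECT requests)
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'SELECT * FROM gcs.default.weather_data LIMIT 10'
)) AS s;
```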
How to Read Parquet Files from AWS S3
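Step 1: Create a Catalog
Run the following SQL command. Replace the placeholder AWS access key, secret key, and Hive metastore URI with your own values: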
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE CATALOG s3 USING hive
WITH (
"hive.s3.aws-access-key"=''your-access-key'',
"hive.s3.aws-secret-key"=''your-secret-key'',
"hive.metastore.uri"=''thrift://hive-metastore:9083''
)'
)) AS s;
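Before creating the table, you can confirm that the s3 catalog is registered by listing its schemas. The listing typically includes the default schema used in the next step. This is a sketch, assuming that mpp.native accepts SHOW statements. Listing schemas only contacts the Hive metastore, so it does not yet exercise the S3 credentials; those are used once data is read.

```sql
-- List the schemas (Hive metastore databases) visible through the s3 catalog
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'SHOW SCHEMAS FROM s3'
)) AS s;
```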
Step 2: Create a Table
Replace the S3 bucket name, path, and (...) column list with your own values:
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE TABLE s3.default.weather_data (...)
WITH (
"external_location" = ''s3://your-bucket-name/path/to/data/'',
"format" = ''PARQUET''
)'
)) AS s;
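The (...) placeholder stands for the table's column list. The statement below is for illustration only and uses the hypothetical columns station_id, observed_at, and temperature_c. Your column names and types must match the schema of your own Parquet files.

```sql
-- Hypothetical columns for illustration; replace them with your Parquet files' actual schema
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE TABLE s3.default.weather_data (
station_id VARCHAR,
observed_at TIMESTAMP,
temperature_c DOUBLE
)
WITH (
"external_location" = ''s3://your-bucket-name/path/to/data/'',
"format" = ''PARQUET''
)'
)) AS s;
```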
How to Read Parquet Files from Azure Blob Storage
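Step 1: Create a Catalog
Run the following SQL command. Replace the placeholder storage account name, storage key, and Hive metastore URI with your own values: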
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE CATALOG azure USING hive
WITH (
"hive.azure.storage.account"=''your-storage-account'',
"hive.azure.storage.key"=''your-storage-key'',
"hive.metastore.uri"=''thrift://hive-metastore:9083''
)'
)) AS s;
Step 2: Create a Table
Replace the storage account, container name, path, and (...) column list with your own values:
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'CREATE TABLE azure.default.weather_data (...)
WITH (
"external_location" = ''https://your-storage-account.blob.core.windows.net/container-name/path/'',
"format" = ''PARQUET''
)'
)) AS s;
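To undo the setup, drop the table and then the catalog. For a Hive table created with external_location, DROP TABLE removes only the metastore entry and leaves the Parquet files in the container. This sketch assumes that mpp.native forwards DROP statements the same way it forwards CREATE statements.

```sql
-- Remove the table definition; the Parquet files at external_location are not deleted
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'DROP TABLE azure.default.weather_data'
)) AS s;

-- Remove the catalog registered in Step 1
SELECT CAST(s.tuple AS string) FROM (CALL "mpp.native"(
"request" => 'DROP CATALOG azure'
)) AS s;
```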