Parquet Connector
The Parquet connector enables exporting data in Parquet format to the local filesystem.
Parquet Connector Data Source Creation
CALL SYSADMIN.createConnection(name => <parquetalias>, jbossCLITemplateName => 'ufile', connectionOrResourceAdapterProperties => 'ParentDirectory="directory"') ;;
CALL SYSADMIN.createDataSource(name => <parquetalias>, translator => 'parquet', modelProperties => null, translatorProperties => null) ;;
Model Properties
Name | Description | Default value |
---|---|---|
importer.loadMetadata | When set to TRUE, the data source loads the metadata of the tables that were present in the folder before the data source was created | FALSE |
Usage
Data is exported using the SELECT INTO command:
SELECT *
INTO <parquet data source name>.<table name>
FROM ...
The data is exported into the folder specified in the ParentDirectory connection property. Each table is represented by a folder named according to the pattern <parquet data source name>_<table name>.parquet. The folder contains files named like <table name>_<UID>.parquet. When new data is inserted into a table, a new file is created in the respective table folder; existing files are not rewritten, so the table then holds both the old and the new data.
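The naming rules above can be sketched as a small helper that builds the expected table folder path. This is only an illustration in Python and not part of the product: the function name table_folder and its parameters are hypothetical. The separator is left as a parameter because the pattern above uses an underscore, while the example later on this page shows a dot between the data source name and the table name.

```python
from pathlib import PurePosixPath


def table_folder(parent_directory: str, data_source: str, table: str, sep: str = "_") -> PurePosixPath:
    """Build the folder expected to hold the files of one exported table.

    Assumes the documented pattern <data source name>_<table name>.parquet
    inside the directory configured as ParentDirectory. Hypothetical helper.
    """
    return PurePosixPath(parent_directory) / f"{data_source}{sep}{table}.parquet"


# Values taken from the example on this page.
folder = table_folder("/home/exportuser/examples", "parquet_1", "example_salesorderdetail")
print(folder)
```

If the folder on disk turns out to use a different separator, pass it explicitly, for example sep=".".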
You can also create a table using the CREATE TABLE statement. However, the physical file will only be created when some data is inserted into this table using the INSERT VALUES or INSERT SELECT statement.
Example
CALL SYSADMIN.createConnection(name => 'parquet_1', jbossCLITemplateName => 'ufile', connectionOrResourceAdapterProperties => 'ParentDirectory="/home/exportuser/examples"') ;;
CALL SYSADMIN.createDataSource(name => 'parquet_1', translator => 'parquet', modelProperties => 'importer.loadMetadata=true', translatorProperties => null) ;;
SELECT *
INTO parquet_1.example_salesorderdetail
FROM adventureworks.salesorderdetail ;;
As a result of this call, the content of the salesorderdetail table in the adventureworks schema will be exported into a file named something like example_salesorderdetail_1e04e8d5-f963-11ed-a1bc-0a0027000003.parquet in the /home/exportuser/examples/parquet_1.example_salesorderdetail.parquet folder.
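To check the result of an export, it can help to list the files in the table folder. The following is a minimal sketch in Python, assuming the script runs on the machine that hosts the ParentDirectory and has read access to it. The function name list_parquet_files is hypothetical, and the path is the one from the example above.

```python
from pathlib import Path


def list_parquet_files(table_folder: str) -> list[str]:
    """Return the sorted names of the .parquet files directly inside a table folder.

    Returns an empty list if the folder does not exist. Hypothetical helper.
    """
    folder = Path(table_folder)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.parquet") if p.is_file())


# Folder path as given in the example above; adjust it to your environment.
for name in list_parquet_files("/home/exportuser/examples/parquet_1.example_salesorderdetail.parquet"):
    print(name)
```

Each insert into the table should add one more file to this listing, since existing files are not rewritten.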
See Also
Parquet File Creation and S3 Storage with Data Virtuality to learn how to take any data source table and create a local Parquet file.
Query Parquet Files in Data Virtuality Using Amazon Athena for information on how to read from Parquet.
Since v.3.9:
- The ufile jbossCLITemplateName is used for creating Parquet data sources;
- The importer.loadMetadata model property is available;
- Tables are stored in dedicated folders;
- Files are not re-written when inserting data;
- Reading from Parquet tables is possible.