Azure Synapse Analytics (Former SQL Data Warehouse) Connector
Type Name
azure
Connection Properties
Template name: azure
Appropriate translator name: azure
Properties:
host
port
(default:1433
)db
(default: empty)jdbc-properties
(default:encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;
)user-name
password
storageAccountKey
(for blob upload mode)stringDelimiter
driver
(default:com.microsoft.sqlserver
)driver-class
(default:com.microsoft.sqlserver.jdbc.SQLServerDriver
)new-connection-sq
l (default: empty)check-valid-connection-sql
(default:select 1
)min-pool-size
(default:2
)max-pool-size
(default:70
)cloudAgent
(default:FALSE
)
Upload Mode
The Azure Synapse Analytics (Former SQL Data Warehouse) connector supports two different upload modes: Standard (default) and Blob. As the Standard mode is very slow, we do not recommend using it, except for simple test scenarios. For better performance, we recommend the Blob mode which uses Azure Blob Storage for uploading and copying data into the target table. In this section, we describe how to set up the Blob mode and how to use it.
Automatic Setup
Normally, the connector is set up automatically, with the Data Virtuality Server performing three steps: first, it creates a master key, then it creates the database for credentials, and finally, it defines an external file format depending on the stringDelimiter
property.
Please note that if at a later stage you will need to update one of the definitions (e.g. if the secret key has changed), the related object (e.g. the database for credentials) must be deleted first manually in Azure.
Manual Setup and Defining External File Format
If automatic setup described above is impossible for some reason, you will have to perform the three steps (creating a master key, creating the database for credentials, and creating an external file format depending on the stringDelimiter
property) manually.
The external file format is defined as follows:
CREATE EXTERNAL FILE FORMAT DVTextFileFormat WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '<stringDelimiter>',
USE_TYPE_DEFAULT = FALSE
)
);
The default value of the stringDelimiter
property is |. To change it, you can specify the value you need in both createConnection()
and createDatasource()
calls. For better performance, we recommend to avoid changing the value of the property after you have set the connector up and used it.
Copying Data using Azure Blob Storage
When the connector has been set up, you can copy the data using Azure Blob Storage as follows:
- Create an external data source - it depends on the storage account name and on the container name (needed once for each copy);
- Generate and upload a text file (blob) to the container in the storage (needed once for each copy);
- Create an external table and copy data from the blob into it;
- Copy data from the external table into the final target table.
Deleting and Recreating External File Format
Should the need arise, you can delete the external file format using this command:
CALL "<your Azure Synapse ds name>.native"(
"request" => 'DROP EXTERNAL FILE FORMAT DVTextFileFormat'
);;
After executing the command, just drop the Azure data source in the Data Virtuality Server and create it again.
Using Blob Storage with Managed Service Identity
If the Blob Storage has an MSI (Managed Service Identity) configured, a special CLI template (azure_msi) should be used to create the data source connection. There are several things to keep in mind:
- Translator parameter
useMSI
must be set toTRUE
; - Connector property
storageAccountKey
is not needed; - Translator property
storageAccountKey
must be set.
Distribution Modes
Azure Synapse Analytics supports two table distribution options: ROUND_ROBIN
(default) and HASH
. You can change the default distribution method when the table is created by providing an appropriate option (applicable to both CREATE TABLE
and SELECT INTO
commands).
Here are examples for both methods:
ROUND_ROBIN distribution
CREATE TABLE azure_blob.testdist ( col1 integer, col2 string )
OPTIONS (distribution 'ROUND_ROBIN') ;; --this line can be skipped
HASH distribution for CREATE TABLE command
CREATE table azure_blob.testdist ( col1 integer, col2 string )
OPTIONS (distribution 'hash(col1)') ;;
HASH distribution for SELECT INTO command
SELECT 1 AS col1, null AS col2, 'a' AS col3, null AS col4 INTO azure_blob.testdistselectinto
OPTIONS (distribution 'hash(col1)') ;;
Another way to change the distribution mode is to define the DISTKEY
index when materialized tables are created by the replicator. In our example, the materialized table will be created with a HASH
distribution on column a
:
CALL "SYSADMIN.createIndex"(
"recOptId" => 1,
"columnName" => 'a',
"indexKind" => 'DISTKEY'
) ;;
Indexing
Tables are created as heaps, thus without a clustered index. WITH (HEAP)
table option is used on table creation. To add an index, use the native()
procedure or do it directly in the database.
String and BLOB size limitations
When the Blob upload mode is used, String and BLOB values are limited to 500 kilobytes due to blob storage limitations. For the standard upload mode, the limitation is set to 500,000 symbols.
Usage Examples
Standard Upload Mode
CALL SYSADMIN.createConnection(name => 'azure_std', jbossCLITemplateName => 'azure', connectionOrResourceAdapterProperties => 'db=<bd>,user-name=<user-name>,password=<password>,host=<host>,port=1433') ;;
CALL SYSADMIN.createDataSource(name => 'azure_std', translator => 'azure', modelProperties => 'importer.useFullSchemaName=false,importer.TableTypes="TABLE,VIEW",importer.schemaPattern=dbo,importer.importIndexes=false,importer.defaultSchema=dbo', translatorProperties => 'supportsNativeQueries=true') ;;
Blob Upload Mode
For the two examples that follow, please note that the storageAccountKey
must be provided both in createConnection()
and createDatasource()
procedure calls.
Blob Upload Mode
CALL SYSADMIN.createConnection(name => 'azure_blob', jbossCLITemplateName => 'azure', connectionOrResourceAdapterProperties => 'db=<db>,user-name=<user-name>,password=<password>,host=<host>,port=1433,storageAccountKey=<storageAccountKey>') ;;
CALL SYSADMIN.createDataSource(name => 'azure_blob', translator => 'azure', modelProperties => 'importer.useFullSchemaName=false,importer.TableTypes="TABLE,VIEW",importer.schemaPattern=dbo,importer.importIndexes=false,importer.defaultSchema=dbo', translatorProperties => 'supportsNativeQueries=true,uploadMode=BLOB,storageAccountName=dvblobtest,storageAccountKey=<storageAccountKey>,defaultEndpointsProtocol=https,deleteContainer=false,tempFolder=D:\tmp\azure') ;;
If tempFolder
is not specified, the default OS tmp folder is used.
Blob Upload Mode with Managed Service Identity
CALL SYSADMIN.createConnection(name => 'azure_msi', jbossCLITemplateName => 'azure_msi', connectionOrResourceAdapterProperties => 'db=<db>,user-name=<user-name>,password=<password>,host=<host>,port=1433,stringDelimiter=|') ;;
CALL SYSADMIN.createDataSource(name => 'azure_msi', translator => 'azure', modelProperties => 'importer.useFullSchemaName=false,importer.TableTypes="TABLE,VIEW",importer.schemaPattern=dbo,importer.importIndexes=false,importer.defaultSchema=dbo', translatorProperties => 'supportsNativeQueries=true,uploadMode=BLOB,useMSI=true,storageAccountName=dvsecuredstorage,storageAccountKey=<storageAccountKey>,defaultEndpointsProtocol=https,deleteContainer=true,stringDelimiter=|') ;;