Azure Table Connector

The Azure Table Connector is used to access Azure Tables data and make it available in the CData Virtuality Server in the relational form. This way, you can query an Azure Table using the SQL query language and write complex queries directed to multiple Data Sources. For example, a single SQL query can join tables from MySQL and combine them with data coming from Azure Table.

Connector Configuration

Connecting to Azure Table

Before issuing SQL queries to the Azure Table, you need to represent Azure Table's Metadata in relational format, for example represent non-relation collections as relation tables. This is done by connecting to Azure Table data source using the createConnection/createDatasource commands.

The complete list of connection properties can be found in Connection Configuration Interface under azuretable.

The Azure Table endpointSuffix connection property is explained below.

Example

CODE

CALL "SYSADMIN.createConnection"(
    "name" => 'azuretable',
    "jbossCLITemplateName" => 'azuretable',
    "connectionOrResourceAdapterProperties" => 
		'accountName=<accountName>
		,accountKey=<accountKey>
		,defaultEndpointsProtocol=https
		,endpointSuffix=<suffix>'
) ;;

CALL "SYSADMIN.createDataSource"(
    "name" => 'azuretable',
    "translator" => 'azuretable',
    "modelProperties" => 'importer.useFullSchemaName=false,importer.defaultSchema=<accountName>,importer.schemaPattern=<accountName>',
    "translatorProperties" => ''
) ;;

Azure Table endpointSuffix

Azure domain name (e.g. core.windows.net)

Usage

Azure Tables are represented as relational tables in the schema named the same as in createDatasource().

SELECT Query

To query these tables, a regular SQL SELECT query can be used:

CODE

 SELECT * FROM azuretable.table1

Write Capability

Azure Table Storage table names must be alpha-numeric and must not contain any special characters (underscore is not allowed);
Any Table in Azure Table Storage has system columns ("partitionkey", "rowkey", "time_stamp");
The column combination of ("partitionkey", "rowkey") has to be unique for each table in Azure Table Storage;
The name of the time_stamp column is "time_stamp" in the CData Virtuality Server metadata and "timestamp" in Azure Table Storage metadata;
Column order in the CData Virtuality Server metadata is as follows: "partitionkey", "rowkey", "time_stamp". All other columns are ordered alphabetically;
Empty tables in Azure Table Storage have no metadata in the Azure Table Storage. The CData Virtuality Server will not load such tables on data source refresh.

CREATE TABLE / INSERT INTO

CREATE table may specify system columns as ("partitionkey" string(1024), "rowkey" string(1024), time_stamp timestamp);
INSERT INTO must contain unique values for ("partitionkey", "rowkey") column combination;
Target columns in INSERT INTO queries have to be specified explicitly;
The timestamp column is maintained by Azure Table Storage. INSERT INTO and SELECT INTO may specify a value for the "time_stamp" column. This value will be ignored by Azure Table Storage.

SELECT INTO

SELECT INTO must contain unique values for ("partitionkey", "rowkey") column combination;
The timestamp column is maintained by Azure Table Storage. INSERT INTO and SELECT INTO may specify a value for the "time_stamp" column. This value will be ignored by Azure Table Storage.

Data Types

The CData Virtuality Server BLOB data type columns (Snowflake's VARBINARY) may be explicitly cast to STRING for inserting them into Azure Table Storage;
Numeric CData Virtuality Server data types BIGDECIMAL/BIGINTEGER will be created as data type string in Azure Table Storage. If that is not desired, they may be explicitly cast to DOUBLE or INTEGER/LONG in the CData Virtuality Server before being written to Azure Table Storage.

Recommendations

Avoid table creation without specifying system columns;
Refresh the Azure Table Storage before issuing a SELECT INTO or CREATE TABLE query. The connection is not kept alive and needs to be activated if idle for longer than one hour.

Further Hints

Technically, tables without ("partitionkey", "rowkey", "time_stamp") can be created.
- If such a table is created via SELECT INTO, only one row may be written. That is because metadata columns will be added in Azure Table Storage automatically with values (null, null) for ("partitionkey", "rowkey") column combination and one row containing that value will disallow the presence of any other due to uniqueness constraint.
- If such a table is created via CREATE TABLE, exactly one row may be inserted, for the same reason as stated above;
Drop operation could take up to 40 seconds due to internal treatment of drop requests in Azure Table Storage;
Upsert and History update procedures do neither work nor make sense towards Azure Table Storage. Any such workflows should be covered by a customized approach.

Translator Properties for Azure Tables Connector

numberOfThreads:
- Controls the number of threads being used to write to Azure Table Storage
- Sefault: 3
- Reliable values: up to 15
sampleSize:
- Defines number of rows that will be used to determine column type
- Default: 100
supportsWhereColumnEqualsColumn:
- Controls pushdown of queries like WHERE col1 = col2
- Default: FALSE