It is highly recommended to specify a bucketName when creating a BigQuery data source for the following reasons:
a bucket creation limit can be exceeded when performing multiple data inserts; please refer to the Google Cloud Storage documentation for detailed information;
a region mismatch between a BigQuery dataset and a temporary bucket may cause errors while inserting data; please refer to the Google BigQuery documentation for more details.
Translator Properties
Translator Properties Shared by All JDBC Connectors
(Properties listed in alphabetical order)
Name
Description
Default value
comparisonStringConversion
Sets a template to convert WHERE clauses on strings
Examples
SQL
comparisonStringConversion=%s
-- no conversion will be applied
comparisonStringConversion=binary %s
-- WHERE binary someStringExpression LIKE someOtherStringExpression
comparisonStringConversion=(%s COLLATE Latin1_General_CS_AS)
-- WHERE (someStringExpression COLLATE Latin1_General_CS_AS LIKE someOtherStringExpression)
MS SQL Server: (%s COLLATE Latin1_General_CS_AS)
MySQL, SingleStore: binary %s
Others: %s
DatabaseTimeZone
Database time zone, used when fetching date, time, or timestamp values
System default time zone
DatabaseVersion
Specific database version, used to fine-tune pushdown support
Automatically detected by the server through the data source JDBC driver, if possible
forbidNonMatchingNames
Only considered when importer.tableNamePattern is set. When set to FALSE, allows creation of tables with names that do not match the tableNamePattern. Tables with non-matching names will only be accessible until the server restarts
TRUE
ForceQuotedIdentifiers
If TRUE, all identifiers in the final query (that is being generated by a translator to be sent directly to the target DBMS) will be quoted
TRUE
MaxPreparedInsertBatchSize
Maximum size of prepared insert batch
2048
OrderByStringConversion
Sets a template to convert ORDER BY clauses on strings
Examples
SQL
OrderByStringConversion=%s
-- no conversion will be applied
OrderByStringConversion=(binary %s)
-- ORDER BY (binary someStringExpression)
OrderByStringConversion=cast((%s) as varchar(2000) ccsid 1208)
-- ORDER BY cast((someStringExpression) as varchar(2000) ccsid 1208)
DB2 for AS/400: cast((%s) as varchar(2000) ccsid 1208)
DB2 for zOS: cast((%s) as varchar(2000) ccsid ASCII)
MS SQL Server: (cast(cast(%s as varchar) as varbinary))
MySQL, SingleStore: (binary %s)
PostgreSQL: (%s COLLATE \"C\")
Others: %s
supportsConvertFromClobToString
If TRUE, indicates that the translator supports the CONVERT/CAST function from clob to string
MySQL, PostgreSQL, Redshift, Greenplum, Oracle, MS SQL Server: TRUE
Others: FALSE
supportsNativeQueries
If TRUE, the translator exposes the native() system procedure, which can be used to pass native queries directly to the underlying DBMS
Example
SQL
SELECT x.*
FROM TABLE(
    CALL "dwh.native"("request" => 'SELECT query, pid, elapsed, substring FROM svl_qlog ORDER BY STARTTIME DESC LIMIT 200')
) w,
ARRAYTABLE(w.tuple COLUMNS query string, pid integer, elapsed string, "substring" string) x;
TrimStrings
If TRUE, trims trailing whitespace from fixed-length character strings. Please note that the CData Virtuality Server only has a string (varchar) type, which treats trailing whitespace as meaningful
FALSE
UseBindVariables
If TRUE, indicates that PreparedStatements should be used and that literal values in the source query should be replaced with bind variables. If FALSE, only LOB values will trigger the use of PreparedStatements
TRUE
UseCommentsInSourceQuery
Embeds a leading /* comment */ with the session/request id in the source SQL query for informational purposes
FALSE
The names of the translator properties are case-sensitive.
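For illustration, translator properties are passed as a comma-separated list in the translatorProperties parameter when the data source is created. The sketch below is an assumption-based example: the data source name dwh and the chosen property values are placeholders, and the SYSADMIN.createDatasource signature should be verified against your server version:
SQL
-- sketch: pass shared translator properties as a comma-separated list (names are case-sensitive)
CALL SYSADMIN.createDatasource(
    name => 'dwh',
    translator => 'bigquery',
    modelProperties => '',
    translatorProperties => 'MaxPreparedInsertBatchSize=1024,UseCommentsInSourceQuery=true'
) ;;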
Translator Properties Specific for Google BigQuery
Name
Description
Default value
replaceNullCharsWith
String property. If set, the translator replaces all null characters in strings before executing INSERT / UPDATE with the string specified as the property value. You may specify an empty string as well
uploadMode
Defines the mode used to upload data:
CLOUDSTORAGE mode uses a Google Cloud Storage bucket to upload data. The bucket can be configured via the bucketName or bucketPrefix translator parameters. If no bucketName is specified, a bucket will be created;
STREAMING (preview) mode streams data via the BigQuery API;
INSERTS (preview) mode uses JDBC prepared statements for inserting data
CLOUDSTORAGE
bucketName
Value: bucket name
Only for the default (CLOUDSTORAGE) upload mode. Specifies the name of the bucket to upload data files to. If set, the bucket is assumed to already exist (no creation/deletion operations are performed). The bucket's region must be the same as the region specified via the region translator property. If no bucketName is specified, a bucket will be created and deleted after the operation is finished.
bucketPrefix
Value: bucket prefix
Only for the default (CLOUDSTORAGE) upload mode. Specifies a prefix of a bucket name to upload data files to. The name of the bucket created for uploading will start with the specified prefix. The bucket will be deleted after the operation is finished.
folderInBucket
Value: name of a folder in a bucket
Only for the default (CLOUDSTORAGE) upload mode. Specifies the name of a folder in a bucket to upload data files to. If specified, the folder is assumed to exist (no creation/deletion operations are performed) within the bucket.
useDdl
Value: boolean
If TRUE, creates and drops are performed using JDBC statements instead of the API
FALSE
setACLforBucket
If set to TRUE, ACL roles are set for the elements of the bucket used by the data source.
If set to FALSE, only IAM permissions are used; it should therefore be set to FALSE when uniform bucket-level access is configured for the bucket
TRUE
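As a sketch of how the BigQuery-specific properties combine with the recommendation to pre-create a bucket, the example below assumes an existing bucket my-upload-bucket and folder staging (both placeholders) and the same SYSADMIN.createDatasource signature as above:
SQL
-- sketch: CLOUDSTORAGE upload mode using a pre-created bucket
CALL SYSADMIN.createDatasource(
    name => 'bigquery',
    translator => 'bigquery',
    modelProperties => '',
    translatorProperties => 'bucketName=my-upload-bucket,folderInBucket=staging,useDdl=false'
) ;;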
Data Source Properties
Data Source Properties Shared by All JDBC Connectors
(Properties listed in alphabetical order)
Name
Description
Default
importer.autoCorrectColumnNames
Replaces . in a column name with _ as the period character is not supported by the CData Virtuality Server in column names
TRUE
importer.catalog
Database catalogs to use. Can be used if the db connection property is omitted.
Only for Microsoft SQL Server and Snowflake:
Multiple catalogs (multiple databases) usage is supported;
'%' as a wildcard is allowed: for example, importer.catalog="foo%" will import foo, foobar, etc.;
To specify several catalog names and/or patterns, values should be comma-separated and enclosed within double quotes: importer.catalog="schema1,schema2,pattern1%,pattern2%";
To use all catalogs, the value should be '%': importer.catalog="%"
To escape a wildcard in a catalog name, use the '\' escape character. The value should be as follows: importer.catalog="test\_tables"
Exasol: EXA_DB
All others: empty
importer.defaultCatalog
For data sources using catalog (Microsoft SQL Server and Snowflake).
If the importer.catalog property is assigned multiple values, importer.defaultCatalog points to the catalog where tables will be created or dropped in the source.
importer.defaultSchema
Only for data sources that are also supported as Analytical Storage (SQLServer, MySQL, Oracle, PostgreSQL, Redshift);
When the property is correctly set, SELECT INTO, CREATE, and DROP TABLE commands are enabled for that data source;
Must point to the original schema name in the DBMS (e.g. importer.defaultSchema=public);
Needed to specify where tables will be created or dropped in the source DBMS;
Not meant to be used with the Analytical Storage data source
Please note that writing into a data source is only possible if this parameter is set.
Empty
importer.enableMetadataCache
Turns on metadata cache for a single data source even when the global option is turned off. Together with importer.skipMetadataLoadOnStartup=true, it allows using materialized views after server restart when the original source is unavailable
FALSE
importer.excludeProcedures
Case-insensitive regular expression that will exclude a matching fully qualified procedure name from import
Empty
importer.excludeSchemas
Comma-separated list of schemas (no % or ? wildcards allowed) to exclude listed schemas from import. A schema specified in defaultSchema or schemaPattern will be imported despite being listed in excludeSchemas. Helps to speed up metadata loading
importer.excludeTables
Case-insensitive regular expression that will exclude a matching fully qualified table name from import. Does not speed up metadata loading. Here are some examples:
1. Excluding all tables in the (source) schemas sys and INFORMATION_SCHEMA:
".*\." will match the schema portion of the table name;
"\b" is a word boundary. It is used to define the boundary between text characters and non-character text;
"\btablename1\b" will only match tablename1 and not tablename11 because of the use of "\b";
"\bcompan" will match "company" and "companies" because we did not use "\b" at the end to delimit a word boundary;
".*" at the end is necessary to match the trailing characters after the name
Empty
importer.fetchSize
Fetch size assigned to a resultset on loading metadata
No default value
importer.importApproximateIndexes
If set to TRUE, imports approximate index information
TRUE
importer.importIndexes
If set to TRUE, imports index/unique key/cardinality information
FALSE
importer.importKeys
If set to TRUE, imports primary and foreign keys
FALSE
importer.importProcedures
If set to TRUE, imports procedures and procedure columns. Overloaded procedures can be imported with an additional option useProcedureSpecificName.
Please note that it is currently not possible to import procedures which use the same name for more than one parameter (e.g. same name for IN and OUT parameters). Such procedures can be excluded from import with the parameter excludeProcedures. Please note that it is not always possible to import procedure result set columns due to database limitations
FALSE / TRUE only for CData connector
importer.loadColumnsTableByTable
Set to TRUE to force table by table metadata processing
FALSE/TRUE only for Netsuite and SAP Advantage Database Server
importer.loadMetadataWithJdbc
If set to TRUE, turns off all custom metadata load ways
importer.procedureNamePattern
Procedure(s) to import. If omitted, all procedures will be imported. % as a wildcard is allowed: for example, importer.procedureNamePattern=foo% will import foo, foobar, etc. Works only in combination with importProcedures
Empty
importer.quoteNameInSource
If set to FALSE, directs the CData Virtuality Server to create source queries using unquoted identifiers
TRUE
importer.renameDuplicateColumns
If set to TRUE, renames duplicate columns caused by either mixed case collisions or autoCorrectColumnNames replacing . with _. The suffix _n where n is an integer will be added to make the name unique
TRUE
importer.renameDuplicateTables
If set to TRUE, renames duplicate tables caused by mixed case collisions. The suffix _n where n is an integer will be added to make the name unique
TRUE
importer.replaceSpecSymbsInColNames
If set to TRUE, replaces all special symbols (any symbols not in the A-Za-z0-9_ set) with the _ symbol in column names of tables
FALSE/TRUE only for BigQuery
importer.schemaPattern
Schema(s) to import. If omitted or set to "", all schemas will be imported. % as a wildcard is allowed: for example, importer.schemaPattern=foo% will import foo, foobar, etc. To specify several schema names and/or patterns, values should be comma-separated and enclosed within double quotes: importer.schemaPattern="schema1,schema2,pattern1%,pattern2%". For proper escaping of special characters depending on the type of data source, check Escaping special characters in schema names or use wildcards instead: "[schema_name]" can be rewritten as "%schema%name%". Helps to speed up metadata loading
Empty
importer.skipMetadataLoadOnStartup
If set to TRUE, allows skipping metadata loading on server startup if there's cached metadata for the data source. Together with importer.enableMetadataCache=true, it allows using materialized views after server restart when the original source is unavailable
FALSE
importer.tableNamePattern
Table(s) to import. If omitted, all tables will be imported. % as a wildcard is allowed: for example, importer.tableNamePattern="foo%,boo%" will import foo, foobar, book, bookshelf, etc
Empty
importer.tableTypes
Comma-separated list (without spaces) of table types to import. Available types depend on the DBMS. Usual format: "TABLE,VIEW".
Other typical types are "SYSTEM TABLE", "GLOBAL TEMPORARY", "LOCAL TEMPORARY", "ALIAS", "SYNONYM"
Empty
importer.useCatalogName
If set to TRUE, uses any non-null/non-empty catalogue name as part of the name in source, e.g. "catalogue"."table"."column", and in the CData Virtuality Server runtime name if useFullSchemaName is TRUE. If set to FALSE, will not use the catalogue name in either the name in source or the CData Virtuality Server runtime name. Should be set to FALSE for sources that do not fully support a catalogue concept, but return a non-null catalogue name in their metadata - such as HSQL
TRUE / FALSE only for Hive and EXASOL
importer.useFullSchemaName
If set to FALSE, directs the importer to drop the source catalogue/schema from the CData Virtuality Server object name so that the CData Virtuality Server fully qualified name will be in the form of <model name>.<table name>.
Please note that this may lead to objects with duplicate names when importing from multiple schemas, which results in an exception
TRUE
importer.useProcedureSpecificName
If set to TRUE, allows the import of overloaded procedures (which will normally result in a duplicate procedure error) by using the unique procedure specific name in the CData Virtuality Server. This option will only work with JDBC 4.0 compatible drivers that report specific names
FALSE
importer.widenUnsignedTypes
If set to TRUE, converts unsigned types to the next widest type. For example, SQL Server reports tinyint as an unsigned type. With this option enabled, tinyint would be imported as a short instead of a byte
TRUE
The names of the data source properties are case-sensitive.
Creating/Dropping Tables and Inserting Data
In order to create/drop tables and insert data into a BigQuery data source, the importer.defaultSchema data source property should be set to the name of the target dataset. Here is an example:
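(The example below is a sketch; the data source name bigquery, the dataset name my_dataset, and the SYSADMIN.createDatasource signature are placeholders/assumptions to be checked against your environment.)
SQL
-- sketch: point importer.defaultSchema at the target BigQuery dataset so that
-- CREATE/DROP TABLE and SELECT INTO are enabled for this data source
CALL SYSADMIN.createDatasource(
    name => 'bigquery',
    translator => 'bigquery',
    modelProperties => 'importer.defaultSchema=my_dataset',
    translatorProperties => ''
) ;;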
The BigQuery partitioning functionality is supported in both API and DDL (useDdl=TRUE) modes.
Partitioning is supported via the OPTIONS clause.
Types of Partitioning
Integer range partitioning
You can partition a table based on ranges of values in a specific INTEGER column.
Time-unit column partitioning
You can partition a table on a DATE or TIMESTAMP column in the table.
For a TIMESTAMP column, the partitions can have hourly, daily, monthly, or yearly granularity. For DATE columns, the partitions can have daily, monthly, or yearly granularity. Partition boundaries are based on UTC time.
Ingestion time partitioning
Valid only for DDL mode.
An ingestion-time partitioned table has a pseudocolumn named _PARTITIONTIME. The value of this column is the ingestion time for each row, truncated to the partition boundary (such as hourly or daily). Instead of using _PARTITIONTIME, you can also use _PARTITIONDATE. The _PARTITIONDATE pseudocolumn contains the UTC date corresponding to the value in the _PARTITIONTIME pseudocolumn.
You can choose hourly, daily, monthly, or yearly granularity for the partitions.
Partitioning Options
partition_expiration_days
When you create a table partitioned by ingestion time or time-unit column, you can specify a partition expiration. This setting specifies how long BigQuery keeps the data in each partition. The setting applies to all partitions in the table, but is calculated independently for each partition based on the partition time.
Accepts FLOAT values.
require_partition_filter
Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. The default value is FALSE.
partition_expression
partition_expression is an expression that determines how to partition the table. It can contain the following values:
partition_expression
description
partitioning type
valid for mode
_PARTITIONDATE
Partition by ingestion time with daily partitions
Ingestion time
DDL
DATE(_PARTITIONTIME)
Equivalent to _PARTITIONDATE
Ingestion time
DDL
<date_column>
Partition by the DATE column with daily partitions
Time-unit column
API/DDL
DATE( <timestamp_column> )
Partition by the TIMESTAMP column with daily partitions
Time-unit column
API/DDL
DATETIME_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR })
Partition by the TIMESTAMP column with the specified partitioning type
Time-unit column
API/DDL
TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR })
Partition by the TIMESTAMP column with the specified partitioning type
Time-unit column
DDL
TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR })
Partition by ingestion time with the specified partitioning type
Ingestion time
DDL
DATE_TRUNC(<date_column>, { MONTH | YEAR })
Partition by the DATE column with the specified partitioning type
Time-unit column
RANGE_BUCKET(<integer_column>, GENERATE_ARRAY(start, end[, interval]))
Partition by an integer column with the specified range, where:
start is the start of range partitioning, inclusive.
end is the end of range partitioning, exclusive.
interval is the width of each range within the partition. Default: 1
Integer range
API/DDL
Examples
Partitioning by the DATE column with daily granularity, keeping data in each partition for 3 days, and with a required filter on the partitioning column, which means that SELECT queries on the table must contain a WHERE clause on the DATE column:
SQL
CREATE TABLE bigquery.partition1 (i INTEGER, d DATE, s STRING) OPTIONS (partition_by 'd', partition_expiration_days '3', require_partition_filter 'true') ;;
Creating a table partitioned on a timestamp range with monthly granularity:
SQL
CREATE TABLE bigquery_ddl.partition_ddl_date (i integer, d timestamp, s string) OPTIONS (partition_by 'DATE_TRUNC(d, MONTH)') ;;
Creating a table partitioned by an integer range with a required filter on the partitioning column, which means that SELECT queries on the table must contain a WHERE clause on the INTEGER column:
SQL
CREATE TABLE bigquery_ddl.partition_ddl_integer (i integer, d timestamp, s string) OPTIONS (partition_by 'RANGE_BUCKET(i, GENERATE_ARRAY(0, 100, 10))', require_partition_filter 'true') ;;
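Ingestion-time partitioning (DDL mode only) follows the same OPTIONS syntax. The sketch below assumes the useDdl=TRUE translator property and a placeholder schema name bigquery_ddl:
SQL
-- sketch: ingestion-time partitioned table with daily partitions (DDL mode only)
CREATE TABLE bigquery_ddl.partition_ddl_ingestion (i integer, s string) OPTIONS (partition_by '_PARTITIONDATE') ;;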
Partitioning options, as well as integer-range and time-unit column partitioning in API mode, are available since v4.10
Clustering
Clustering is supported via the OPTIONS clause:
SQL
CREATE TABLE bigquery.cluster (i integer, d date, ts timestamp, s string, b boolean) OPTIONS (cluster_by 'i,d,ts,s') ;;
Partitioning and Clustering for Materialization of Recommended Optimizations
Materialized tables in BigQuery can be partitioned and clustered using the SYSADMIN.createCreationParam procedure.
Create a data source using the CLIENT_ID and CLIENT_SECRET obtained in step 3, AUTH_CODE obtained in step 5, and PROJECT_ID of your BigQuery project. The redirectUrl should be the same as in steps 3 and 5:
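A minimal sketch is shown below; the connection property names inside connectionOrResourceAdapterProperties and the SYSADMIN.createConnection/createDatasource signatures are assumptions that should be checked against your server version. CLIENT_ID, CLIENT_SECRET, AUTH_CODE, PROJECT_ID, and the redirect URL are placeholders for the values obtained in the referenced steps:
SQL
-- sketch only: the property names below are assumptions, not verified parameter names
CALL SYSADMIN.createConnection(
    name => 'bigquery',
    jbossCLITemplateName => 'bigquery',
    connectionOrResourceAdapterProperties => 'project=PROJECT_ID,clientId=CLIENT_ID,clientSecret=CLIENT_SECRET,authCode=AUTH_CODE,redirectUrl=http://localhost:8080/callback'
) ;;
CALL SYSADMIN.createDatasource(
    name => 'bigquery',
    translator => 'bigquery',
    modelProperties => '',
    translatorProperties => ''
) ;;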