Google BigQuery
Google BigQuery is a fully managed cloud data warehouse for analytics. Working in conjunction with Google Cloud Storage for interactive analysis of massively large data sets, it can scan terabytes in seconds and petabytes in minutes.
Type Name
bigquery
Connection Properties
Template name: bigquery
Appropriate translator name: bigquery
Properties:
- projectId (required)
- transformQuery (default: TRUE; obsolete)
- credentialFactory (default: com.datavirtuality.dv.core.oauth.credential.BigQueryOAuthCredentialFactory)
- allowLargeResults (obsolete)
- largeResultsDefaultDataset
- tableMaxResults
- fetchSize
- refreshToken
- accessToken
- expirationTimeMilliseconds
- region
- authCode
- redirectUri
- user-name (required)
- password (default: empty)
- ClientId
- ClientSecret
- driver (default: bigquery)
- driver-class (default: com.datavirtuality.jdbc.bq.BQDriver)
- storageProjectId (default: empty)
- storageUser (default: empty)
- storagePassword (default: empty)
- new-connection-sql
- check-valid-connection-sql (default: select 1)
- min-pool-size (default: 2)
- max-pool-size (default: 70)
- readTimeout (default: 20000; 0 for infinite, a negative value for the default)
- connectTimeout (default: 20000; 0 for infinite, a negative value for the default)
Here is an example:
CALL SYSADMIN.createConnection('bq2','bigquery','projectId=XXXX,user-name=XXXX@developer.gserviceaccount.com,password=$${jboss.server.config.dir}/../deployments/bigquery.p12') ;;
CALL SYSADMIN.createDatasource('bq2','bigquery','importer.useFullSchemaName=false,importer.useCatalogName=false','supportsNativeQueries=true,uploadMode=CLOUDSTORAGE,bucketName=dv_upload_bucket,useDdl=false') ;;
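Optional connection properties, such as pool sizes and timeouts, can be appended to the same connection property string. A minimal sketch with illustrative values (only projectId, user-name, and password are required):
CALL SYSADMIN.createConnection('bq2','bigquery','projectId=XXXX,user-name=XXXX@developer.gserviceaccount.com,password=$${jboss.server.config.dir}/../deployments/bigquery.p12,min-pool-size=2,max-pool-size=20,readTimeout=60000,connectTimeout=60000') ;;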
It is highly recommended to specify a bucketName when creating a BigQuery data source for the following reasons:
- a bucket creation limit may be exceeded while performing multiple data inserts; please refer to the Google Cloud Storage documentation for detailed information;
- a region mismatch between a BigQuery dataset and a temporary bucket may cause errors while inserting data; please refer to the Google BigQuery documentation for more details.
Translator Properties
Translator Properties Shared by All JDBC Connectors
(Properties listed in alphabetical order)
Name | Description | Default value |
---|---|---|
comparisonStringConversion | Sets a template to convert string comparisons | |
DatabaseTimeZone | Database time zone, used when fetching date, time, or timestamp values | System default time zone |
DatabaseVersion | Specific database version, used to fine-tune pushdown support | Automatically detected by the server through the data source JDBC driver, if possible |
forbidNonMatchingNames | Only considered when importer.tableNamePattern is set. When set to FALSE, allows creation of tables with names that do not match the tableNamePattern. Tables with non-matching names will only be accessible until the server restarts | TRUE |
ForceQuotedIdentifiers | If TRUE, all identifiers in the final query are quoted | TRUE |
MaxPreparedInsertBatchSize | Maximum size of prepared insert batch | |
OrderByStringConversion | Sets a template to convert ORDER BY clauses on strings | |
supportsConvertFromClobToString | If TRUE, indicates that the translator supports the CONVERT/CAST function from clob to string | |
supportsNativeQueries | Forces a translator to issue a native query | |
supportsOrderByAlias | If TRUE, indicates that the translator supports ORDER BY with an alias | |
supportsOrderByString | If TRUE, indicates that the translator supports ORDER BY on string values | |
TrimStrings | If TRUE, trims trailing whitespace from fixed-length character strings | |
UseBindVariables | If TRUE, indicates that prepared statements with bind variables should be used | |
UseCommentsInSourceQuery | Embeds a leading comment with the session/request id in the source SQL query for informational purposes | |
The names of the translator properties are case-sensitive.
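Translator properties are passed in the fourth argument of SYSADMIN.createDatasource, as in the example above. A minimal sketch (the chosen properties and values are purely illustrative):
CALL SYSADMIN.createDatasource('bq2','bigquery','importer.useFullSchemaName=false,importer.useCatalogName=false','supportsNativeQueries=true,TrimStrings=true') ;;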
Translator Properties Specific for Google BigQuery
Name | Description | Default value |
---|---|---|
 | String property. If set, the translator replaces all null characters in strings before executing INSERT/UPDATE with the string specified as the property value. You may specify an empty string as well | Single space |
uploadMode | Upload mode used when inserting data (values include CLOUDSTORAGE) | CLOUDSTORAGE |
bucketName | Value: bucket name. Only for the default (CLOUDSTORAGE) upload mode | |
bucketPrefix | Value: bucket prefix. Only for the default (CLOUDSTORAGE) upload mode | |
useDdl | Value: boolean. If set to TRUE, the translator operates in DDL mode (see Segmentation and Optimization below) | FALSE |
 | If set to | |
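The BigQuery-specific properties above are set in the same way. Here is a sketch that reuses the connection from the first example and switches the data source to DDL mode (see Segmentation and Optimization below); the bucket name is illustrative:
CALL SYSADMIN.createDatasource('bq2','bigquery','importer.useFullSchemaName=false,importer.useCatalogName=false','supportsNativeQueries=true,uploadMode=CLOUDSTORAGE,bucketName=dv_upload_bucket,useDdl=true') ;;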
Data Source Properties
Data Source Properties Shared by All JDBC Connectors
(Properties listed in alphabetical order)
Name | Description | Default |
---|---|---|
importer.autoCorrectColumnNames | Replaces . in a column name with _ as the period character is not supported by the CData Virtuality Server in column names | TRUE |
importer.catalog | Database catalogs to use. Can be used if the database supports catalogs. Only for Microsoft SQL Server and Snowflake: several catalogs can be specified | Exasol: a predefined catalog; all others: empty |
importer.defaultSchema | Schema used by default for creating tables and writing data. Please note that writing into a data source is only possible if this parameter is set | Empty |
importer.enableMetadataCache | Turns on metadata cache for a single data source even when the global option is turned off. Together with importer.skipMetadataLoadOnStartup=true, it allows using materialized views after server restart when the original source is unavailable | FALSE |
importer.excludeProcedures | Case-insensitive regular expression that will exclude a matching fully qualified procedure name from import | Empty |
importer.excludeSchemas | Comma-separated list of schemas (no % or ? wildcards allowed) to exclude listed schemas from import. A schema specified in defaultSchema or schemaPattern will be imported despite being listed in excludeSchemas. Helps to speed up metadata loading | Oracle: a predefined list of system schemas; all others: empty |
importer.excludeTables | Case-insensitive regular expression that will exclude a matching fully qualified table name from import. Does not speed up metadata loading. Examples: 1. excluding all tables in given (source) schemas; 2. excluding all tables except the ones starting with "public.br" and "public.mk" using a negative lookahead; 3. excluding "tablename11" from the list ["tablename1", "tablename11", "company", "companies"] | Empty |
importer.fetchSize | Fetch size assigned to a resultset on loading metadata | No default value |
 | If set to | |
importer.importIndexes | If set to TRUE, imports index/unique key/cardinality information | FALSE |
importer.importKeys | If set to TRUE, imports primary and foreign keys | FALSE |
importer.importProcedures | If set to TRUE, imports procedures and procedure columns. Please note that it is currently not possible to import procedures which use the same name for more than one parameter | |
importer.loadColumnsTableByTable | Set to TRUE to force table-by-table metadata processing | FALSE / TRUE only for Netsuite and SAP Advantage Database Server |
importer.loadMetadataWithJdbc | If set to TRUE, turns off all custom metadata load ways | FALSE |
importer.procedureNamePattern | Procedure(s) to import. If omitted, all procedures will be imported. % as a wildcard is allowed: for example, importer.procedureNamePattern=foo% will import foo, foobar, etc. Works only in combination with importProcedures | Empty |
 | If set to | |
importer.renameDuplicateColumns | If set to TRUE, renames duplicate columns caused by either mixed case collisions or autoCorrectColumnNames replacing . with _. The suffix _n where n is an integer will be added to make the name unique | TRUE |
importer.renameDuplicateTables | If set to TRUE, renames duplicate tables caused by mixed case collisions. The suffix _n where n is an integer will be added to make the name unique | TRUE |
importer.replaceSpecSymbsInColNames | If set to TRUE, replaces all special symbols (any symbols not in the ^A-Za-z0-9_ sequence) with the _ symbol in column names of tables | FALSE / TRUE only for BigQuery |
importer.schemaPattern | Schema(s) to import. If omitted or has "" value, all schemas will be imported. % as a wildcard is allowed: for example, importer.schemaPattern=foo% will import foo, foobar, etc. To specify several schema names and/or patterns, values should be comma-separated and enclosed within double quotes: importer.schemaPattern="schema1,schema2,pattern1%,pattern2%". For proper escaping of special characters depending on the type of data source, check Escaping special characters in schema names or use wildcards instead: "[schema_name]" can be rewritten as "%schema%name%". Helps to speed up metadata loading | Empty |
importer.skipMetadataLoadOnStartup | If set to TRUE, skips metadata loading on server startup | FALSE |
importer.tableNamePattern | Table(s) to import. If omitted, all tables will be imported. % as a wildcard is allowed: for example, importer.tableNamePattern=foo% will import foo, foobar, etc. | Empty |
importer.tableTypes | Comma-separated list (without spaces) of table types to import. Available types depend on the DBMS; the usual format is "TABLE,VIEW" | Empty |
importer.useCatalogName | If set to TRUE, uses any non-null/non-empty catalogue name as part of the name in source, e.g. "catalogue"."table"."column", and in the CData Virtuality Server runtime name if useFullSchemaName is TRUE. If set to FALSE, will not use the catalogue name in either the name in source or the CData Virtuality Server runtime name. Should be set to FALSE for sources that do not fully support a catalogue concept, but return a non-null catalogue name in their metadata - such as HSQL | TRUE / FALSE only for Hive and EXASOL |
importer.useFullSchemaName | If set to FALSE, directs the importer to drop the source schema from the CData Virtuality Server object name, so that the runtime name contains just the table name. Please note that this may lead to objects with duplicate names when importing from multiple schemas, which results in an exception | TRUE |
importer.useProcedureSpecificName | If set to TRUE, allows the import of overloaded procedures (which will normally result in a duplicate procedure error) by using the unique procedure specific name in the CData Virtuality Server. This option will only work with JDBC 4.0 compatible drivers that report specific names | FALSE |
importer.widenUnsignedTypes | If set to TRUE, converts unsigned types to the next widest type. For example, SQL Server reports tinyint as an unsigned type. With this option enabled, tinyint would be imported as a short instead of a byte | TRUE |
Creating/Dropping Tables and Inserting Data
In order to create/drop tables and insert data into a BigQuery data source, the importer.defaultSchema data source property should be set to the name of the target dataset. Here is an example:
CALL SYSADMIN.createConnection(name => 'bq', jbossCliTemplateName => 'bigquery', connectionOrResourceAdapterProperties => 'projectId=XXXX,user-name=XXXX-ZZZZ@developer.gserviceaccount.com,password=PATH_TO_KEY_FILE') ;;
CALL SYSADMIN.createDatasource(name => 'bq', translator => 'bigquery', modelProperties => 'importer.useCatalogName=false,importer.defaultSchema=TARGET_DATA_SET,importer.schemaPattern=SCHEMA_PATTERN_INCLUDING_TARGET_DATASET,importer.useFullSchemaName=false') ;;
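With importer.defaultSchema pointing at the target dataset, tables can then be created, populated, and dropped directly through the data source. A minimal sketch (table and column names are illustrative):
CREATE TABLE bq.example_table (id INTEGER, label STRING) ;;
INSERT INTO bq.example_table (id, label) VALUES (1, 'first row') ;;
DROP TABLE bq.example_table ;;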
Segmentation and Optimization
The BigQuery partitioning functionality is supported in both API and DDL modes.
API Mode
In API mode (when the useDdl translator property is set to FALSE), tables can be partitioned on a DATE or TIMESTAMP column. On the BigQuery side, such tables will be partitioned with daily granularity:
CREATE TABLE bigquery.partition1 (i INTEGER, d DATE, s STRING) OPTIONS (partition_by 'd') ;;
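Partitioning on a TIMESTAMP column works the same way; a sketch assuming a TIMESTAMP column ts (table name is illustrative):
CREATE TABLE bigquery.partition2 (i INTEGER, ts TIMESTAMP, s STRING) OPTIONS (partition_by 'ts') ;;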
DDL Mode
In DDL mode (when the useDdl translator property is set to TRUE), partitioning can be configured via the BigQuery syntax, including integer-range partitioning:
1. Creating a table partitioned on a date range:
CREATE TABLE bigquery_ddl.partition_ddl_date (i integer, d timestamp, s string) OPTIONS (partition_by 'DATE(d)') ;;
2. Creating a table partitioned on an integer range:
CREATE TABLE bigquery_ddl.partition_ddl_integer (i integer, d timestamp, s string) OPTIONS (partition_by 'RANGE_BUCKET(i, GENERATE_ARRAY(0, 100, 10))') ;;
Clustering
Clustering is supported via the OPTIONS
clause:
CREATE TABLE bigquery.cluster (i integer, d date, ts timestamp, s string, b boolean) OPTIONS (cluster_by 'i,d,ts,s') ;;
Configuring 3-legged OAuth for BigQuery
To set up 3-legged OAuth with BigQuery, follow these steps:
1. Log in to the Developer Console for your BigQuery project.
2. Generate new OAuth2 credentials (a pair of CLIENT_ID and CLIENT_SECRET keys).
3. Substitute CLIENT_ID in the link below with your CLIENT_ID and follow this link in the browser:
4. Choose 'Allow' to provide access.
5. You will be redirected to a non-existent page, which is correct. Copy the page URL to obtain the code.
The OAuth code is only valid for 10 minutes.
6. Create a data source using the CLIENT_ID and CLIENT_SECRET obtained in step 2, the AUTH_CODE obtained in step 5, and the PROJECT_ID of your BigQuery project. The redirectUri should be the same as in steps 3 and 5:
CALL SYSADMIN.createConnection('bq3legged','bigquery','authType=OAUTH,
projectId=PROJECT_ID,
authCode=AUTH_CODE,
redirectUri=http%3A%2F%2Flocalhost%3A9000%2Fredirect.html%3Fto%3Dconnect/bigquery/oauth,
user-name=CLIENT_ID,
password=CLIENT_SECRET') ;;
CALL SYSADMIN.createDatasource('bq3legged','bigquery','importer.schemaPattern=bq_schema_pattern,importer.defaultSchema=bq_schema,
importer.useFullSchemaName=false,importer.useCatalogName=false','supportsNativeQueries=true') ;;
See Also
Partitioning Tables in BigQuery to learn how to convert a non-partitioned table into a partitioned one for improved query performance