Amazon Redshift

You are looking at an older version of the documentation. The latest version is found here.

Amazon Redshift is a data warehouse that allows its users to analyze data in conjunction with existing Business Intelligence tools and standard SQL. With Amazon Redshift users are enabled to perform complex analysis queries on multiple petabytes of structured data.

Type Name

redshift

Connection Properties

Template name: redshift

Appropriate translator name: redshift

Properties:

host (default: localhost)
port (default: 5439)
db
user-name
password (default: empty)
driver (default: redshift)
driver-class (default: com.amazon.redshift.jdbc.Driver)
ssl (TRUE|FALSE; default: FALSE)
jdbc-properties (arbitrary extra properties)
new-connection-sql
check-valid-connection-sql (default: select 0)
min-pool-size (default: 2)
max-pool-size (default: 70)

Here is an example:

SQL

CALL SYSADMIN.createConnection(name => 'redshift', jbossCLITemplateName => 'redshift', connectionOrResourceAdapterProperties => 'host=<host>,port=5439,db=<database>,user-name=<user_name>,password=<password>') ;;
CALL SYSADMIN.createDataSource(name => 'redshift', translator => 'redshift', modelProperties => 'importer.schemaPattern=test_nk,importer.useFullSchemaName=FALSE,importer.tableTypes="TABLE,VIEW"', translatorProperties => 'supportsNativeQueries=true') ;;

Translator Properies

Translator Properties Shared by All JDBC Connectors

(Properties listed in alphabetical order)

To view the full table, click the expand button in its top right corner

Name	Description	Default value
`comparisonStringConversion`	Sets a template to convert `WHERE` clauses on strings Examples SQL `comparisonStringConversion=%s -- no conversion will be applied comparisonStringConversion=binary %s -- WHERE binary someStringExpression LIKE someOtherStringExpression comparisonStringConversion=(%s COLLATE Latin1_General_CS_AS) -- WHERE (someStringExpression COLLATE Latin1_General_CS_AS LIKE someOtherStringExpression)`	MS SQL Server: `(%s COLLATE Latin1_General_CS_AS)` MySQL, SingleStore: `binary %s` Others: `%s`
`DatabaseTimeZone`	Database time zone, used when fetching date, time, or timestamp values	System default time zone
`DatabaseVersion`	Specific database version, used to fine-tune pushdown support	Automatically detected by the server through the data source JDBC driver, if possible
`forbidNonMatchingNames`	Only considered when `importer.tableNamePattern` is set. When set to `FALSE`, allows creation of tables with names that do not match the `tableNamePattern`. Tables with non-matching names will only be accessible until the server restarts	`TRUE`
`ForceQuotedIdentifiers`	If `TRUE`, all identifiers in the final query (that is being generated by a translator to be sent directly to the target DBMS) will be quoted	`TRUE`
`MaxPreparedInsertBatchSize`	Maximum size of prepared insert batch	`2048`
`OrderByStringConversion`	Sets a template to convert `ORDER BY` clauses on strings Examples SQL `OrderByStringConversion=%s -- no conversion will be applied OrderByStringConversion=(binary %s) -- ORDER BY (binary someStringExpression) OrderByStringConversion=cast((%s) as varchar(2000) ccsid 1208) -- ORDER BY cast((someStringExpression) as varchar(2000) ccsid 1208)`	DB2 for AS/400: `cast((%s) as varchar(2000) ccsid 1208)` DB2 for zOS: `cast((%s) as varchar(2000) ccsid ASCII)` MS SQL Server: `(cast(cast(%s as varchar) as varbinary)` MySQL, SingleStore: (`binary %s)` PostgreSQL: `(%s COLLATE \"C\")` Others: `%s`
`supportsConvertFromClobToString`	If `TRUE`, indicates that the translator supports the `CONVERT`/`CAST` function from `clob` to `string`	MySQL, PostgreSQL, Redshift, Greenplum, Oracle, MS SQL Server: `TRUE` Others: `FALSE`
`supportsNativeQueries`	Forces a translator to issue a `native()` system procedure that can be used to pass native queries directly to an underlying DBMS Example SQL `SELECT x.* FROM table ( CALL "dwh.native"("request" => 'SELECT query, pid, elapsed, substring FROM svl_qlog ORDER BY STARTTIME DESC LIMIT 200') ) w , ARRAYTABLE( w.tuple COLUMNS query string, pid integer , elapsed string, "substring" string ) x;` More usage examples	MySQL, PostgreSQL, Redshift, Snowflake, Oracle, MS SQL Server; Exasol, Vertica, Bigquery, SingleStore, Azure: `TRUE` Others: `FALSE`
`supportsOrderByAlias`	If `FALSE`, the Data Virtuality Server does not generate aliases in the `ORDER BY` clause. If `TRUE`, indicates that the source supports aliases in the `ORDER BY` clause	PostgreSQL: `TRUE` Others: if `OrderByStringConversion` is not set, same as `supportsOrderBy` if `OrderByStringConversion` is set, default for `supportsOrderByAlias` is `FALSE`
`supportsOrderByString`	If `FALSE`, the Data Virtuality Server does not push down the `ORDER BY` clause if it contains at least one expression of `string` type. If `TRUE`, indicates that the source supports strings in the `ORDER BY` clause	MS SQL Server: `FALSE` Others: same as `supportsOrderBy`
`TrimStrings`	If `TRUE`, trims trailing whitespace from fixed-length character strings. Please note that the Data Virtuality Server only has `string` or `varchar` type that treats trailing whitespace as meaningful	`FALSE`
`UseBindVariables`	if `TRUE`, indicates that `PreparedStatements` should be used and that literal values in the source query should be replaced with bind variables. If `FALSE`, only `LOB` values will trigger the use of `PreparedStatements`	`TRUE`
`UseCommentsInSourceQuery`	Embeds a / comment / leading comment with session/request id in the source SQL query for informational purposes	`FALSE`

The names of the translator properties are case-sensitive.

Translator Properties Specific for Amazon Redshift

To view the full table, click the expand button in its top right corner

Name	Description	Default value
`replaceNullCharsWith`	String property. If set, the translator replaces all null characters in strings before executing `INSERT` / `UPDATE` with the string specified as the property value. You may specify an empty string as well	Single space
`uploadMode`	Values: `bulkInsertSingleThread`, `bulkInsertMultiThread` `bulkInsertSingleThread`: Bulk inserts in a single thread `bulkInsertMultiThread`: Bulk inserts in multiple threads The legacy value `insertMultiThread` still works for backward compatibility
`uploadMode`	Value: `s3Load` Uploads data as files on Amazon S3 storage. Translator properties `region`, `keyId`, and `secretKey` should be specified for the mode to work properly. The value is case-insensitive, so `s3Load` and `S3LOAD` are the same values
`maxChunkSize`	Sets the size limit of a temp file in bytes	`16000`
`numberOfThreads`	Specifies the maximum number of uploader threads	`10`
`tempFolder`	Value: a path to a folder (relative or absolute). Specifies the folder to be used for creating temporary files instead of the system configured one
`bucketName`	Value: a bucket name. Only for the `s3Load` `uploadMode`. Specifies a bucket name to upload data files to. If specified, the specified bucket is used and assumed to exist (no creation/deletion operations performed). The region specified for the bucket must be the same as the region specified via the `region` translator property. If no `bucketName` is specified, a bucket will be created and deleted after the operation is finished
`bucketPrefix`	Value: a bucket prefix. Only for the `s3Load` `uploadMode`. Specifies a prefix of a bucket name to upload data files to. The name of the bucket created for uploading will start with the specified prefix. The bucket will be deleted after the operation is finished
`createBucket`	Value: boolean. Specifies if the bucket set by the `bucketName` property should be created if it does not exist	`FALSE`
`region`	Value: a region. Only for the `s3Load` `uploadMode`. Specifies a region endpoint
`keyId, secretKey`	Only for the `s3Load` `uploadMode`. You can read and set the values for keyId and secretKey via the AWS console -> S3 -> username in the upper right corner -> Security Credentials -> Access Keys (Access Key ID and Secret Access Key)
`iamRole, awsAccountId`	Only for the `s3Load` `uploadMode`. Alternate (to `keyId` and `secretKey`) way to perform S3 authorization
`EncryptDataOnS3`	Only for the `s3Load` `uploadMode`. Indicates whether the uploaded file should be encrypted	`FALSE`
`useDoubleSlashToEscapeRegex`	Used to change the default escaping behaviour in `LIKE_REGEX` expressions	`TRUE`
`uploadZipped`	Enables compression of temporary files before uploading them to S3. May be disabled for environments with high network bandwidth in order to save some CPU power but this will increase the disk usage	`FALSE`
`truncateStrings`	If `FALSE`, an exception is thrown if a string/clob value exceeds storage size. Otherwise, if set to `TRUE`, values that exceed storage size will be truncated	`FALSE`
`varcharReserveAdditionalSpacePercent`	As Redshift and Vertica measure varchar size in bytes, not chars, and stores strings in UTF-8 encoding, a char may be 1 to 4 bytes long. You can specify the percentage by which the byte size will exceed the original char size. For instance, if in an original DBMS there is a varchar(32) column (32 chars), the actual data of 32 chars long may occupy 32 bytes for English alphabet or 128 bytes for Chinese characters. If the user knows that the data contain English chars only, he may not specify it at all and the resulting column in the target will be varchar(32) (bytes this time). If the user uses the Chinese language for the column, he should specify a value of 300 for the property and it means that it is necessary to add 300 percent to the original field length (32 + 300 % = 128). Also, there's a special value: 65535 which makes every varchar to be of 65535 bytes long	`0`
`acceptInvChars`	Value: any ASCII character except `NULL` Only for the `s3Load` `uploadMode`. This property enables the loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters. If it's specified, `ACCEPTINVCHARS` option will be passed to the S3 COPY command and each invalid UTF-8 character will be replaced with a string of equal length consisting of the character specified. For example, if the replacement character is '`^`', an invalid three-bytruncateStringste character will be replaced with '`^^^`'. If acceptInvChars is not specified, an error will be thrown whenever an invalid UTF-8 character is encountered
`copyParams`	Arbitrary parameters to be passed to the COPY command when uploading data from S3	`NULL`
`keepTempFiles`	Keep temporary files after uploading	`FALSE` - can be enabled for debugging
`maxTableNameLength`	Maximum length of a table name	`127`
`maxColumnNameLength`	Maximum length of a column name . Five chars of defined maximum length will be reserved for internal purposes and cannot be used for column-identifier	`127`

Here is an example:

SQL

CALL SYSADMIN.createConnection(name => 'redshift', jbossCLITemplateName => 'redshift', connectionOrResourceAdapterProperties => 'host=<host>,port=5439,db=<database>,user-name=<user_name>,password=<password>') ;;
CALL SYSADMIN.createDataSource(name => 'redshift', translator => 'redshift', modelProperties => 'importer.schemaPattern=test_nk,importer.useFullSchemaName=FALSE,importer.tableTypes="TABLE,VIEW"', translatorProperties => 'varcharReserveAdditionalSpacePercent=300,supportsNativeQueries=true,uploadMode=s3Load,region=<region>,bucketName=<bucket_name>,createBucket=true,keyId=<key_ID>,secretKey="<secret_key>"') ;;

Translator Properties for Amazon Redshift as Analytical Storage

If Amazon Redshift is used as analytical storage, we recommend loading data using Amazon AWS S3 (S3LOAD), as inserting data into Redshift using standard JDBC protocol can be very slow.

The following translator properties are required to configure S3LOAD:

Parameter	Description
`uploadMode=s3load`	Explicitly specifies S3LOAD mode
`region`	AWS S3 region endpoint
`bucketName`	Bucket name to upload data files t; optional
`bucketPrefix`	Prefix of the temporary bucket to upload data files to if `bucketName` is not specified; must comply with Amazon S3 bucket naming convention (nb: 36 characters would be added to the bucket prefix when creating a temporary bucket); optional
`createBucket`	Specifies if the bucket set in the `bucketName` parameter should be created if it does not exist; optional; default: `FALSE`
`keyId`	AWS S3 key ID
`secretKey`	AWS S3 secret key

Here is an example:

SQL

CALL SYSADMIN.createConnection(name => 'redshift', jbossCLITemplateName => 'redshift', connectionOrResourceAdapterProperties => 'host=<host>,port=5439,db=<database>,user-name=<user_name>,password=<password>') ;;
CALL SYSADMIN.createDataSource(name => 'redshift', translator => 'redshift', modelProperties => 'importer.schemaPattern=test_nk,importer.useFullSchemaName=FALSE,importer.tableTypes="TABLE,VIEW"', translatorProperties => 'varcharReserveAdditionalSpacePercent=300,supportsNativeQueries=true,uploadMode=s3Load,region=<region>,bucketName=<bucket_name>,createBucket=true,keyId=<key_ID>,secretKey="<secret_key>"') ;;

Data Source Properties

Data Source Properties Shared by All JDBC Connectors

(Properties listed in alphabetical order)

To view the full table, click the expand button in its top right corner

Name	Description	Default
`importer.autoCorrectColumnNames`	Replaces `.` in a column name with `_` as the period character is not supported by the Data Virtuality Server in column names	`TRUE`
`importer.catalog`	Database catalogs to use. Can be used if the `db` connection property is omitted. Only for Microsoft SQL Server and Snowflake: Multiple catalogs (multiple databases) usage is supported; `'%'` as a wildcard is allowed: for example, `importer.catalog="foo%"` will import `foo`, `foobar`, etc. To specify several catalog names or/and patterns, values should be comma-separated and enclosed within double quotes: `importer.catalog="schema1,schema2,pattern1%,pattern2%"`; To use all catalogs, the value should be '%': `importer.catalog="%"` To escape a wildcard in a catalog name, use the '\' escape character. The value should be as follows: `importer.catalog="test\_tables"`	Empty
`importer.defaultSchema`	Only for data sources that are also supported as Analytical Storage (SQLServer, MySQL, Oracle, PostgreSQL, Redshift); When the property is correctly set, `SELECT INTO`, `CREATE`, and `DROP TABLE` commands are enabled for that data source; Must point to the original schema name in the DBMS (e.g. `importer.defaultSchema=public`); Needs specifying where tables will be created or dropped in the source DBMS; Not meant to be used with Analytical Storage data source Please note that writing into a data source is only possible if this parameter is set.	Empty
`importer.enableMetadataCache`	Turns on metadata cache for a single data source even when the global option is turned off. Together with `importer.skipMetadataLoadOnStartup=true`, it allows using materialized views after server restart when the original source is unavailable	`FALSE`
`importer.excludeProcedures`	Case-insensitive regular expression that will exclude a matching fully qualified procedure name from import	Empty
`importer.excludeSchemas`	Comma-separated list of schemas (no `%` or `?` wildcards allowed) to exclude listed schemas from import. A schema specified in `defaultSchema` or `schemaPattern` will be imported despite being listed in `excludeSchemas`. Helps to speed up metadata loading	Oracle: SQL `APEX_PUBLIC_USER, DIP, FLOWS_040100, FLOWS_020100, FLOWS_FILES, MDDATA, ORACLE_OCM, SPATIAL_CSW_ADMIN_USR, SPATIAL_WFS_ADMIN_USR, XS$NULL, BI, HR, OE, PM, IX, SH, SYS, SYSTEM, MDSYS, CTXSYS` All others: empty
`importer.excludeTables`	Case-insensitive regular expression that will exclude a matching fully qualified table name from import. Does not speed up metadata loading Examples CODE `-- Exclude all tables in (source) schemas sys and INFORMATION_SCHEMA: importer.excludeTables=(.[.]sys[.].\|.[.]INFORMATION_SCHEMA[.].) -- You can also negate the defined patterns for the tables to be excluded. -- Exclude all tables except that ones starting with "public.br" and "public.mk": importer.excludeTables=(?!public\.(br\|mk)).*`	Empty
`importer.fetchSize`	Fetch size assigned to a resultset on loading metadata	No default value
`importer.importApproximateIndexes`	If set to `TRUE`, imports approximate index information	`TRUE`
`importer.importIndexes`	If set to `TRUE`, imports index/unique key/cardinality information	`FALSE`
`importer.importKeys`	If set to `TRUE`, imports primary and foreign keys	`FALSE`
`importer.importProcedures`	If set to `TRUE`, imports procedures and procedure columns. Overloaded procedures can be imported with an additional option `useProcedureSpecificName`. Please note that it is currently not possible to import procedures which use the same name for more than one parameter (e.g. same name for `IN` and `OUT` parameters). Such procedures can be excluded from import with the parameter `excludeProcedures`. Please note that it is not always possible to import procedure result set columns due to database limitations	`FALSE`
`importer.loadColumnsTableByTable`	Set to `TRUE` to force table by table metadata processing	`FALSE`/`TRUE` only for Netsuite and SAP Advantage Database Server
`importer.loadMetadataWithJdbc`	If set to `TRUE`, turns off all custom metadata load ways	`FALSE`
`importer.procedureNamePattern`	Procedure(s) to import. If omitted, all procedures will be imported. `%` as a wildcard is allowed: for example, `importer.procedurePatternName=foo%` will import `foo` , `foobar`, etc. W orks only in combination with `importProcedures`	Empty
`importer.quoteNameInSource`	If set to `FALSE`, directs the Data Virtuality Server to create source queries using unquoted identifiers	`TRUE`
`importer.renameDuplicateColumns`	If set to `TRUE`, renames duplicate columns caused by either mixed case collisions or `autoCorrectColumnNames` replacing `.` with `_`. The suffix `_n` where `n` is an integer will be added to make the name unique	`TRUE`
`importer.renameDuplicateTables`	If set to `TRUE`, renames duplicate tables caused by mixed case collisions. The suffix `_n` where `n` is an integer will be added to make the name unique	`TRUE`
`importer.replaceSpecSymbsInColNames`	If set to `TRUE`, replaces all special symbols (any symbols not in the `^A-Za-z0-9_` sequence) to the `_` symbol in column names of tables	`FALSE` / `TRUE` only for BigQuery
`importer.schemaPattern`	Schema(s) to import. If omitted or has "" value, all schemas will be imported. `%` as wildcard is allowed: for example, `importer.schemaPattern=foo%` will import `foo`, `foobar`, etc. To specify several schema names or/and patterns, values should be comma-separated and enclosed within double quotes: `importer.schemaPattern="schema1,schema2,pattern1%,pattern2%"`. For proper escaping of special characters depending on the type of data source, check Escaping special characters in schema names or use wildcards instead: `"[schema_name]"` can be rewritten as `"%schema%name%"`. Helps to speed up metadata loading	Empty
`importer.skipMetadataLoadOnStartup`	If set to `TRUE`, allows skipping metadata loading on server startup if there's cached metadata for the data source. . Together with `importer.enableMetadataCache=true`, it allows using materialized views after server restart when the original source is unavailable	`FALSE`
`importer.tableNamePattern`	Table(s) to import. If omitted, all tables will be imported. `%` as a wildcard is allowed: for example, `importer.tableNamePattern=foo%` will import `foo`, `foobar`, etc	Empty
`importer.tableTypes`	Comma-separated list (without spaces) of table types to import. Available types depend on the DBMS. Usual format: `"TABLE,VIEW"`. Other typical types are `"SYSTEM TABLE"`, `"GLOBAL TEMPORARY"`, `"LOCAL TEMPORARY"`, `"ALIAS"`, `"SYNONYM"`	Empty
`importer.useCatalogName`	If set to `TRUE`, uses any non-null/non-empty catalogue name as part of the name in source, e.g. `"catalogue"."table"."column"`, and in the Data Virtuality Server runtime name if `useFullSchemaName` is `TRUE`. If set to `FALSE`, will not use the catalogue name in either the name in source or the Data Virtuality Server runtime name. Should be set to `FALSE` for sources that do not fully support a catalogue concept, but return a non-null catalogue name in their metadata - such as HSQL	`TRUE` / `FALSE` only for Hive
`importer.useFullSchemaName`	If set to `FALSE`, directs the importer to drop the source catalogue/schema from the Data Virtuality Server object name so that the Data Virtuality Server fully qualified name will be in the form of `<model name>.<table name>`. Please note that this may lead to objects with duplicate names when importing from multiple schemas, which results in an exception	`TRUE`
`importer.useProcedureSpecificName`	If set to `TRUE`, allows the import of overloaded procedures (which will normally result in a duplicate procedure error) by using the unique procedure specific name in the Data Virtuality Server. This option will only work with JDBC 4.0 compatible drivers that report specific names	`FALSE`
`importer.widenUnsignedTypes`	If set to `TRUE`, converts unsigned types to the next widest type. For example, SQL Server reports tinyint as an unsigned type. With this option enabled, tinyint would be imported as a short instead of a byte	`TRUE`

The names of the data source properties are case-sensitive.

Using a materialized view when the original source is not available at the time the server starts is possible since v3.0.8

Multiple catalogs and wildcards in importer.catalog are available since v3.9

Escaping wildcards in importer.catalog available since v3.13

Distribution and Sort Keys

Redshift does not support indexes but supports sort and distribution keys that can be used to improve the performance of queries. With respect to indexes, these keys must be defined when the table is created.

Sort Keys

SORTKEYs are created by analyzing the currently recommended indexes collected for each optimization. They can be specified both at column and table levels. It is possible to specify only one SORTKEY column (at column level) or multiple columns (at table level).

Since it is possible to specify only one SORTKEY (with one or more columns) at the table level, we decided to create a SORTKEY corresponding to the recommended index (with kind SINGLE or MULTIPLE) with the highest frequency. The system will create then a SORTKEY with one column or with multiple columns if the highest frequency index is SINGLE or MULTIPLE, respectively.

Columns that are normally recommended for index creation are used to define sort and distribution keys.

Distribution Keys

DISTKEYs are not automatically recommended by the system and need to be manually created by the user. Here are two things to keep in mind:

It is not possible to specify more than one DISTKEY for each recommended optimization;
IndexType of a DISTKEY must be set to MANUAL (which is the default, so you can skip this step).

Distribution Style

The data distribution style is defined for the whole table. Amazon Redshift distributes the rows of a table to the compute nodes according to the distribution style specified for the table. The distribution style that you select for tables affects the overall performance of your database.

Style	Description
`EVEN`	The data in the table is spread evenly across the nodes in a cluster in a round-robin distribution. Row IDs are used to determine the distribution, and roughly the same number of rows are distributed to each node. This is the default distribution method
`KEY`	The data is distributed by the values in the `DISTKEY` column. When you set the joining columns of joining tables as distribution keys, the joining rows from both tables are collocated on the compute nodes. When data is collocated, the optimizer can perform joins more efficiently. If you specify `DISTSTYLE KEY`, you must name a `DISTKEY` column
`ALL`	A copy of the entire table is distributed to every node. This distribution style ensures that all the rows required for any join are available on every node, but it multiplies storage requirements and increases the load and maintenance times for the table. `ALL` distribution can improve execution time when used with certain dimension tables where `KEY` distribution is not appropriate, but performance improvements must be weighed against maintenance costs

Internal

SORTKEY and DISTKEY created for a table in Redshift can be checked with a query like this (to be executed directly on Redshift):

SQL

SELECT tablename, "column", type, encoding, distkey, sortkey FROM pg_table_def WHERE tablename LIKE 'mat_table%';

Please note that the schema containing the table has to be in the search path.

Usage Examples

SORTKEY, DISTKEY, and DISTSTYLE are passed as OPTIONS. They are added to the CREATE TABLE or SELECT INTO command as shown below:

SQL

CREATE TABLE source.table_name (id integer, name varchar(255))
OPTIONS (DISTKEY 'id', SORTKEY 'id,name') ;;
  
SELECT * INTO target.table_name FROM source.table_name
OPTIONS (SORTKEY 'id', DISTSTYLE 'EVEN') ;;

Redshift Spectrum

The Redshift JDBC driver exposes Spectrum tables as EXTERNAL TABLE. In order to have your Redshift data source list Spectrum tables, adjust the data source parameter importer.tableTypes accordingly by specifying e.g. importer.tableTypes="TABLE,VIEW,EXTERNAL TABLE".

Type Name

Connection Properties

Translator Properies

Translator Properties Specific for Amazon Redshift

Translator Properties for Amazon Redshift as Analytical Storage

Data Source Properties

Distribution and Sort Keys

Sort Keys

Distribution Keys

Distribution Style

Internal

Usage Examples

Redshift Spectrum

See Also