Built-in Advanced Replication Jobs
Pipes supports several types of built-in replication jobs:
- Full/CopyOver
- Incremental
- Batch Update (Pipes Pro feature)
- History Update (Pipes Pro feature)
- Upsert Update (Pipes Pro feature)
CopyOver
This job retrieves the current data from a given source and places the data into the local table.
If a table with the same name already exists from previous runs, it will be handled according to the selected cleanup method:
- drop: the table will be dropped and recreated;
- delete: the table will be emptied by deleting all rows;
- truncate: the table will be truncated (all rows removed at once).
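The effect of one CopyOver run can be pictured in plain SQL. The sketch below is illustrative only (the job itself is configured in Pipes, not written by hand), and the source and target table names are made up:

```sql
-- Illustrative only: hypothetical source "crm.customers" and target "dwh.customers".

-- cleanup method "drop": the target is dropped and recreated from the source
DROP TABLE dwh.customers;
CREATE TABLE dwh.customers AS SELECT * FROM crm.customers;

-- cleanup method "delete": the table definition is kept, rows are removed one by one
DELETE FROM dwh.customers;
INSERT INTO dwh.customers SELECT * FROM crm.customers;

-- cleanup method "truncate": the table definition is kept, all rows are removed at once
TRUNCATE TABLE dwh.customers;
INSERT INTO dwh.customers SELECT * FROM crm.customers;
```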
Incremental
With this type of job, only new rows are retrieved from the source and added to the target. The tricky part of incremental materialization is that it requires a column with a reliable timestamp-like value that makes it possible to determine which rows are new and which are not. This can be, for example, a Modified column recording exactly when a row was changed (newly inserted or updated), or a column providing an auto-incremented ID.
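Conceptually, an incremental run boils down to a filtered insert. The sketch below assumes a hypothetical source table crm.orders with a Modified timestamp column and a target table dwh.orders that already holds the previously replicated rows; it is not the actual job syntax:

```sql
-- Append only the rows that changed since the newest row already replicated.
INSERT INTO dwh.orders
SELECT *
FROM crm.orders
WHERE Modified > (SELECT MAX(Modified) FROM dwh.orders);
```

An auto-incremented ID column works the same way: the timestamp comparison is simply replaced by a comparison against the maximum replicated ID.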
Batch Update
This is a Pipes Pro feature
Batch replication is easier to implement than incremental replication (though less flexible) and more powerful than complete replication. It also differs from both in these ways:
- Old rows can be kept even if deleted from the source;
- Updates can be performed on current rows, and inserted rows will be added;
- No timestamp column is required to get new rows because all rows will be retrieved from the source;
- You can decide whether or not to allow duplicates in case of updates on rows;
- You can also decide how to name the analytical storage table, and there will be no different stages to distinguish, just one table containing the data.
The main concept of batch replication is one fixed table residing in the analytical storage, updated whenever the replication job runs. Unlike complete replication, the user provides the name for this table, and there are no stages. This offers transparency, as there is always exactly one version of the table, but it also carries a risk: if the table is accidentally lost (dropped, for example), its data is gone. The table is created and filled with data on the first run.
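The sketch below is a rough illustration of what one batch run does to the fixed analytical storage table, assuming a hypothetical key column id and value column amount; rows that have disappeared from the source are simply left in place:

```sql
-- Update rows that already exist in the target table.
UPDATE dwh.orders
SET amount = (SELECT s.amount FROM crm.orders s WHERE s.id = dwh.orders.id)
WHERE EXISTS (SELECT 1 FROM crm.orders s WHERE s.id = dwh.orders.id);

-- Insert rows that are new in the source; nothing is ever deleted from the target.
INSERT INTO dwh.orders (id, amount)
SELECT s.id, s.amount
FROM crm.orders s
WHERE NOT EXISTS (SELECT 1 FROM dwh.orders t WHERE t.id = s.id);
```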
History Update
This is a Pipes Pro feature
In some cases, we may need to keep track of changes made to data. Complete, incremental, and batch replication always yield a table containing all currently available data; incremental and batch updates may allow keeping 'old' rows that are no longer present in the data source. But in general, changes to the data cannot be tracked, and the user is always forced to work with the most current information.
If you need to know when and how the data has been changed, history update is the right replication type to use. There are several ways to implement this type of replication, and on this page, we describe the type used in the Data Virtuality Server.
It is based on performing a kind of versioning on the data records by adding two columns to the analytical storage table where the locally optimized data is stored. These columns, called fromtimestamp and totimestamp, provide information on the period during which the given record was the valid version. To use this feature, you need to do the following:
- Select the columns of the source object (either a table or a view) which will be considered the key columns. Please remember that the fields should be chosen so that duplicates (entire rows with the same values in all fields) cannot occur.
- Select columns to be tracked for updates when the replication job runs.
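As an illustration of the versioning idea, the sketch below shows what happens when a tracked column changes for one record. The table and column names (customers_history, customer_id, status) are made up, and the convention of marking the open version with a NULL totimestamp is an assumption; the server may use a different marker:

```sql
-- 1. Close the currently valid version of the changed record.
UPDATE dwh.customers_history
SET totimestamp = CURRENT_TIMESTAMP
WHERE customer_id = 42
  AND totimestamp IS NULL;   -- assumed marker for the open/current version

-- 2. Insert the new version, valid from now on.
INSERT INTO dwh.customers_history (customer_id, status, fromtimestamp, totimestamp)
VALUES (42, 'inactive', CURRENT_TIMESTAMP, NULL);
```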
Upsert Update
This is a Pipes Pro feature
In this case, the job retrieves the current data from a given source and places the data into the local table. The decision whether an INSERT or an UPDATE will be used is based on a set of key columns (keyColumnsArray) that has to be provided. For each of the source rows, the following operation is performed:
- If there is no row with the same values in the key columns, an INSERT will be done, and the new row will be added to the analytical storage table;
- Otherwise, the existing row in the analytical storage table will be updated based on the updateColumns and invertUpdateColumns parameters in the following way:
  - invertUpdateColumns=FALSE: all the columns from updateColumns will be updated;
  - invertUpdateColumns=TRUE: all columns except the ones in updateColumns will be updated.
Please note that columns from keyColumnsArray cannot be enumerated in columnsToCheckArray and will never be updated, because this would change the row to represent a completely new entity.
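In generic SQL terms, an upsert run behaves like a merge driven by the key columns. The sketch below is a simplified illustration, assuming a hypothetical key column id (keyColumnsArray), value columns status and amount, updateColumns containing only status, and invertUpdateColumns=FALSE; it is not the actual job configuration syntax:

```sql
MERGE INTO dwh.orders t
USING crm.orders s
  ON (t.id = s.id)                      -- match on the key columns
WHEN MATCHED THEN
  UPDATE SET t.status = s.status        -- only the configured update columns
WHEN NOT MATCHED THEN
  INSERT (id, status, amount)
  VALUES (s.id, s.status, s.amount);
```

With invertUpdateColumns=TRUE, the UPDATE branch would instead set every non-key column except those listed in updateColumns.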
See Also
- Replication Jobs Management for the relevant system procedures
- CopyOver Replication for a post in the Data Virtuality blog on CopyOver replication
- Incremental and Upsert Replication for a post in the Data Virtuality blog explaining incremental replication and its use cases
- Batch Replication for a blog post on batch replication and its use cases
- History Replication for a post in the Data Virtuality blog on history replication