Skip to main content
Skip table of contents

Replication & Materialization

You are looking at an older version of the documentation. The latest version is found here.

The Data Virtuality Server supports several types of replication covering a wide range of use cases. In this section, we describe the algorithmic concept behind each type and show how they can be used.

Materialization

Materialization is a 1:1 copy of the data originating somewhere else: for example, this can be a copy of a table in a data source, a copy of a view content, and a copy of some join or aggregation result. Materializations are kept in tables in the analytical storage, but these tables are not meant to be used directly. Rather, they are used when special rules, called recommended optimizations, tell the Data Virtuality Server's query engine to use the data in the calculation, not the data from the source. The idea of materialization is to let the user concentrate completely on the logic of the data and not on how or where the data is being stored– this is being taken care of by the Data Virtuality Server.

Materialization can be complete or incremental – the difference is explained in more detail on dedicated subpages.

Replication

Replication is usually employed when the data to copy has no 1:1 correspondence anywhere in the data sources or in the logical layer, but rather is being created or modified by an automated process or manually. The Data Virtuality Server includes several such automated processes: Slowly Changing Dimension Type 2 (History Update), Upsert, BatchUpdate etc. You can also create a custom SQL process to replicate data.

There are several types of replication, each described in more detail on its own subpage.

Choosing Between Materialization and Replication

In general, the best practice is to use materialization as default and to use replication only when you need to have more direct control and manipulation of data.

The rule of thumb for choosing materialization vs replication is the following:

  • If by storing the data we do not change the logic of the data, but rather create a 1:1 copy of some existing logic purely for performance reasons, then it is best to use materialization. Usually, materialization can be created, dropped, or completely reloaded at any time without changing the logic of the data.

  • If by storing data we change the logic of the data, so that the data can not be easily reloaded at any time, then replication is the right tool for you.

One important thing to keep in mind is that all types except complete materialization can yield duplicate data when settings are not correctly configured or identity fields have not been chosen correctly according to the source schema. Complete replication only produces duplicates when they are already present in the source.

Comparison of Replication Types Based on Operations on Source Data

To view the full table, click the expand button in its top right corner


TypeProcessDescription
Materialization
Complete

INSERT
UPDATE
DELETE

Row added/updated/deleted in materialized table

Incremental

INSERT

Row added to a materialized table if its Row check field fulfils the WHERE requirement (Subject to Delete old data setting)

Incremental

UPDATE

Updated Row is inserted into a materialized table if its Row check field fulfils the WHERE requirement (Subject to Delete old data setting). If no identity requirement is set, the existing row will remain in the materialized table

Incremental

DELETE

Row remains in the materialized table

Replication

Batch

INSERT

Row added to a materialized table. If no identity requirement is set, additional duplicates of existing rows may come with the replication

Batch

UPDATE

Updated row is inserted into a materialized table. If no identity requirement is set, the existing row will remain in the materialized table and additional duplicates of existing rows may come with the replication

Batch

DELETE

Row remains in the materialized table. If no identity requirement is set, additional duplicates of existing rows may come with the replication

History Update

INSERT

Row added to a materialized table

History Update

UPDATE

Row added to a materialized table and existing row gets an update on totimestamp. Only performed, when the update happened on one of the fields selected as Columns to check

History Update

DELETE

Existing row gets an update on totimestamp

Copy Over

INSERT
UPDATE
DELETE

Row added/updated/deleted in materialized table

Upsert Update

INSERT

Row added in a materialized table if specified via keyColumnsArray and updateColumns, otherwise no action

Upsert Update

UPDATE

Row updated in a materialized table if specified via keyColumnsArray and updateColumns, otherwise no action

Upsert Update

DELETE

No action

Comparison of Replication Types Based on Transparency and Flexibility

To view the full table, click the expand button in its top right corner


TypeDescriptionValue
Fallback
Materialization

Complete

Data Virtuality Server manages materialized table with different stages. The server decides when to use which stage

Cannot be set

Not by default

Automatic fallback to an earlier stage

Incremental

Data Virtuality Server manages materialized tables. Stages are only possible when complete replications are used as well

Can be set to one field or one expression

Inserts can be traced. Updates can only be traced when no identity requirement is used

Automatic fallback to an earlier stage

Replication

Batch

One fixed table that requires an identifier chosen by the user. No stages

Can be set to one field or one expression

Operations can only be traced without reliable knowledge of the order in which they happened

No fallback, analytical storage
backup required

History Update

One fixed table that requires an identifier chosen by the user. No stages

Can be set to one or more fields

All operations can be traced with the fromtimestamp and totimestamp columns

No fallback, analytical storage
backup required

Copy Over

One fixed table

Cannot be set

Not by default

No fallback, analytical storage
backup required

Upsert Update

One fixed table

Can be set to one or more fields

Not by default

No fallback, analytical storage
backup required

See Also

Replicating to non-DWH data sources (for v2.4.x)

Add a Column to a View That Uses Batch Job or Slowly Changing Dimensions (Simple Version) 

Add a Column to a View that uses Batch Job or Slowly Changing Dimensions (advanced version)

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.