Replication & Materialization

The CData Virtuality Server supports several types of replication covering a wide range of use cases. In this section, we describe the algorithmic concept behind each type and show how they can be used.

Materialization

Materialization is a 1:1 copy of data that originates somewhere else: for example, a copy of a table in a data source, a copy of a view's content, or a copy of a join or aggregation result. Materializations are kept in tables in the analytical storage, but these tables are not meant to be used directly. Rather, they are used when special rules, called recommended optimizations, tell the CData Virtuality Server's query engine to use the materialized data in the calculation instead of the data from the source. The idea of materialization is to let the user concentrate completely on the logic of the data and not on how or where the data is stored; the CData Virtuality Server takes care of that.

Materialization can be complete or incremental – the difference is explained in more detail on dedicated subpages.

Replication

Replication is usually employed when the data to copy has no 1:1 correspondence anywhere in the data sources or in the logical layer, but is instead created or modified by an automated process or manually. The CData Virtuality Server includes several such automated processes: Slowly Changing Dimension Type 2 (History Update), Upsert, BatchUpdate, and others. You can also create a custom SQL process to replicate data.

There are several types of replication, each described in more detail on its own subpage.

Choosing Between Materialization and Replication

In general, the best practice is to use materialization as the default and to use replication only when you need more direct control over the data and how it is manipulated.

The rule of thumb for choosing materialization vs replication is the following:

If storing the data does not change the logic of the data, but only creates a 1:1 copy of some existing logic purely for performance reasons, then it is best to use materialization. Usually, a materialization can be created, dropped, or completely reloaded at any time without changing the logic of the data.
If storing the data changes the logic of the data, so that the data cannot be easily reloaded at any time, then replication is the right tool.

One important thing to keep in mind is that all types except complete materialization can yield duplicate data when settings are not configured correctly or identity fields have not been chosen correctly according to the source schema. Complete materialization only produces duplicates when they are already present in the source.
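The effect of a missing identity field can be illustrated with a small toy model. The sketch below is not how the CData Virtuality Server is implemented; the function `append_batch`, its `identity` parameter, and the sample rows are hypothetical and only mimic the behaviour described above: without an identity field, an updated source row is added next to its old version, while with one, the old version is replaced.

```python
from typing import Optional

Row = dict


def append_batch(target: list[Row], batch: list[Row],
                 identity: Optional[str] = None) -> list[Row]:
    """Return a new target list after loading `batch` into `target`.

    Hypothetical model: without `identity`, rows are simply appended.
    With `identity`, existing rows sharing an identity value with an
    incoming row are dropped before the batch is appended.
    """
    if identity is None:
        return target + batch
    incoming = {row[identity] for row in batch}
    kept = [row for row in target if row[identity] not in incoming]
    return kept + batch


# The target already holds order 1; the source then updates order 1
# and adds order 2.
target = [{"id": 1, "status": "new"}]
batch = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "new"}]

without_identity = append_batch(target, batch)
with_identity = append_batch(target, batch, identity="id")

print(len(without_identity))  # order 1 is stored twice
print(len(with_identity))     # one row per order
```

In this model, the run without an identity field leaves two rows for order 1, which is the kind of duplicate the paragraph above warns about.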

Comparison of Replication Types Based on Operations on Source Data


Type	Source operation	Description
Materialization
Complete	`INSERT` `UPDATE` `DELETE`	Row added/updated/deleted in materialized table
Incremental	`INSERT`	Row added to a materialized table if its Row check field fulfils the `WHERE` requirement (Subject to Delete old data setting)
Incremental	`UPDATE`	Updated Row is inserted into a materialized table if its Row check field fulfils the `WHERE` requirement (Subject to Delete old data setting). If no identity requirement is set, the existing row will remain in the materialized table
Incremental	`DELETE`	Row remains in the materialized table
Replication
Batch	`INSERT`	Row added to a materialized table. If no identity requirement is set, additional duplicates of existing rows may come with the replication
Batch	`UPDATE`	Updated row is inserted into a materialized table. If no identity requirement is set, the existing row will remain in the materialized table and additional duplicates of existing rows may come with the replication
Batch	`DELETE`	Row remains in the materialized table. If no identity requirement is set, additional duplicates of existing rows may come with the replication
History Update	`INSERT`	Row added to a materialized table
History Update	`UPDATE`	Row added to a materialized table and the existing row's totimestamp is updated. Only performed when the update affects one of the fields selected as Columns to check
History Update	`DELETE`	Existing row gets an update on totimestamp
Copy Over	`INSERT` `UPDATE` `DELETE`	Row added/updated/deleted in materialized table
Upsert Update	`INSERT`	Row added to a materialized table if specified via `keyColumnsArray` and `updateColumns`, otherwise no action
Upsert Update	`UPDATE`	Row updated in a materialized table if specified via `keyColumnsArray` and `updateColumns`, otherwise no action
Upsert Update	`DELETE`	No action
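
The History Update rows in the table can be read as a small algorithm. The following sketch is a toy model under stated assumptions, not the server's implementation: the function `history_update`, the `fromtimestamp` column, the use of `None` as the marker for a still-valid row, and the sample data are all hypothetical. Only `totimestamp` and the "Columns to check" idea come from the table.

```python
from datetime import datetime

OPEN_END = None  # assumption: a still-valid row has no totimestamp yet


def history_update(history, source, key, check_columns, now):
    """Apply one History Update run and return the new history list."""
    result = [dict(row) for row in history]
    # Currently valid row per key (these dicts are shared with `result`)
    current = {r[key]: r for r in result if r["totimestamp"] is OPEN_END}
    source_by_key = {r[key]: r for r in source}

    for k, src in source_by_key.items():
        cur = current.get(k)
        if cur is None:
            # INSERT in the source: add a new open row
            result.append({**src, "fromtimestamp": now, "totimestamp": OPEN_END})
        elif any(cur[c] != src[c] for c in check_columns):
            # UPDATE on a checked column: close the old row, add a new one
            cur["totimestamp"] = now
            result.append({**src, "fromtimestamp": now, "totimestamp": OPEN_END})

    for k, cur in current.items():
        if k not in source_by_key:
            # DELETE in the source: only close the existing row
            cur["totimestamp"] = now
    return result


t1, t2 = datetime(2024, 1, 1), datetime(2024, 1, 2)
history = history_update(
    [], [{"id": 1, "city": "Leipzig"}, {"id": 2, "city": "Berlin"}],
    "id", ["city"], t1)
# Second run: id 1 changed its city, id 2 disappeared from the source
history = history_update(history, [{"id": 1, "city": "Munich"}],
                         "id", ["city"], t2)
```

After the second run, the model holds three rows: the closed Leipzig and Berlin rows and one open Munich row, so earlier states remain available while the current state is the set of open rows.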

Comparison of Replication Types Based on Transparency and Flexibility