Replication & Materialization
The CData Virtuality Server supports several types of replication covering a wide range of use cases. In this section, we describe the algorithmic concept behind each type and show how they can be used.
Materialization
Materialization is a 1:1 copy of the data originating somewhere else: for example, this can be a copy of a table in a data source, a copy of a view content, and a copy of some join or aggregation result. Materializations are kept in tables in the analytical storage, but these tables are not meant to be used directly. Rather, they are used when special rules, called recommended optimizations, tell the CData Virtuality Server's query engine to use the data in the calculation, not the data from the source. The idea of materialization is to let the user concentrate completely on the logic of the data and not on how or where the data is being stored– this is being taken care of by the CData Virtuality Server.
Materialization can be complete or incremental – the difference is explained in more detail on dedicated subpages.
Replication
Replication is usually employed when the data to copy has no 1:1 correspondence anywhere in the data sources or in the logical layer, but rather is being created or modified by an automated process or manually. The CData Virtuality Server includes several such automated processes: Slowly Changing Dimension Type 2 (History Update), Upsert, BatchUpdate etc. You can also create a custom SQL process to replicate data.
There are several types of replication, each described in more detail on its own subpage.
Choosing Between Materialization and Replication
In general, the best practice is to use materialization as default and to use replication only when you need to have more direct control and manipulation of data.
The rule of thumb for choosing materialization vs replication is the following:
If by storing the data we do not change the logic of the data, but rather create a 1:1 copy of some existing logic purely for performance reasons, then it is best to use materialization. Usually, materialization can be created, dropped, or completely reloaded at any time without changing the logic of the data.
If by storing data we change the logic of the data, so that the data can not be easily reloaded at any time, then replication is the right tool for you.
One important thing to keep in mind is that all types except complete materialization can yield duplicate data when settings are not correctly configured or identity fields have not been chosen correctly according to the source schema. Complete replication only produces duplicates when they are already present in the source.
Comparison of Replication Types Based on Operations on Source Data
To view the full table, click the expand button in its top right corner
Type | Process | Description |
---|---|---|
Materialization | ||
Complete |
| Row added/updated/deleted in materialized table |
Incremental |
| Row added to a materialized table if its Row check field fulfils the |
Incremental |
| Updated Row is inserted into a materialized table if its Row check field fulfils the |
Incremental |
| Row remains in the materialized table |
Replication | ||
Batch |
| Row added to a materialized table. If no identity requirement is set, additional duplicates of existing rows may come with the replication |
Batch |
| Updated row is inserted into a materialized table. If no identity requirement is set, the existing row will remain in the materialized table and additional duplicates of existing rows may come with the replication |
Batch |
| Row remains in the materialized table. If no identity requirement is set, additional duplicates of existing rows may come with the replication |
History Update |
| Row added to a materialized table |
History Update |
| Row added to a materialized table and existing row gets an update on totimestamp. Only performed, when the update happened on one of the fields selected as Columns to check |
History Update |
| Existing row gets an update on totimestamp |
Copy Over |
| Row added/updated/deleted in materialized table |
Upsert Update |
| Row added in a materialized table if specified via |
Upsert Update |
| Row updated in a materialized table if specified via |
Upsert Update |
| No action |
Comparison of Replication Types Based on Transparency and Flexibility
To view the full table, click the expand button in its top right corner
Type | Description | Value | Fallback | |
---|---|---|---|---|
Materialization | ||||
Complete | CData Virtuality Server manages materialized table with different stages. The server decides when to use which stage | Cannot be set | Not by default | Automatic fallback to an earlier stage |
Incremental | CData Virtuality Server manages materialized tables. Stages are only possible when complete replications are used as well | Can be set to one field or one expression | Inserts can be traced. Updates can only be traced when no identity requirement is used | Automatic fallback to an earlier stage |
Replication | ||||
Batch | One fixed table that requires an identifier chosen by the user. No stages | Can be set to one field or one expression | Operations can only be traced without reliable knowledge of the order in which they happened | No fallback, analytical storage |
History Update | One fixed table that requires an identifier chosen by the user. No stages | Can be set to one or more fields | All operations can be traced with the fromtimestamp and totimestamp columns | No fallback, analytical storage |
Copy Over | One fixed table | Cannot be set | Not by default | No fallback, analytical storage |
Upsert Update | One fixed table | Can be set to one or more fields | Not by default | No fallback, analytical storage |
See Also
Replicating to non-DWH data sources (for v2.4.x)
Add a Column to a View That Uses Batch Job or Slowly Changing Dimensions (Simple Version)
Add a Column to a View that uses Batch Job or Slowly Changing Dimensions (advanced version)