Example: Check Expressions with Non-unique Results
Let us consider the following example:
There is a source table with the columns id (primary key) and orderDate (used as check expression). In this first example, the identityExpression
is NULL.
1. Initial Situation:
source.table before Materialization
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
3 | 2012/01/01 | descr_3 |
2. Running Optimization Job (full copy):
source.table is at first fully copied to the dwh.table:
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
3 | 2012/01/01 | descr_3 |
3. Source table grows up and some rows are updated (highlighted with a blue color)
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2_changed |
3 | 2012/01/01 | descr_3_changed |
3.1 | 2012/01/01 | descr_3.1 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
(please, note that rows with ids 3 and 3.1 have the same orderDate)
4. Running Incremental Optimization Job (newRowCheckExpression = "orderDate")
4.1 Finding data that is already in the Materialized Table
To estimate the last row that is already in the dwh.table the incremental job will first issue
SELECT MAX(orderDate) as checkExprValue FROM dwh.table
and retrieve
checkExprValue = '2012/01/01'
4.2 Create and fill the DELTA table
The DELTA is calculated as follows:
4.2.1 deleteOldData = false
SELECT * INTO dwh.delta FROM source.table WHERE orderDate > '2012/01/01'
4.2.2 deleteOldData = true
SELECT * INTO dwh.delta FROM source.table WHERE orderDate >= '2012/01/01'
dwh.delta (deleteOldData = false)
id | orderDate | description |
---|---|---|
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
dwh.delta (deleteOldData = true)
id | orderDate | description |
---|---|---|
3 | 2012/01/01 | descr_3_changed |
3.1 | 2012/01/01 | descr_3.1 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
4.3 Delete rows to update from the original Materialized Table
This step will be only executed if incremental update is run with deleteOldData set to true.
DELETE FROM dwh.table WHERE orderDate >= '2012/01/01'
After that, the Materialized Table will contain the following rows:
dwh.table (deleteOldData = true)
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
4.4 Add DELTA to the original Materialized Table
will be calculated like
INSERT INTO dwh.table SELECT * FROM dwh.delta
and result in
dwh.table (deleteOldData = false)
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
3 | 2012/01/01 | descr_3 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
dwh.table (deleteOldData = true)
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
3 | 2012/01/01 | descr_3_changed |
3.1 | 2012/01/01 | descr_3.1 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
It is important to note how the row with id = 2 was not updated (its description value is "descr_2" instead of "descr_2_changed") since this row does not match the criteria provided by the newRowCheckExpression parameter.