Example: Using an Identity Expression
1. Initial Situation:
source.table before Materialization
id
|
orderDate
|
description
|
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
3 | 2012/01/01 | descr_3 |
2. Running Optimization Job (full copy):
source.table is at first fully copied to the dwh.table:
id
|
orderDate
|
description
|
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2011/01/01 | descr_2 |
3 | 2012/01/01 | descr_3 |
3. Source table grows up and some rows are updated (highlighted with a blue color)
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2013/02/01 | descr_2_changed |
3 | 2012/01/01 | descr_3_changed |
3.1 | 2012/01/01 | descr_3.1 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
The value of the orderDate for the row with id = 2 is updated and the new value matches now the checkExpressionValue.
4. Running Incremental Optimization Job (newRowCheckExpression = "orderDate", identityExpression = "'id'", deleteOldDate= "true")
4.1 Finding data that is already in the Materialized Table
To estimate the last row that is already in the dwh.table the incremental job will first issue
SELECT MAX(orderDate) as checkExprValue FROM dwh.table
and retrieve
checkExprValue = '2012/01/01'
4.2 Create and fill the DELTA table
The DELTA is calculated as follows:
SELECT * INTO dwh.delta FROM source.table WHERE orderDate >= '2012/01/01'
dwh.delta
id | orderDate | description |
---|---|---|
2 | 2013/02/01 | descr_2_changed |
3 | 2012/01/01 | descr_3_changed |
3.1 | 2012/01/01 | descr_3.1 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
4.3.b Delete rows to update from the original Materialized Table (using the identityExpression value)
DELETE FROM dwh.table WHERE id IN (SELECT id FROM dwh.delta)
After that, the Materialized Table will contain the following rows:
dwh.table
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
Please, note that the identity expression is provided as a SQL code and it will be injected in the DELETE statement as follows:
DELETE FROM dwh.table WHERE <identityExpression> IN (SELECT <identityExpression> FROM dwh.delta)
If the identity expression should contain more than one column, it can be done by concatenating them to a string
CONCAT ( CAST ( id as string ), STRINGCOL1 )
Please pay attention to substituting null values and data types which cannot be implicitly cast to string have to be cast explicitly: xml, clob. Data types which cannot be cast to a string cannot be used in identity expression: blob, object.
4.4 Add DELTA to the original Materialized Table
will be calculated like
INSERT INTO dwh.table SELECT * FROM dwh.delta
and result in
dwh.table
id | orderDate | description |
---|---|---|
1 | 2010/01/01 | descr_1 |
2 | 2013/02/01 | descr_2_changed |
3 | 2012/01/01 | descr_3_changed |
3.1 | 2012/01/01 | descr_3.1 |
4 | 2012/06/01 | descr_4 |
5 | 2013/02/01 | descr_5 |
It is important to note that using the identityExpression prevented the generation of duplicates. If the identityExpression were NULL, two rows with id = 2 would have been copied into the materialized table.