Optimization Subsystem

When the CData Virtuality Server processes a query and finds that it can be optimized, it recommends an optimization by creating a new row in the SYSADMIN.RecommendedOptimizations table. If you want to accept the recommended optimization, you can use the enableOptimization stored procedure.

Materialization must be renewed if the structure or column names of the underlying sources (tables and views the optimization refers to) have changed. Therefore, the enableOptimization procedure can be used with the materializeNow parameter set to TRUE.

Queries against the SYS, SYSADMIN, and pg_catalog system schemas will not result in an optimization.

Currently, the CData Virtuality Server supports three different types of optimizations: MAT_TABLE, MAT_JOIN, and MAT_AGGR. They are automatically generated by the system depending on the query executed by the user. Each recommended optimization is identified by a match descriptor representing the source tables/views or the joins the system can materialize to optimize the user query.

Type	Description
`MAT_TABLE`	Recommended optimization related to a single source table or view (e.g. generated by a query like `SELECT * FROM A`)
`MAT_JOIN`	Recommended optimization related to one or more joins between two or more source tables or views (e.g. generated by a query like `SELECT * FROM A JOIN B ON A.a = B.b`)
`MAT_AGGR`	Recommended optimization related to pre-aggregated tables (e.g. generated by a query like `SELECT a, COUNT(b) FROM A GROUP BY a`)

Optimizations of the MAT_TABLE and MAT_JOIN types are uniquely identified by their match descriptors, and, if enabled, they replicate all the columns defined in the source tables or views. Optimizations of the MAT_AGGR type are not uniquely identified by its match descriptors, but they differ on the list of "GroupBy columns" and "Aggregation functions" used in the original user query.

For example, the following queries:

SQL

SELECT a, COUNT(b) FROM S.A GROUP BY a;

SELECT a, SUM(b) FROM S.A GROUP BY a;

will generate one MAT_TABLE optimization related to table A and two MAT_AGGR optimizations, having both A as match descriptor but different aggregation functions, as shown in the following table:

ID	Query	Match descriptor	`GROUP BY` columns	Aggregation functions
1	SELECT a, COUNT(b) FROM S.A GROUP BY a	`A`	`A.a`	`COUNT(A.b)`
2	SELECT a, SUM(b) FROM S.A GROUP BY a	`A`	`A.a`	`SUM(A.b)`

If these MAT_AGGR optimizations are materialized, the corresponding materialized tables will contain only the fields reported in the GROUP BY columns and the pre-aggregated values for the aggregation functions.

For example, let us suppose that the source table A has the following data:

a	b
1	1
1	2
2	1
2	2
3	2
3	10

The materialized table related to the MAT_AGGR optimization with ID = 2 will be:

sum_A_b	A_a
3	1
3	2
12	3

If both a MAT_TABLE and a MAT_AGGR optimizations are enabled for the same source table, the optimizer automatically chooses the best materialization to be used in the rewritten query, giving preference to the MAT_AGGR type if it matches the user query.

For example, the query SELECT SUM(b) FROM A will be rewritten using the materialized table related to MAT_AGGR with ID = 2.

On the contrary, the query SELECT SUM(a) FROM A does not match any MAT_AGGR optimization since the aggregation function is not materialized in any of them. In this case, the query will be rewritten accordingly to the MAT_TABLE optimization.

Please note that in the current version, only the following functions are supported by the MAT_AGGR optimizations: SUM(x), COUNT(x), and COUNT(*).