Optimization Subsystem
When the CData Virtuality Server processes a query and finds that it can be optimized, it recommends an optimization by creating a new row in the SYSADMIN.RecommendedOptimizations
table. If you want to accept the recommended optimization, you can use the enableOptimization
stored procedure.
Materialization must be renewed if the structure or column names of the underlying sources (tables and views the optimization refers to) have changed. Therefore, the enableOptimization
procedure can be used with the materializeNow
parameter set to TRUE
.
Queries against the SYS
, SYSADMIN
, and pg_catalog
system schemas will not result in an optimization.
Currently, the CData Virtuality Server supports three different types of optimizations: MAT_TABLE
, MAT_JOIN
, and MAT_AGGR
. They are automatically generated by the system depending on the query executed by the user. Each recommended optimization is identified by a match descriptor representing the source tables/views or the joins the system can materialize to optimize the user query.
Type | Description |
---|---|
MAT_TABLE | Recommended optimization related to a single source table or view (e.g. generated by a query like SELECT * FROM A ) |
MAT_JOIN | Recommended optimization related to one or more joins between two or more source tables or views (e.g. generated by a query like SELECT * FROM A JOIN B ON A.a = B.b ) |
MAT_AGGR | Recommended optimization related to pre-aggregated tables (e.g. generated by a query like SELECT a, COUNT(b) FROM A GROUP BY a ) |
Optimizations of the MAT_TABLE
and MAT_JOIN
types are uniquely identified by their match descriptors, and, if enabled, they replicate all the columns defined in the source tables or views. Optimizations of the MAT_AGGR
type are not uniquely identified by its match descriptors, but they differ on the list of "GroupBy columns" and "Aggregation functions" used in the original user query.
For example, the following queries:
SELECT a, COUNT(b) FROM S.A GROUP BY a;
SELECT a, SUM(b) FROM S.A GROUP BY a;
will generate one MAT_TABLE
optimization related to table A and two MAT_AGGR
optimizations, having both A
as match descriptor but different aggregation functions, as shown in the following table:
ID | Query | Match descriptor | GROUP BY columns | Aggregation functions |
---|---|---|---|---|
1 | SELECT a, COUNT(b) FROM S.A GROUP BY a | A | A.a | COUNT(A.b) |
2 | SELECT a, SUM(b) FROM S.A GROUP BY a | A | A.a | SUM(A.b) |
If these MAT_AGGR
optimizations are materialized, the corresponding materialized tables will contain only the fields reported in the GROUP BY
columns and the pre-aggregated values for the aggregation functions.
For example, let us suppose that the source table A
has the following data:
a | b |
---|---|
1 | 1 |
1 | 2 |
2 | 1 |
2 | 2 |
3 | 2 |
3 | 10 |
The materialized table related to the MAT_AGGR
optimization with ID = 2
will be:
sum_A_b | A_a |
---|---|
3 | 1 |
3 | 2 |
12 | 3 |
If both a MAT_TABLE
and a MAT_AGGR
optimizations are enabled for the same source table, the optimizer automatically chooses the best materialization to be used in the rewritten query, giving preference to the MAT_AGGR
type if it matches the user query.
For example, the query SELECT SUM(b) FROM A
will be rewritten using the materialized table related to MAT_AGGR
with ID = 2
.
On the contrary, the query SELECT SUM(a) FROM A
does not match any MAT_AGGR
optimization since the aggregation function is not materialized in any of them. In this case, the query will be rewritten accordingly to the MAT_TABLE
optimization.
Please note that in the current version, only the following functions are supported by the MAT_AGGR
optimizations: SUM(x)
, COUNT(x)
, and COUNT(*)
.