Federated Optimizations
Pushdown
In federated database systems, pushdown decomposes the user-level query into source queries that perform as much work as possible on their respective source system. Pushdown analysis requires knowledge of source system capabilities, which is provided to the CData Virtuality Server through the Connector API. Any work not performed at the source is then processed in Federate's relational engine.
Based on capabilities, CData Virtuality Server will manipulate the query plan to ensure that each source performs as much joining, filtering, grouping, etc., as possible. Planning combines Standard Relational Techniques and cost-based heuristics for pushdown optimization in many cases, such as with join ordering.
Criteria and join push down are typically the most important aspects of the query to push down when performance is a concern. See Query Plans on how to read a plan to ensure that source queries are as efficient as possible.
Dependent Joins
A special optimization called a dependent join reduces the rows returned from one of the two relations involved in a multi-source join. In a dependent join, queries are issued to each source sequentially rather than in parallel, with the results obtained from the first source used to restrict the records returned from the second. Dependent joins can perform some joins much faster by drastically reducing the amount of data retrieved from the second source and the number of join comparisons that must be performed.
The query planner determines the conditions when a dependent join is used based on hints and costing information.
The CData Virtuality Server supports hints to control dependent join behaviour:
MAKEIND
- indicates that the clause should be the independent side of a dependent join;MAKEDEP
- indicates that the clause should be the dependent side of a join;MAKENOTDEP
- prevents the clause from being the dependent side of a join. These can be placed in either theOPTION Clause
Clause or directly in theFROM Clause
Clause. As long as all Access Patterns can be met, theMAKEIND
,MAKEDEP
, andMAKENOTDEP
hints override any use of costing information.MAKENOTDEP
supersedes the other hints.The
MAKEDEP
/MAKEIND
hint should only be used if the proper query plan is not chosen by default. You should ensure that your costing information represents the source cardinality. An inappropriateMAKEDEP
/MAKEIND
hint can force an inefficient join structure and may result in many source queries.The engine will for
IN
clauses to filter the values coming from the dependent side. If the number of values from the independent side exceeds the translator'sMaxInCriteriaSize
, the values will be split into multiple IN predicates up toMaxDependentPredicates
. Multiple dependent queries will be issued in parallel when the number of independent values exceeds MaxInCriteriaSize*MaxDependentPredicates.
Copy Criteria
Copy criteria is an optimization that creates additional predicates based upon combining join and where clause criteria. For example, equijoin predicates (source1.table.column
= source2.table.column
) are used to create new predicates by substituting source1.table.column
for source2.table.column
and vice versa. In a cross-source scenario, this allows for where criteria applied to a single side of the join to be applied to both source queries
Projection Minimization
The CData Virtuality Server ensures that each pushdown query only projects the symbols required for processing the user query. This is especially helpful when querying through large intermediate view layers.
Partial Aggregate Pushdown
Partial aggregate pushdown allows grouping operations above multi-source joins and unions to be decomposed so that some grouping and aggregate functions may be pushed down to the sources.
Optional Join
The optional join hint indicates to the optimizer that a joined table should be omitted if none of its columns is used by the output of the user query or in a meaningful way to construct the results of the user query. This hint is typically only used in view layers containing multi-source joins.
The optional join hint is applied as a comment on a join clause. It can be applied in both ANSI and non-ANSI joins. With non-ANSI joins, an entire joined table may be marked as optional. Here is an example:
SELECT a.column1, b.column2 FROM sa.a, /*+ optional */ sb.b WHERE a.key = b.key
Suppose this example defines a view layer X
. If X is queried so as not to need b.column2
, the optional join hint will cause b to be omitted from the query plan. The result would be the same as if X
were defined as:
SELECT a.column1 from s.a
Here is another example:
SELECT a.column1, b.column2, c.column3 FROM /*+ optional */ (sa.a INNER JOIN sb.b ON a.key = b.key) INNER JOIN sc.c ON a.key = c.key
In this example, the ANSI join syntax allows for the join of a
and b
to be marked as optional. Suppose this example defines a view layer X. Only if both a.column1
and b.column2
are not needed, e.g. SELECT column3 FROM X
, will the join be removed.
The optional join hint will not remove a bridging table that is still required:
SELECT a.column1, b.column2, c.column3 FROM /*+ optional */ sa.a, sb.b, sc.c WHERE ON a.key = b.key AND a.key = c.key
Suppose this example defines a view layer X
. If b.column2
or c.column3
are solely required by a query to X
, the join on a
be removed. However, if a.column1
or both b.column2
and c.column3
are needed, the optional join hint will not take effect.
The relevant criteria are not applied when a join clause is omitted via the optional join hint. Thus it is possible that the query results may not have the same cardinality or even the same row values as when the join is fully applied.
Left/right outer joins where the inner side values are not used and whose rows undergo a distinct operation will automatically be treated as an optional join and do not require a hint. Here is an example:
SELECT a.column1, b.column2 FROM sa.a LEFT OUTER JOIN /*+optional*/ sb.b ON a.key = b.key
- Be sure that there is no whitespace between /* and + when using e.g.
/*+ optional */
; - Configure your SQL client not to remove multi-line comments.
Here is how to do this in Squirrel: Session -> Session Properties -> SQL- > Remove multi-line comment ( /* ... */ )
A simple SELECT COUNT(*) FROM VIEW
against a view where all join tables are marked as optional will not return a meaningful result.
Partitioned Union (Partition Pruning)
Union partitioning is inferred from the transformation/inline view. If one (or more) of the UNION
columns are defined by constants and/or has WHERE
clause IN
predicates containing only constants that make each branch mutually exclusive, then the UNION
is considered partitioned. UNION ALL
must be used, and the UNION
cannot have a LIMIT
, WITH
, or ORDER BY
clause (although individual branches may use LIMIT
, WITH
, or ORDER BY
). Partitioning values should not be null. For example, the view definition SELECT 1 AS x, y FROM foo UNION ALL SELECT z, a FROM foo1 WHERE z IN (2, 3)
would be considered partitioned on column x, since the first branch can only be the value 1
and the second branch can only be the values 2
or 3
. Note that more advanced or explicit partitions could be considered in the future. The concept of a partitioned union is used for partition-wise joins and Partial Aggregate Pushdown.
Standard Relational Techniques
The CData Virtuality Server also incorporates many standard relational techniques to ensure efficient query plans.
- Rewrite analysis for function simplification and evaluation;
- Boolean optimizations for basic criteria simplification;
- Removal of unnecessary view layers;
- Removal of unnecessary sort operations;
- Advanced search techniques through the left-linear space of join trees;
- Parallelizing of source access during execution;
- Subquery Optimization.