Overview


When the query engine receives an incoming SQL query, it performs the following operations:

Parsing - validating syntax and converting to internal form
Resolving - linking all identifiers to metadata and functions to the function library
Validating - validating SQL semantics based on metadata references and type signatures
Rewriting - rewriting SQL to simplify expressions and criteria
Logical plan optimization - converting the rewritten canonical SQL into a logical plan for in-depth optimization. The Data Virtuality Server optimizer is predominantly rule-based. A certain rule set will be applied based on the query structure and hints. These rules may, in turn, trigger the execution of more rules. The Data Virtuality Server also takes advantage of costing information within several rules. The logical plan optimization steps are described in the Query Planner section.
Processing plan conversion - converting the logical plan into an executable form where the nodes represent basic processing operations. The final processing plan is displayed as the Query Plan.
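The order of these operations can be sketched as a simple pipeline. This is only an illustration of the sequencing, not the server's implementation: `QueryState`, `make_stage`, and `plan_query` are hypothetical names, and each stage merely records that it ran instead of transforming the query.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryState:
    """Hypothetical container handed from one stage to the next."""
    sql: str
    completed: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[QueryState], QueryState]:
    # Each stage only records that it ran; a real engine would
    # transform the query representation at this point.
    def stage(state: QueryState) -> QueryState:
        state.completed.append(name)
        return state
    return stage


# Stage order as listed above.
STAGE_NAMES = (
    "parsing",
    "resolving",
    "validating",
    "rewriting",
    "logical plan optimization",
    "processing plan conversion",
)
STAGES = [make_stage(name) for name in STAGE_NAMES]


def plan_query(sql: str) -> QueryState:
    state = QueryState(sql)
    for stage in STAGES:
        state = stage(state)
    return state


state = plan_query("SELECT 1")
print(state.completed)
```

Each stage consumes the output of the previous one, so a failure in an early stage (for example, a syntax error during parsing) would stop the query before any planning work is done.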

The logical query plan is a tree of operations that transform data in source tables to the expected result set. Data flows from the bottom (tables) to the top (output) in the tree. The primary logical operations are select (select or filter rows based on criteria), project (project or compute column values), join, source (retrieve data from a table), sort (ORDER BY), duplicate removal (SELECT DISTINCT), group (GROUP BY), and union (UNION).

For example, consider the following query that retrieves all engineering employees born since 1970:

SQL

SELECT e.title, e.lastname FROM Employees AS e JOIN Departments AS d ON e.dept_id = d.dept_id WHERE year(e.birthday) >= 1970 AND d.dept_name = 'Engineering'

Logically, the data from the Employees and Departments tables are retrieved, joined, filtered as specified, and finally, the output columns are projected. The canonical query plan thus stacks one node per step: two source nodes, a join, a select, and a project.

Data flows from the tables at the bottom upwards through the join, select, and finally, the project to produce the final results. The data passed between each node is logically a result set with columns and rows.
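As a minimal sketch, the canonical plan for the example query can be written as nested operations over in-memory rows. The sample data and the helper functions `source`, `join`, `select`, and `project` are assumptions made for illustration; they mirror the logical operations named above and are not the server's API.

```python
from datetime import date

# Hypothetical sample rows; column names follow the example query.
EMPLOYEES = [
    {"title": "Engineer", "lastname": "Ng", "dept_id": 1, "birthday": date(1982, 5, 1)},
    {"title": "Manager", "lastname": "Ross", "dept_id": 1, "birthday": date(1965, 3, 9)},
    {"title": "Analyst", "lastname": "Diaz", "dept_id": 2, "birthday": date(1990, 7, 4)},
]
DEPARTMENTS = [
    {"dept_id": 1, "dept_name": "Engineering"},
    {"dept_id": 2, "dept_name": "Finance"},
]


def source(rows, alias):
    # Source node: retrieve rows and qualify each column with the table alias.
    return [{f"{alias}.{col}": val for col, val in row.items()} for row in rows]


def join(left, right, on):
    # Join node: nested-loop inner join on a predicate over the combined row.
    return [{**l, **r} for l in left for r in right if on({**l, **r})]


def select(rows, criteria):
    # Select node: keep only the rows that satisfy the criteria.
    return [row for row in rows if criteria(row)]


def project(rows, columns):
    # Project node: emit only the requested columns, in order.
    return [tuple(row[col] for col in columns) for row in rows]


def canonical_plan():
    # Bottom of the tree: two sources feeding the join.
    joined = join(
        source(EMPLOYEES, "e"),
        source(DEPARTMENTS, "d"),
        on=lambda r: r["e.dept_id"] == r["d.dept_id"],
    )
    # Middle: the full WHERE clause is applied after the join.
    filtered = select(
        joined,
        lambda r: r["e.birthday"].year >= 1970 and r["d.dept_name"] == "Engineering",
    )
    # Top: the output columns.
    return project(filtered, ["e.title", "e.lastname"])


print(canonical_plan())  # [('Engineer', 'Ng')]
```

Every intermediate value is a list of rows, which matches the idea that each node passes a result set to its parent.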

Of course, this is what happens logically, not how the plan is executed. Starting from this initial plan, the query planner performs transformations on the query plan tree to produce an equivalent plan that retrieves the same results faster. Both a federated query planner and a relational database planner deal with the same concepts and many of the same plan transformations. In this example, the criteria on the Departments and Employees tables will be pushed down the tree to filter the results as early as possible.
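The criteria pushdown described here can be sketched as a tree rewrite. The node classes and the `push_criteria` function below are hypothetical and heavily simplified: the sketch assumes inner joins only, and assumes every conjunct in the WHERE clause references exactly one table alias that exists in the plan. Whether a real source can evaluate a pushed criterion such as `year(e.birthday) >= 1970` depends on that source's capabilities, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Source:
    table: str
    alias: str
    criteria: Tuple[str, ...] = ()  # criteria already pushed to this source


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    on: str


@dataclass(frozen=True)
class Select:
    child: "Plan"
    # Conjuncts as (alias referenced, criterion text) pairs.
    criteria: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Project:
    child: "Plan"
    columns: Tuple[str, ...]


Plan = Union[Source, Join, Select, Project]


def push_criteria(plan, pending=()):
    """Move single-table conjuncts from Select nodes down to the matching Source."""
    if isinstance(plan, Project):
        return Project(push_criteria(plan.child, pending), plan.columns)
    if isinstance(plan, Select):
        # Remove the Select node and carry its conjuncts downward.
        return push_criteria(plan.child, pending + plan.criteria)
    if isinstance(plan, Join):
        return Join(
            push_criteria(plan.left, pending),
            push_criteria(plan.right, pending),
            plan.on,
        )
    # Source: keep only the conjuncts that reference this alias.
    mine = tuple(text for alias, text in pending if alias == plan.alias)
    return Source(plan.table, plan.alias, plan.criteria + mine)


canonical = Project(
    Select(
        Join(
            Source("Employees", "e"),
            Source("Departments", "d"),
            "e.dept_id = d.dept_id",
        ),
        (("e", "year(e.birthday) >= 1970"), ("d", "d.dept_name = 'Engineering'")),
    ),
    ("e.title", "e.lastname"),
)

optimized = push_criteria(canonical)
print(optimized)
```

After the rewrite, the select node is gone and each source carries its own criterion, so rows are filtered before they reach the join.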

In both cases, the goal is to retrieve the query results in the fastest possible time. However, the relational database planner does this primarily by optimizing the access paths in pulling data from storage.

In contrast, a federated query planner is less concerned about storage access because it typically pushes that burden to the data source. The most important consideration for a federated query planner is minimizing data transfer.
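The effect on data transfer can be illustrated with a small simulation. The table contents and the `fetch` helper are made up for this sketch; `fetch` stands in for a remote source access and returns the rows that would cross the network.

```python
# Hypothetical source table held in memory; in a federated setup it would
# live in a remote system. Columns: (emp_id, dept_id, birth_year).
EMPLOYEES = [(i, 1 + i % 4, 1950 + i % 50) for i in range(1000)]


def fetch(table, pushed_filter=None):
    """Simulate a source access and return the rows transferred to the engine."""
    if pushed_filter is None:
        return list(table)
    # The source evaluates the filter, so only matching rows are transferred.
    return [row for row in table if pushed_filter(row)]


def born_since_1970(row):
    return row[2] >= 1970


# Without pushdown: transfer everything, then filter in the engine.
transferred_all = fetch(EMPLOYEES)
filtered_locally = [row for row in transferred_all if born_since_1970(row)]

# With pushdown: the source filters before transferring.
transferred_filtered = fetch(EMPLOYEES, born_since_1970)

# Both strategies yield the same result set.
assert filtered_locally == transferred_filtered

print(len(transferred_all), len(transferred_filtered))  # 1000 600
```

Both strategies return the same rows, but the pushed-down variant transfers 600 rows instead of 1000 in this made-up data set. The more selective the criterion, the larger the saving.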