Join Pushdown
When possible, a SQL query using joins is pushed down to the data source, and the data source evaluates the query. There are several criteria that must be met to push the joins to the data source.
- The tables in the join must be from the same data source.
- The data source driver must support joins.
The following example shows two joins from the same data source:
SELECT
soh.customerid,
sum(soh.subtotal) AS soh_subtotal
FROM
dwh.SalesOrderDetail sod
JOIN dwh.SalesOrderHeader soh
ON sod.salesorderid = soh.altsalesorderid
GROUP BY
soh.customerid
;;
When viewing the query plan, we can see how the entire query is pushed down to the data source DWH:
SELECT g_1.customerid, SUM(g_1.subtotal) FROM dwh.SalesOrderDetail AS g_0, dwh.SalesOrderHeader AS g_1 WHERE g_0.salesorderid = convert(g_1.altsalesorderid, biginteger) GROUP BY g_1.customerid
There is a special case where a view or table is materialized. In the following example, a view joins tables from two different data sources. Typically this would result in a federated query. However, because the view has been materialized, the entire query is pushed down to analytical storage. Note the background is light green, indicating that the query is pushed down to analytical storage:
CREATE VIEW "views.SalesOrdersPlusHeaders_oracle_mssql_mat" AS
SELECT
*
FROM
"oracle.SalesOrderHeader"
INNER JOIN "mssql_local_data.SalesOrderDetail"
ON "SalesOrderHeader.salesorderid" = "SalesOrderDetail.salesorderid";;
SELECT "SALESORDERID" FROM "views.SalesOrdersPlusHeaders_oracle_mssql_mat" LIMIT 500;;
SELECT g_0.salesorderid AS c_0 FROM dwh.mat_table_8_st2 AS g_0 LIMIT 500
Note that this behaviour can be overridden using the OPTION $PREFER_DWH NEVER
. The example below uses the option to force the query planner to read from the original data sources and creates a federated query (please see OPTION Clause for a more detailed explanation):
SELECT "SALESORDERID" FROM "views.SalesOrdersPlusHeaders_oracle_mssql_mat"
OPTION $PREFER_DWH NEVER
;;