When SQL Join Order Does & Doesn’t Matter
Understanding the types of SQL joins is the first step to tackling most data problems.
However, there are situations where the correct type of join still produces the wrong results when the JOIN
clause or its tables are in the wrong order.
For left and right joins, the order of input tables matters. For inner, full, and cross joins, the order of input tables does not matter. The same principles also apply when you join the results of other joins.
All examples will refer to the following tables:
EMPLOYEE
name | role_id |
---|---|
Bob | 10 |
Jane | 20 |
Andrew | 20 |
Susan | 30 |
New hire TBD | (null) |
(null) | 100 |
ROLE
role_id | title |
---|---|
10 | Sales Rep |
20 | Accountant |
90 | President |
100 | (null) |
PAYROLL
employee_name | role_id | pay_date | amount |
---|---|---|---|
Bob | 10 | 2022-01-01 | 4000 |
Bob | 10 | 2022-02-01 | 4000 |
Bob | 10 | 2022-03-01 | 4500 |
Jane | 20 | 2022-01-01 | 6000 |
Jane | 20 | 2022-02-01 | 6000 |
Jane | 20 | 2022-03-01 | 6000 |
What exactly is join order?
In SQL databases, “join order” is the order of tables, views, or other inputs within a JOIN
clause. Inputs can even include the results of other JOIN
clauses, so nested joins are common (but best avoided when possible).
For example, we have to think about the order of tables* within one JOIN
clause, e.g.,
table_a LEFT JOIN table_b
versus
table_b LEFT JOIN table_a
.
Sometimes, we also have to think about the order of nested joins as well, as in:
(table_a JOIN table_b) LEFT JOIN (table_c JOIN table_d)
.
From here on out, this article will refer to any join input as a “table” for conciseness. Depending on the context, an input could also be a view, join result, function, procedure, operator, or anything else that produces a table-like result.
LEFT & RIGHT JOIN: order matters
Left and right outer joins return all results from their “base” table (the first one in a left join or the second one in a right join) plus any matches from their “look-up” table. Consequently, the order of tables determines which serves a base or look-up role.
In effect, it’s the difference between “all of A, plus whatever matches in B” as opposed to “all of B, plus whatever matches in A.”
For example, note that some (but not all) employees have a role, and some but not all roles are associated with an employee. Consequently, we get different results when we change the order of EMPLOYEE and ROLE within the join.
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM employee e
LEFT JOIN role r
ON r.role_id = e.role_id
name | e_role_id | title | r_role_id |
---|---|---|---|
Bob | 10 | Sales Rep | 10 |
Jane | 20 | Accountant | 20 |
Andrew | 20 | Accountant | 20 |
Susan | 30 | null | null |
New hire TBD | null | null | null |
null | 100 | null | 100 |
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM role r
LEFT JOIN employee e
ON r.role_id = e.role_id
name | e_role_id | title | r_role_id |
---|---|---|---|
Bob | 10 | Sales Rep | 10 |
Jane | 20 | Accountant | 20 |
Andrew | 20 | Accountant | 20 |
null | null | President | 90 |
null | 100 | null | 100 |
These examples use a left join, which is by far the most common and useful outer join type. A right join behaves similarly, but its “base” table is the second table, not the first.
INNER, FULL, & CROSS JOIN: order does not matter
Inner joins return only matching rows, so the order of tables does not matter. “Rows in both A and B” is equivalent to “rows in both B and A.”
For example, the following queries are logically identical and therefore return the same results.
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM employee e
INNER JOIN role r
ON r.role_id = e.role_id
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM role r
INNER JOIN employee e
ON r.role_id = e.role_id
name | e_role_id | title | r_role_id |
---|---|---|---|
Bob | 10 | Sales Rep | 10 |
Jane | 20 | Accountant | 20 |
Andrew | 20 | Accountant | 20 |
null | 100 | null | 100 |
Full outer joins return all results from both tables, including any matches, so their order does not matter. “All of A and all of B” is the same thing as “all of B and all of A.”
Consequently, these two queries also mean and produce the same thing:
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM employee e
FULL JOIN role r
ON r.role_id = e.role_id
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM role r
FULL JOIN employee e
ON r.role_id = e.role_id
name | e_role_id | title | r_role_id |
---|---|---|---|
Bob | 10 | Sales Rep | 10 |
Jane | 20 | Accountant | 20 |
Andrew | 20 | Accountant | 20 |
Susan | 30 | null | null |
New hire TBD | null | null | null |
null | 100 | null | 100 |
null | null | President | 90 |
Finally, a cross join doesn’t even check for matches, but simply returns all combinations of rows. Just like 3 x 2 equals 2 x 3, so “all combinations of A and B” also equals “all combinations of B and A.”
For instance:
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM employee e
CROSS JOIN role r
SELECT
e.name
,e.role_id AS e_role_id
,r.title
,r.role_id AS r_role_id
FROM role r
CROSS JOIN employee e
name | e_role_id | title | r_role_id |
---|---|---|---|
Bob | 10 | Sales Rep | 10 |
Jane | 20 | Sales Rep | 10 |
Andrew | 20 | Sales Rep | 10 |
Susan | 30 | Sales Rep | 10 |
New hire TBD | null | Sales Rep | 10 |
null | 100 | Sales Rep | 10 |
Bob | 10 | Accountant | 20 |
Jane | 20 | Accountant | 20 |
Andrew | 20 | Accountant | 20 |
Susan | 30 | Accountant | 20 |
New hire TBD | null | Accountant | 20 |
null | 100 | Accountant | 20 |
Bob | 10 | President | 90 |
Jane | 20 | President | 90 |
Andrew | 20 | President | 90 |
Susan | 30 | President | 90 |
New hire TBD | null | President | 90 |
null | 100 | President | 90 |
Bob | 10 | null | 100 |
Jane | 20 | null | 100 |
Andrew | 20 | null | 100 |
Susan | 30 | null | 100 |
New hire TBD | null | null | 100 |
null | 100 | null | 100 |
What about multiple JOIN clauses?
The order of multiple independent join statements does not matter.
For example, let’s say we want to list every employee’s name
, then add any available title
, pay_date
, and amount
details. In other words, the ROLE and PAYROLL joins will both depend on EMPLOYEE but will not depend on each other, so we’re therefore free to switch them.
SELECT
e.name
,r.title
,p.pay_date
,p.amount
FROM employee e
LEFT JOIN role r
ON r.role_id = e.role_id
LEFT JOIN payroll p
ON p.employee_name = e.name
SELECT
e.name
,r.title
,p.pay_date
,p.amount
FROM employee e
LEFT JOIN payroll p
ON p.employee_name = e.name
LEFT JOIN role r
ON r.role_id = e.role_id
name | title | pay_date | amount |
---|---|---|---|
Bob | Sales Rep | 2022-01-01 | 4000 |
Bob | Sales Rep | 2022-02-01 | 4000 |
Bob | Sales Rep | 2022-03-01 | 4500 |
Jane | Accountant | 2022-01-01 | 6000 |
Jane | Accountant | 2022-02-01 | 6000 |
Jane | Accountant | 2022-03-01 | 6000 |
Andrew | Accountant | null | null |
Susan | null | null | null |
New hire TBD | null | null | null |
null | null | null | null |
Remember, we are talking about the order of the entire LEFT JOIN
clauses. We cannot freely change the order of tables within each clause, as we saw above.
The order of multiple dependent join statements does matter. Each join clause may only refer to tables that have been introduced in or above itself.
For example, let’s say we still want every employee’s name
, pay_date
, and salary amount
, but we only want title
for folks with a payment record.
As always, there are multiple ways to achieve this, but the following query does it solely with joins. ROLE now joins back to PAYROLL, not to EMPLOYEE like it did earlier. That means we can’t find a match in ROLE without first finding a match in PAYROLL.
SELECT
e.name
,r.title
,p.pay_date
,p.amount
FROM employee e
LEFT JOIN payroll p
ON p.employee_name = e.name
LEFT JOIN role r
ON r.role_id = p.role_id
name | title | pay_date | amount |
---|---|---|---|
Bob | Sales Rep | 2022-01-01 | 4000 |
Bob | Sales Rep | 2022-02-01 | 4000 |
Bob | Sales Rep | 2022-03-01 | 4500 |
Jane | Accountant | 2022-01-01 | 6000 |
Jane | Accountant | 2022-02-01 | 6000 |
Jane | Accountant | 2022-03-01 | 6000 |
Andrew | null | null | null |
Susan | null | null | null |
New hire TBD | null | null | null |
null | null | null | null |
ROLE now depends on PAYROLL, so the join to ROLE must come after the join to PAYROLL. If we switch the order of the joins:
SELECT
e.name
,r.title
,p.pay_date
,p.amount
FROM employee e
LEFT JOIN role r
ON r.role_id = p.role_id
LEFT JOIN payroll p
ON p.employee_name = e.name
we get the following error:
The multi-part identifier "p.role_id" could not be bound.
Summary: SQL join order matters…sometimes
In SQL, the importance of join order depends on the type of join.
Inside a single JOIN
statement:
- Table order does not affect full, inner, or cross joins.
- Table order affects left and right joins.
Across multiple JOIN
statements:
- Statement order does not matter when neither statement refers to the other.
- Statement order does matter when one statement refers to the other. The referenced statement(s) must be above the referencing statement.