Lateral Join

Do You Know The LATERAL JOIN In SQL?

Learn how to simplify complex queries, find the top N per group, and manipulate JSON data with the power of LATERAL joins.

Introduction

Have you ever been stuck trying to write a SQL query that needs to perform a sub-calculation for each row of your main table? Maybe you needed to find the last 3 blog posts for every user, or the top 5 products for each category.

These “for-each” style problems can lead to complicated, slow, and hard-to-read queries. This is the exact problem the LATERAL JOIN was designed to solve.

What Does LATERAL Actually Mean?

First, let’s clarify a critical point. While it’s common to hear “lateral join”, the true power comes from the LATERAL keyword itself. It isn’t a type of join like INNER or LEFT; it’s a modifier that fundamentally changes how a subquery in the FROM clause behaves.

Think of “lateral” as meaning “sideways.” A standard subquery in a FROM clause is executed in isolation and cannot reference other tables in the same clause. The LATERAL keyword breaks this rule. It gives a subquery permission to look sideways at columns from tables that appear before it in the FROM list.

This enables a powerful for-each loop pattern directly in your SQL.

A Real-World Business Problem (PostgreSQL)

Let’s create an e-commerce scenario with customers, products, and orders to demonstrate LATERAL joins in action using PostgreSQL.

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    amount INT,
    items JSONB, -- To store a list of products in the order
    tags VARCHAR[] -- To store tags like 'gift', 'rush_delivery'
);

-- Seed the tables with versatile data
INSERT INTO customers (customer_id, customer_name) VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie');

INSERT INTO orders (order_id, customer_id, order_date, amount, items, tags) VALUES
(101, 1, '2025-01-15', 25, '[{"product": "Mouse", "price": 25.00}]', '{"sale"}'),
(102, 1, '2025-06-20', 120,'[{"product": "Webcam", "price": 45.00}, {"product": "Keyboard", "price": 75.00}]', '{"work", "rush_delivery"}'),
(103, 1, '2025-07-01', 15,'[{"product": "Mousepad", "price": 15.00}]', '{}'),
(104, 2, '2025-02-25', 250,'[{"product": "Monitor", "price": 250.00}]', '{}'),
(105, 2, '2025-07-10', 80,'[{"product": "Keyboard", "price": 80.00}]', '{"urgent"}'),
(106, 3, '2025-04-05', 120,'[{"product": "Headphones", "price": 120.00}]', '{"gift", "rush_delivery"}');

The LATERAL Use Cases

So, let’s solve some challenges:

1- Selecting Top N Rows per Group

Goal: Find the 2 most recent orders for each customer.

When you encounter this problem, you might think of some available options. Let’s examine each option, and finally, we will use the LATERAL option.

Option 1: The Window Function Approach

This is a common way to solve top-N problems, but it requires multiple logical steps.

WITH RankedOrders AS (
    -- Step 1: Scan all orders and assign a rank to each one within its customer group
    SELECT
        c.customer_name,
        o.order_date,
        o.items,
        ROW_NUMBER() OVER(PARTITION BY c.customer_id ORDER BY o.order_date DESC) as rn
    FROM
        customers c
    JOIN
        orders o ON c.customer_id = o.customer_id
)
-- Step 2: Select from the ranked results and filter for the ones you want
SELECT
    customer_name,
    order_date,
    items
FROM
    RankedOrders
WHERE
    rn <= 2;

Why it’s less ideal: While it works perfectly, the logic is indirect. You have to create a temporary ranked dataset of all orders and then query it again.

Option 2: The Correlated Subquery Approach

A traditional correlated subquery is often used to fetch a single piece of related data.

-- This only works for a single value, not a full row.
SELECT
    c.customer_name,
    (SELECT o.order_date FROM orders o WHERE o.customer_id = c.customer_id ORDER BY o.order_date DESC LIMIT 1) AS latest_order_date
FROM
    customers c;

Why it fails for this problem: The fundamental limitation of a scalar subquery (one used in a SELECT list) is that it must return exactly one column and one row. It’s physically impossible to use this technique to fetch the two full order records (with multiple columns like order_date and items) that our problem requires. It’s the wrong tool for the job.

Option 3: The GROUP BY Approach

GROUP BY is built for aggregation, not for retrieving individual rows from a group.

-- This finds the latest date, but loses all other details of that order.
SELECT
    c.customer_name,
    MAX(o.order_date) AS latest_order_date
FROM
    customers c
JOIN
    orders o ON c.customer_id = o.customer_id
GROUP BY
    c.customer_name;

Why it fails for this problem: The entire purpose of GROUP BY is to collapse many rows into a single summary row. It can give you the latest date, the total count, or the average amount, but it discards the individual rows in the process. You cannot use it to select the “top 2” original rows from the group.

Option 4: The LATERAL Solution

SELECT
    c.customer_name,
    recent_orders.order_date,
    recent_orders.items
FROM
    customers c
LEFT JOIN LATERAL (
    SELECT
        o.order_date,
        o.items
    FROM
        orders o
    WHERE
        o.customer_id = c.customer_id
    ORDER BY
        o.order_date DESC
    LIMIT 2
) AS recent_orders ON true;

As you see, it reads like a direct translation of our request: “For each customer, get their two most recent orders.”

Tip: The ON TRUE clause is used since we’re not joining on specific columns but including the subquery’s results.

But why is it better than the Window Function approach? The LATERAL approach feels more natural because it integrates the filtering (LIMIT 2) directly into the “for-each-customer” step, avoiding the separate ranking-then-filtering process.

Result:

customer_nameorder_dateitems
Alice2025-07-01[{"product": "Mousepad", "price": 15.00}]
Alice2025-06-20[{"product": "Webcam", "price": 45.00}, {"product": "Keyboard", "price": 75.00}]
Bob2025-07-10[{"product": "Keyboard", "price": 80.00}]
Bob2025-02-25[{"product": "Monitor", "price": 250.00}]
Charlie2025-04-05[{"product": "Headphones", "price": 120.00}]

2- Unpacking JSONB Order Details

Goal: Find all customers who have ever ordered a ‘Keyboard’.

SELECT DISTINCT -- Use DISTINCT to only list each customer once
    c.customer_name
FROM
    orders o
JOIN
    customers c ON o.customer_id = c.customer_id
-- For each order 'o', expand its items into a table called 'item_details'
JOIN LATERAL jsonb_to_recordset(o.items) AS item_details(product TEXT, price DECIMAL) ON true
WHERE
    item_details.product = 'Keyboard';

The function jsonb_to_recordset turns a JSON array into a virtual table. LATERAL lets us run this function for every single order row.

Result:

customer_name
Alice
Bob

💌 Enjoying this post? Subscribe to get more practical dev tips right in your inbox.

3- Searching Array Data With UNNEST

Goal: Find the names and order dates for all orders tagged with ‘rush_delivery’.

SELECT
    c.customer_name,
    o.order_date,
    tag_value
FROM
    orders o
JOIN
    customers c ON o.customer_id = c.customer_id
-- For each order 'o', expand its tags array into a single-column table
, LATERAL unnest(o.tags) AS tag_value
WHERE
    tag_value = 'rush_delivery';

The unnest function expands an array into a set of rows. LATERAL lets us do this for each order’s tags array.

However, PostgreSQL allows you to omit LATERAL in some cases implicitly. When you use a set-returning function like unnest() in the FROM clause with a table column as argument (e.g., o.tags), PostgreSQL treats it as a LATERAL join implicitly. So, in this case you can remove LATERAL and the query will work; but it is best practice to keep

Tip: Keep in mind that the , is a shorthand for CORSS JOIN; which means it should be CROSS JOIN LATERAL.

Result:

customer_nameorder_datetag_value
Alice2025-06-20rush_delivery
Charlie2025-04-05rush_delivery

4- Using Functions and Calculations per Row

Goal: Apply a 10% discount if the order amount exceeds 100.

SELECT 
    c.customer_name,
    o.amount,
    d.discount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
CROSS JOIN LATERAL (
    SELECT CASE 
        WHEN o.amount > 100 THEN o.amount * 0.1 
        ELSE 0 
    END AS discount
) d;

Result:

customer_nameamountdiscount
Alice250
Alice12012
Alice150
Bob25025
Bob800
Charlie12012

LATERAL JOIN vs. Correlated Subqueries

Correlated subqueries and LATERAL JOINs in SQL both allow a subquery to reference columns from a preceding table; however, they differ in their capabilities.

Correlated Subqueries:

  • A traditional correlated subquery must return a single scalar value (one column, one row) for each row processed by the outer query.
  • They are typically found in the SELECT list, WHERE clause, or HAVING clause.
  • It runs repeatedly for each row, which might lead to scanning the target table multiple times.

LATERAL JOIN:

  • LATERAL JOIN can return multiple columns and multiple rows from the subquery for each row of the outer query.
  • It is placed in the FROM clause, acting as a join between the outer table and the result of the LATERAL subquery.
  • Allows PostgreSQL to plan the subquery execution more efficiently, potentially reducing table scans.

When to Avoid LATERAL?

An expert knows when not to use a tool. LATERAL is a specialized solution, and using it for simpler problems is inefficient and overly complex. Let’s introduce some use cases.

Case 1: For the “Top-1” Problem, Use DISTINCT ON

If you only need the single most recent record, DISTINCT ON is purpose-built, more declarative, and often faster.

-- Better way to get ONLY the single most recent order
SELECT DISTINCT ON (c.customer_id)
    c.customer_name,
    o.order_date
FROM
    customers c
JOIN
    orders o ON c.customer_id = o.customer_id
ORDER BY
    c.customer_id, o.order_date DESC;

Why it’s better: DISTINCT ON is the idiomatic PostgreSQL solution for the “top-1” problem. Its intent is clearer, and the query planner can often optimize it more effectively.

Case 2: For Simple Aggregates, Use GROUP BY

If you only need a summary calculation (COUNT, SUM, AVG) and not the details of the individual rows, GROUP BY is the correct, standard tool.

-- Get the total number of orders for each customer
SELECT
    c.customer_name,
    COUNT(o.order_id) AS total_orders
FROM
    customers c
JOIN
    orders o ON c.customer_id = o.customer_id
GROUP BY
    c.customer_name;

Why it’s better: This is the most fundamental and efficient way to perform aggregations. Using LATERAL to count rows would be a convoluted anti-pattern.

Performance Considerations

LATERAL can improve performance over correlated subqueries because PostgreSQL’s query planner can optimize them better. However:

  • Index Usage: Ensure indexes exist on join conditions (e.g., customer_id in the orders table).
  • Data Volume: For large datasets, test performance to avoid excessive subquery executions.
  • Query Complexity: Complex LATERAL subqueries may still be resource-intensive if not optimized.

Tip: Use EXPLAIN to analyze the query plan and ensure indexes are used effectively.

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Conclusion

LATERAL joins in PostgreSQL are a game-changer for handling complex queries with clarity and efficiency. From extracting JSONB data to selecting top N rows per group, they offer flexibility that traditional joins and correlated subqueries often lack.

By understanding when to use LATERAL joins, optimizing performance with indexes, and avoiding overuse in simple scenarios, you can unlock their full potential.

Try experimenting with LATERAL in your next PostgreSQL project, and see how it simplifies your data challenges!

If you’re interested in exploring more, check out these articles.

Want more dev insights like this? Subscribe to get practical tips, tutorials, and tech deep dives delivered to your inbox. No spam, unsubscribe anytime.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top