When you build reports in SQL, you often need to count or average values based on a condition, like counting only completed orders or averaging amounts for canceled ones.
For years, SQL developers relied on CASE
statements or complex Common Table Expressions (CTEs)
to handle these scenarios. They work, but it gets messy fast.
There’s a better way.
PostgreSQL (and some other databases) support the FILTER
clause, a clean and efficient feature for conditional aggregation. In this guide, you’ll learn how to use it to simplify your queries, boost performance, and avoid common mistakes.
Imagine you run an online store. To understand your business’s health, you need a monthly report that shows:
- The total revenue from all orders.
- The revenue generated specifically from “completed” orders.
- The revenue lost from “cancelled” orders.
Getting this data in one clean report requires us to aggregate sums based on the status of each order.
First, let’s create the tables we need. We’ll have a customers
table and an orders
table.
-- Create a table for our customers
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
join_date DATE
);
-- Create a table for orders
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
order_status VARCHAR(20), -- e.g., 'completed', 'pending', 'cancelled'
amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- Insert some sample data to work with
INSERT INTO customers (customer_id, customer_name, join_date) VALUES
(1, 'John Smith', '2023-01-15'),
(2, 'Jane Doe', '2023-02-20');
INSERT INTO orders (order_id, customer_id, order_date, order_status, amount) VALUES
(101, 1, '2024-05-05', 'completed', 150.00),
(102, 2, '2024-05-10', 'completed', 200.50),
(103, 1, '2024-05-12', 'cancelled', 75.25),
(104, 2, '2024-06-01', 'completed', 300.00),
(105, 1, '2024-06-03', 'pending', 50.00),
(106, 2, '2024-06-08', 'cancelled', 120.75),
(107, 1, '2024-06-15', 'completed', 250.00);
Our goal is to create the following monthly summary from this data.
month | total_revenue | completed_revenue | cancelled_revenue |
---|---|---|---|
2024-05 | 425.75 | 350.50 | 75.25 |
2024-06 | 720.75 | 550.00 | 120.75 |
The most common method for PostgreSQL conditional count and sum was the CASE
statement inside an aggregate function.
-- This query uses CASE to conditionally sum the amounts.
-- It works, but it can get messy.
SELECT
TO_CHAR(order_date, 'YYYY-MM') AS month,
-- Sum all orders for total revenue
SUM(amount) AS total_revenue,
-- Only sum if the status is 'completed'
SUM(CASE WHEN order_status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
-- Only sum if the status is 'cancelled'
SUM(CASE WHEN order_status = 'cancelled' THEN amount ELSE 0 END) AS cancelled_revenue
FROM
orders
GROUP BY
month
ORDER BY
month;
This query gets the job done. However, for every conditional sum, you have a CASE...WHEN...THEN...ELSE...END
block. If you have many conditions, the SELECT
list becomes long, hard to read, and difficult to maintain.
Another approach is to use CTEs to pre-aggregate each status separately and then join them.
-- This query uses CTEs, which is even more verbose.
-- It breaks the logic into steps, but is much longer.
WITH monthly_totals AS (
SELECT
TO_CHAR(order_date, 'YYYY-MM') AS month,
SUM(amount) AS total_revenue
FROM orders
GROUP BY month
),
completed AS (
SELECT
TO_CHAR(order_date, 'YYYY-MM') AS month,
SUM(amount) AS completed_revenue
FROM orders
WHERE order_status = 'completed'
GROUP BY month
),
cancelled AS (
SELECT
TO_CHAR(order_date, 'YYYY-MM') AS month,
SUM(amount) AS cancelled_revenue
FROM orders
WHERE order_status = 'cancelled'
GROUP BY month
)
SELECT
mt.month,
mt.total_revenue,
COALESCE(c.completed_revenue, 0) AS completed_revenue,
COALESCE(cn.cancelled_revenue, 0) AS cancelled_revenue
FROM
monthly_totals mt
LEFT JOIN completed c ON mt.month = c.month
LEFT JOIN cancelled cn ON mt.month = cn.month
ORDER BY
mt.month;
This approach is extremely verbose. It requires multiple scans of the orders
table (though the optimizer might fix this) and complex joins. It’s simply too much code for such a common task.
Now, let’s solve the same problem using the FILTER
clause.
-- This is the modern, clean, and efficient way.
-- Notice how much easier it is to read.
SELECT
TO_CHAR(order_date, 'YYYY-MM') AS month,
-- Sum all orders for total revenue
SUM(amount) AS total_revenue,
-- The FILTER clause applies a WHERE condition to the aggregation
SUM(amount) FILTER (WHERE order_status = 'completed') AS completed_revenue,
SUM(amount) FILTER (WHERE order_status = 'cancelled') AS cancelled_revenue
FROM
orders
GROUP BY
month
ORDER BY
month;
The result is identical, but the query is dramatically cleaner. The FILTER (WHERE ...)
syntax is attached directly to the SQL aggregate function FILTER
, making the intent crystal clear: “Sum the amount, but only for rows that meet this condition.” This makes it perfect for creating SQL pivot with filter-like reports.
Readability is a huge win, but what about performance?
In the Postgres FILTER vs CASE debate, according to jOOQ’s benchmark, FILTER
often has a performance edge. While logically similar, FILTER
provides a more direct instruction to the PostgreSQL query planner.
Why? The CASE
statement is a general-purpose expression that can be used anywhere, so the optimizer has to work harder to understand its context within an aggregate function. The FILTER
clause, however, was designed specifically for this purpose, allowing for a more direct and optimized execution path.
When it comes to CTE vs FILTER, the choice is even clearer for conditional aggregation. The CTE approach requires multiple subqueries and joins, which adds significant overhead both in terms of code length and planner complexity. The FILTER
clause accomplishes the same goal in a single, streamlined scan of the table, making it vastly more efficient and maintainable.
What if you need to check multiple criteria? The FILTER
clause supports standard AND
and OR
logic inside its WHERE
clause, making a PostgreSQL filter multiple conditions check straightforward.
-- Example of filtering on multiple conditions
SELECT
COUNT(*) FILTER (WHERE order_status = 'completed' AND amount > 200) AS large_completed_orders,
COUNT(*) FILTER (WHERE order_status = 'pending' OR order_status = 'shipped') AS in_progress_orders
FROM
orders;
The PostgreSQL FILTER
clause is more than just syntactic sugar. It is a powerful tool for writing cleaner, more readable, and more performant SQL. By allowing you to apply specific conditions directly to aggregate functions, it solves the problem of PostgreSQL’s conditional aggregation in a way that is far superior to older methods.
For any data analyst or developer working with PostgreSQL, mastering the FILTER
clause is essential. It simplifies complex reports, speeds up queries, and makes your SQL code a joy to read and maintain. The next time you find yourself writing a CASE
statement inside a SUM()
or COUNT()
, give FILTER
a try. You won’t look back.
Finally, if you found this article helpful, you can check more here:
- SQL COALESCE in Postgres: A Simple Guide
- Facade vs Proxy vs Adapter Design Patterns
- TypeScript Type Vs Interface? The Answer Is Type!
Want more dev insights like this? Subscribe to get practical tips, tutorials, and tech deep dives delivered to your inbox. No spam, unsubscribe anytime.