Don't use SQL `OFFSET` for pagination
Mar 27 2023
You might already know about the problems with using SQL OFFSET and LIMIT for pagination, or a modern ORM may already protect you from them by default. The issue has been known for a while, but I only heard about it recently, so you might not have either. In addition, at my last workplace nearly all paginated actions were implemented using OFFSET, so it seems worth repeating. Furthermore, when someone starts learning SQL it feels natural to use OFFSET for pagination, and most learning examples show exactly this use case.
Therefore this post is a quick reminder to refrain from doing so, and why. It concerns most database engines, such as MySQL, Postgres, and SQL Server.
Have a look at this excellent article as well: https://use-the-index-luke.com/no-offset
TLDR: Offset pagination takes increasingly more time as the offset grows and can return inconsistent results.
LIMIT-OFFSET pagination, aka rowset pagination
Usually, a paginated query using LIMIT and OFFSET looks something like this:
SELECT id
FROM users
ORDER BY lastname, firstname, id
OFFSET 20
LIMIT 10
This fetches the next ten entries after skipping the first 20. Here we order by lastname, firstname, and id, but you might use a date_created column or any other list of fields to order your rows.
The larger the offset gets, the more time this query needs to fetch the same number of items, which can come as a surprise.
Note that LIMIT is not in the ANSI SQL standard; you can use FETCH FIRST <n> ROWS ONLY instead.
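For reference, an equivalent query in the standard syntax might look like this (assuming an engine that supports the OFFSET … ROWS / FETCH FIRST form, e.g. Postgres or SQL Server):

```sql
SELECT id
FROM users
ORDER BY lastname, firstname, id
OFFSET 20 ROWS
FETCH FIRST 10 ROWS ONLY;
```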
Keyset / cursor pagination
The better approach is the so-called keyset/cursor pagination. Instead of specifying an offset, we take the last entry (or some of its properties) of the current pagination window and fetch all rows that come after it according to the ordering. The database jumps directly to that key using an index and then fetches the next n items. It is important to have a deterministic ordering that includes a unique key, so the last entry of the current page identifies an unambiguous position.
If you order by other fields such as lastname and firstname, you include those in the WHERE clause as well (and if you order descending, you simply reverse the inequality). For example:
SELECT id
FROM users
WHERE (lastname, firstname, id) > (:last_lastname, :last_firstname, :last_id)
ORDER BY lastname, firstname, id ASC
LIMIT 10
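As an illustration, here is a minimal sketch of such a pagination loop in Python with sqlite3 (the table and column names are made up for the example). Since not every engine supports row-value comparisons like (lastname, firstname, id) > (…, …, …), the WHERE clause is written in its expanded, equivalent form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)")
rows = [(i, f"last{i % 7}", f"first{i % 5}") for i in range(1, 101)]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

# Expanded form of (lastname, firstname, id) > (:l, :f, :i) for engines
# without row-value comparison support.
PAGE_SQL = """
SELECT lastname, firstname, id FROM users
WHERE lastname > :l
   OR (lastname = :l AND firstname > :f)
   OR (lastname = :l AND firstname = :f AND id > :i)
ORDER BY lastname, firstname, id
LIMIT :n
"""

def fetch_page(last_key, page_size=10):
    """Fetch the page that comes after last_key = (lastname, firstname, id)."""
    l, f, i = last_key
    return conn.execute(PAGE_SQL, {"l": l, "f": f, "i": i, "n": page_size}).fetchall()

# Paginate through the whole table, starting before the smallest possible key.
seen, key = [], ("", "", 0)
while True:
    page = fetch_page(key)
    if not page:
        break
    seen.extend(page)
    key = page[-1]  # the last row of this page becomes the cursor for the next

assert len(seen) == 100 and len(set(seen)) == 100  # every row exactly once
```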
Performance Problems
If you paginate with a large OFFSET value, all previous rows have to be scanned anyway and are then dropped.
This becomes a problem for large values, as your database engine does unnecessary work and throws the data away afterward.
Below I show a few queries against a users table containing 100'000 entries. With the help of EXPLAIN ANALYZE from Postgres we can see how long a query takes and how many rows are considered. Keep in mind that the actual run time depends heavily on the hardware as well as the current load on the system.
Set up the table and populate it with 100'000 entries:
CREATE TABLE users (id BIGSERIAL PRIMARY KEY);
INSERT INTO users SELECT FROM generate_series(1,100000);
Then we fetch the first 10 entries with an OFFSET of 0.
test_offset# EXPLAIN ANALYZE SELECT id FROM users ORDER BY id ASC LIMIT 10 OFFSET 0;
QUERY PLAN
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Limit (cost=0.29..0.55 rows=10 width=8) (actual time=0.090..0.095 rows=10 loops=1)
-> Index Only Scan using users_pkey on users (cost=0.29..2604.29 rows=100000 width=8) (actual time=0.088..0.091 rows=10 loops=1)
Heap Fetches: 0
Planning Time: 0.094 ms
Execution Time: 0.173 ms
(5 rows)
This takes 0.173 ms to execute, and only the first 10 rows are considered for this query.
The second query has an offset of 10'000 entries but still fetches only 10 rows, like the previous one.
test_offset# EXPLAIN ANALYZE SELECT id FROM users ORDER BY id ASC LIMIT 10 OFFSET 10000;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Limit (cost=260.69..260.95 rows=10 width=8) (actual time=3.572..3.578 rows=10 loops=1)
-> Index Only Scan using users_pkey on users (cost=0.29..2604.29 rows=100000 width=8) (actual time=0.094..2.743 rows=10010 loops=1)
Heap Fetches: 0
Planning Time: 0.108 ms
Execution Time: 3.624 ms
(5 rows)
This now takes 3.624 ms instead of 0.173 ms, even though we still return only 10 entries, and it will increase even more with larger offsets. Furthermore, the plan shows that 10'010 rows had to be considered for this query.
Users who manually paginate through a result set might not notice this performance difference directly.
However, for APIs that paginate through many pages it can become a problem. On top of that come the inconsistencies of OFFSET pagination described below, so it is better not to use it.
As a comparison, here is the query using the better method, seek/keyset pagination, at the same position of 10'000 rows in.
Here we pass the last seen element (with id 10'000) instead of using an offset.
test_offset# \set last_seen 10000
test_offset# EXPLAIN ANALYZE SELECT id FROM users WHERE id > :last_seen ORDER BY id ASC LIMIT 10;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Limit (cost=0.29..0.58 rows=10 width=8) (actual time=0.102..0.108 rows=10 loops=1)
-> Index Only Scan using users_pkey on users (cost=0.29..2574.44 rows=90180 width=8) (actual time=0.100..0.103 rows=10 loops=1)
Index Cond: (id > 10000)
Heap Fetches: 0
Planning Time: 0.157 ms
Execution Time: 0.196 ms
(6 rows)
This query still fetches the ten entries after id 10'000 but only takes 0.196 ms.
Inconsistencies of offset/rowset pagination
The big problem with the OFFSET approach is that, between queries, entries might be added to or deleted from the table concurrently.
This means you might
- return some rows multiple times, if entries are added before the current pagination page
- skip some entries entirely, if rows before the current page are deleted while paginating
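A minimal sketch of the first problem, using Python with sqlite3 (all names invented for the example): a row inserted before the current page shifts everything back, so the next OFFSET page returns a row we already saw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(1, 21)])

def page(offset, limit=10):
    return [r[0] for r in conn.execute(
        "SELECT id FROM users ORDER BY id LIMIT ? OFFSET ?", (limit, offset))]

first = page(0)   # ids 1..10
# Concurrently, another client inserts a row that sorts before page two.
conn.execute("INSERT INTO users VALUES (0)")
second = page(10)  # ids 10..19: id 10 appears a second time!

assert set(first) & set(second) == {10}  # duplicate row across pages
```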
Initial jump to offset
If a user initially wants to display a specific pagination page, this takes longer than subsequent fetches with keyset pagination. With keyset pagination we can only jump to the first, last, previous, or next page; for an arbitrary page we do not have a last key to start from and have to find the nth row first, just as with offset pagination. However, since offset pagination has the same problem, we can still fall back on it for the initial jump. If efficient jumps to arbitrary large pages are necessary, it would probably make sense to create a view or cache that maps page numbers to keys for a given ordering.
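As a sketch of this hybrid approach (Python with sqlite3, all names hypothetical): pay the OFFSET cost once to find the boundary key of the requested page, then continue with cheap keyset queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(1, 1001)])

PAGE_SIZE = 10

def last_key_of_page(page_no):
    """One-off OFFSET query to find the last key of the given 1-based page."""
    row = conn.execute(
        "SELECT id FROM users ORDER BY id LIMIT 1 OFFSET ?",
        (page_no * PAGE_SIZE - 1,)).fetchone()
    return row[0] if row else None

def next_page(last_id):
    """All subsequent navigation uses the keyset query."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE_SIZE))]

last_id = last_key_of_page(50)    # pay the offset cost once (id 500)
page_51 = next_page(last_id)      # ids 501..510
page_52 = next_page(page_51[-1])  # ids 511..520, keyset from here on
```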
Considering all this, keyset/cursor pagination is to be favored over offset-based pagination. If you are using an ORM or any kind of query helper, you might also want to check that it implements pagination this way.