
Don't use SQL `OFFSET` for pagination

Mar 27 2023

You might already know about the problems with using SQL OFFSET and LIMIT for pagination, and if you use a modern ORM you might even be protected from them by default. This has been known for a while, but I only heard about it recently, so maybe you have not either. At my last workplace nearly all paginated actions were implemented using OFFSET, so it seems worth repeating. Moreover, when someone starts learning SQL, using OFFSET for pagination comes naturally, and most learning examples show exactly that use case. This post is therefore a quick reminder to refrain from doing so and why; it concerns most database engines, such as MySQL, Postgres and SQL Server. Have a look at this excellent article as well: https://use-the-index-luke.com/no-offset

TLDR: Offset pagination takes increasingly more time as the offset grows and can lead to inconsistent results.

LIMIT-OFFSET pagination (a.k.a. rowset pagination)

Usually, a paginated query using LIMIT and OFFSET would look something like this:

SELECT id
FROM users
ORDER BY lastname, firstname, id
LIMIT 10 OFFSET 20

This would fetch rows 21 to 30: the next ten entries after skipping the first 20. Here we order by lastname, firstname and id, but you could just as well use date_created or any other list of columns to order your rows. What may come as a surprise: the larger the offset gets, the more time this query needs to fetch the same number of items. Note that LIMIT is not part of the ANSI SQL standard; you can use FETCH FIRST <n> ROWS ONLY instead.
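For reference, a sketch of the same query in standard SQL syntax (supported by e.g. Postgres and SQL Server):

SELECT id
FROM users
ORDER BY lastname, firstname, id
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY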

Keyset / cursor pagination

The better approach is the so-called keyset/cursor pagination. Instead of specifying an offset, we remember the last entry of the current pagination window (or the columns it was ordered by) and select all rows that come after it in the sort order. The database can jump directly to that key using an index and then fetch the next n items. This requires a deterministic ordering, which is why a unique key must be part of the sort: it unambiguously identifies the last entry of the current page.
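In its simplest form, ordering by id alone, such a query looks like this (:last_id being the id of the last row on the current page):

SELECT id
FROM users
WHERE id > :last_id
ORDER BY id
LIMIT 10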

If you order by other fields such as lastname and firstname, you include those in the WHERE clause as well, and if you order descending, you just reverse the inequality (shown below). For example:

SELECT id
FROM users
WHERE (lastname, firstname, id) > (:last_lastname, :last_firstname, :last_id)
ORDER BY lastname, firstname, id
LIMIT 10
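The descending variant, as a sketch, just flips the inequality and the sort direction:

SELECT id
FROM users
WHERE (lastname, firstname, id) < (:last_lastname, :last_firstname, :last_id)
ORDER BY lastname DESC, firstname DESC, id DESC
LIMIT 10

Note that the row-value comparison (a, b, c) > (x, y, z) used here is supported by e.g. Postgres and MySQL; on engines without it, such as SQL Server, you would expand it into an equivalent chain of AND/OR conditions.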

Performance Problems

If you paginate with large OFFSET values, all skipped rows still have to be scanned and are then dropped. With large offsets your database engine therefore does a lot of unnecessary work only to throw the data away afterward.

Below I show a few queries against a users table containing 100'000 entries. With the help of Postgres' EXPLAIN ANALYZE we can see how long each query takes and how many rows it considers. Keep in mind that the actual run time depends heavily on the hardware and the current load of the system.

Set up the table and populate it with 100'000 entries:

CREATE TABLE users (id BIGSERIAL PRIMARY KEY);
-- insert 100'000 rows; each gets an auto-generated id from the sequence
INSERT INTO users SELECT FROM generate_series(1, 100000);

Then we get the first 10 entries with an OFFSET of 0.

test_offset# EXPLAIN ANALYZE SELECT id FROM users ORDER BY id ASC LIMIT 10 OFFSET 0;

                                                              QUERY PLAN                                                               
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Limit  (cost=0.29..0.55 rows=10 width=8) (actual time=0.090..0.095 rows=10 loops=1)
   ->  Index Only Scan using users_pkey on users  (cost=0.29..2604.29 rows=100000 width=8) (actual time=0.088..0.091 rows=10 loops=1)
         Heap Fetches: 0
 Planning Time: 0.094 ms
 Execution Time: 0.173 ms
(5 rows)

This takes 0.173 ms to execute, and only the first 10 rows are considered for this query.

The second query uses an offset of 10'000 entries but still fetches only 10, like the previous one.

test_offset# EXPLAIN ANALYZE SELECT id FROM users ORDER BY id ASC LIMIT 10 OFFSET 10000;

                                                                QUERY PLAN                                                                
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Limit  (cost=260.69..260.95 rows=10 width=8) (actual time=3.572..3.578 rows=10 loops=1)
   ->  Index Only Scan using users_pkey on users  (cost=0.29..2604.29 rows=100000 width=8) (actual time=0.094..2.743 rows=10010 loops=1)
         Heap Fetches: 0
 Planning Time: 0.108 ms
 Execution Time: 3.624 ms
(5 rows)

This now takes 3.624 ms instead of 0.173 ms, even though we still return only 10 entries, and it will increase further with larger offsets. The plan also shows that 10'010 rows had to be considered for this query.

Users who manually paginate through a result set might not notice this performance difference directly. For APIs that page through many pages, however, it can become a real problem. On top of that come the inconsistencies of OFFSET pagination described below, so it is better avoided.

As a comparison, here is the query using the better method, seek/keyset pagination, for the same position of 10'000 rows. Instead of an offset we pass the last seen element (the row with id 10'000):

test_offset# \set last_seen 10000

test_offset# EXPLAIN ANALYZE SELECT id FROM users WHERE id > :last_seen ORDER BY id ASC LIMIT 10;

                                                              QUERY PLAN                                                              
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Limit  (cost=0.29..0.58 rows=10 width=8) (actual time=0.102..0.108 rows=10 loops=1)
   ->  Index Only Scan using users_pkey on users  (cost=0.29..2574.44 rows=90180 width=8) (actual time=0.100..0.103 rows=10 loops=1)
         Index Cond: (id > 10000)
         Heap Fetches: 0
 Planning Time: 0.157 ms
 Execution Time: 0.196 ms
(6 rows)

This query still fetches the next ten entries at offset 10'000 but requires only 0.196 ms.

Inconsistencies of offset/rowset pagination

The big problem with the OFFSET approach is that rows might be inserted into or deleted from the table concurrently between two paginated queries. This means you might

  1. see some rows multiple times, if entries are inserted before the current pagination window (sketched below), or
  2. miss some entries entirely, if rows before the current window are deleted while paginating.
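A hypothetical sketch of the duplicate case, assuming the users table also had a lastname column we page by:

-- page 1 returns the rows 'Adams' .. 'Baker'
SELECT id, lastname FROM users ORDER BY lastname, id LIMIT 10 OFFSET 0;

-- meanwhile another session inserts a row that sorts before the window
INSERT INTO users (lastname) VALUES ('Abbott');

-- page 2: all remaining rows shift down one position, so 'Baker'
-- (the last row of page 1) is returned a second time
SELECT id, lastname FROM users ORDER BY lastname, id LIMIT 10 OFFSET 10;

With keyset pagination the second page starts strictly after the last seen ('Baker', id) tuple, so the shift causes neither duplicates nor gaps on the following pages.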

Initial jump to offset

If a user initially wants to display a specific pagination page, this takes longer than subsequent fetches with keyset pagination: keyset pagination can only fetch the first, last, previous or next page, since for an arbitrary page we do not yet have a last key and would first have to find the n-th row, just like offset pagination does. However, since offset pagination has exactly the same problem, we can still use an offset for that initial jump and switch to keyset pagination afterwards. If efficient jumps to arbitrary large pages are necessary, it would probably make sense to create a view and cache the page number of specific items for a given ordering.
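A sketch of that hybrid approach, jumping once with OFFSET and then continuing with the key:

-- initial jump to an arbitrary page (done once, pays the offset cost)
SELECT id FROM users ORDER BY id LIMIT 10 OFFSET 9990;

-- remember the last id of that page, then continue cheaply
SELECT id FROM users WHERE id > :last_id ORDER BY id LIMIT 10;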

Considering all this, keyset/cursor pagination is to be favored over offset-based pagination. If you are using an ORM or any kind of query helper, you might also want to check that it implements pagination this way.