About Me

Hello, I'm an eclectic software professional working with enterprise software at SAP. This blog only contains my personal views, thoughts and opinions. It is not endorsed by SAP nor does it constitute any official communication of SAP.

Friday, August 1, 2008

Painful SQL Server Pagination

When I used MySQL back in 2000, I took for granted the ability to paginate result sets in the server with the simple LIMIT clause. (Other open source databases, like Firebird and PostgreSQL, have similar constructs.) Querying for a page of the result set is as simple as this:

# Fetches rows 21-30
# 20 is an offset, 10 is the number of records to fetch
select * from Users limit 20, 10;

You should only return from the server the actual records that the application will use, and that normally means a small chunk of records at a time. To me, the ability to easily page through records is a fundamental requirement of a database server.

I had a big surprise when I first used SQL Server 2000, because it didn't have a standard way of paginating records. To paginate on the server, there were all kinds of convoluted nested queries or costly temporary tables. These operations were expensive for the database to process and difficult for the developers to create and maintain. Assuming the Users table has 1000 records, this is one way to do it:

--# Fetches rows 21-30
select top 10 * from 
  (select top 980 * from Users order by UserId desc) as chunk 
order by UserId

The value 980 comes from:

total_users - page_size * (page_number - 1)
       1000 -        10 * (          3 - 1) 

We have to reverse sorting, get the chunk that ends at the first desired record, sort normally, and then get the top desired records. Very ugly.

In SQL Server 2005, things got better with the ROW_NUMBER function. This function can be used to generate a number for each row in a result set, which can be used to slice the result set into the desired page range:

--# Fetches rows 21-30
select * from (
 select row_number() over (order by UserId) as number, * from Users
) as numbered
where number between 21 and 30

Although this is an improvement over the SQL Server 2000 code, it still falls short of the MySQL solution. SQL Server 2008 does not have any improvement in this area, so the SQL Server 2005 code is as good as it gets for now. And this is still bad.

Even Microsoft suffered from this state of affairs. LINQ to SQL (which only works with SQL Server) has the paging methods Skip and Take. The same query as above would be expressed in LINQ (C#) like this:

// Fetches rows 21-30
var userPage = (from user in db.Users select user).Skip(20).Take(10);

This looks nice in LINQ, but generated the same code as above for SQL Server 2005 and the following for SQL Server 2000:

select top 10 [t0].[UserId], [t0].[UserName]
from [dbo].[Users] as [t0]
where not (exists(
  select null as [EMPTY]
  from (
    select top 20 [t1].[UserId]
    from [dbo].[Users] as [t1]
    ) as [t2]
  where [t0].[UserId] = [t2].[UserId]
  ))

For more complex queries, LINQ pagination won't even work with SQL Server 2000.

A possible solution would be to provide a SKIP or OFFSET keyword to be used in conjunction with TOP. A better solution, though, would close the gap between the way we model the problem and the way we solve it:

--# Fetches the third page (each page has 10 rows)
select page 3 size 10 * from Users

With no more page to rows translations, this dream-code shows our true intentions. I guess it would be trivial (well, maybe not) to implement this on SQL Server, and yet the benefits to developers and maintainers would be colossal.

Update: SQL Server 2012 has a solution to this problem:

select * from Users order by UserId 
  offset 20 rows
  fetch next 10 rows only

And so paginating on SQL Server 2012 is not painful anymore.

1 comment: