Efficient API Paging: count down, not up
Bill de hÓra
Hueniverse: "Making an API call is like asking a question and if the question is simple enough, the answer is easy to come by. For example, asking to describe a single user is simple. A single database lookup using a single unique key is fast and generally easy to scale. This is the most optimized use case in relational databases. However, asking for the last 20 messages of all the friends I am following is not a simple question. It is complex because it is really a list of questions batched together. To find the last 20 messages, we must ask:
* Who do I follow?
* What are their last 20 messages?
* Of all the messages found, which are the most recent 20?
* What is the content and other metadata for each of these 20 messages?
This list of questions becomes pretty inefficient with a very large set of friends being followed. However, the data set can be reduced by first finding the top 20 friends with the most recent updates, and only requesting their 20 latest messages. Even with this optimization, we are looking at fetching up to 400 messages in order to display 20. This gets much worse when the request is for the second page of messages (messages 21-40) where we can fetch up to 1600 messages from up to 40 friends in order to display 20 messages. By the 5th page we are fetching up to 10,000 messages to display 20 (which might explain why Twitter temporarily disabled the paging feature of their API)."
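Making the arithmetic in that quote explicit (my reading of it, not Hueniverse's): page N of size 20 can involve up to 20*N candidate friends, each of whom has to be asked for their latest 20*N messages - roughly 400*N² fetches to show 20 items. A tiny sketch:

```java
// My reading of the arithmetic in the quote above (hypothetical helper,
// not Hueniverse's code): to render page N of size 20, up to 20*N friends
// may hold one of the wanted messages, and each must be asked for its
// latest 20*N messages.
public class WorstCaseFetch {
    static long worstCaseFetches(int pageSize, int pageNumber) {
        long candidates = (long) pageSize * pageNumber;
        return candidates * candidates; // 400 for page 1, 1600 for page 2, 10000 for page 5
    }
}
```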
Let's ignore for now the notion of an inbox, which avoids the who-do-I-follow-means-a-SQL-join problem, and look at paging over the network. The way some services expose paging physically ensures each page has to be recomputed every time there's a new addition - new content keeps getting added to page=1, pushing everything else down off the main page and causing a ripple effect over the set of pages. As a result pages can't easily be archived and are tricky to cache - the content of "page=5" changes every time there's new content on the homepage (in Twitter's case new content on the homepage is extremely frequent), because the paging model is a pushdown stack. Here's that stack:
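To make it concrete, here's a minimal sketch of the count-up model (the names and page size are illustrative, not any real API): page 1 is always the newest slice, so publishing one message shifts the contents of every page behind it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "count up" / pushdown-stack model: page=1 is always the
// newest 20 messages, so publishing one message changes every page.
public class CountUpPaging {
    static final int PAGE_SIZE = 20;
    static final List<String> messages = new ArrayList<>(); // newest-first

    static void publish(String msg) {
        messages.add(0, msg); // pushes every existing message down one slot on every page
    }

    static List<String> page(int n) {
        int from = (n - 1) * PAGE_SIZE;
        int to = Math.min(from + PAGE_SIZE, messages.size());
        return from >= to ? List.of() : messages.subList(from, to);
    }
}
```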
All the messages are pushed down and, logically, every page in the stack gets touched. As a result nothing is long-term cacheable and nothing is physically archivable to disk. It's inherently non-scalable.
Another approach would be to work the paging backwards by counting down. In that model page=1 will be the oldest page, not the newest one. Like this:
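Here's a minimal sketch of the count-down model (again, illustrative names only): archive pages are numbered from the oldest up and never change once written; new messages only ever touch the homepage.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "count down" model: page=1 is the oldest archive page and
// never changes; new messages land only on the homepage, which is allowed
// to grow.
public class CountDownPaging {
    static final int PAGE_SIZE = 20;
    static final List<List<String>> archive = new ArrayList<>(); // index 0 == page=1 (oldest)
    static final List<String> homepage = new ArrayList<>();      // newest-first, grows freely

    static void publish(String msg) {
        homepage.add(0, msg); // only the homepage is touched; archive pages stay frozen
    }

    static List<String> page(int n) {
        return archive.get(n - 1); // stable content: cacheable and archivable
    }
}
```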
Note that we let the homepage grow by one. That stops propagation to the other pages. When the homepage gets too big we can push some messages back to a new page and splice it into the list:
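A sketch of that splice step (a hypothetical helper, continuing the same idea): once the homepage holds twice the archive page size, its oldest messages move onto a brand new archive page, and the existing archive pages are never touched.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the splice step: once the homepage holds twice the archive page
// size, the oldest PAGE_SIZE messages are moved onto a new archive page;
// existing archive pages are never rewritten.
public class Splice {
    static final int PAGE_SIZE = 20;

    static void spliceIfNeeded(List<String> homepage, List<List<String>> archive) {
        if (homepage.size() >= 2 * PAGE_SIZE) {
            // the homepage is newest-first, so its oldest messages sit at the end
            List<String> oldest = homepage.subList(homepage.size() - PAGE_SIZE, homepage.size());
            archive.add(new ArrayList<>(oldest)); // becomes the highest-numbered page
            oldest.clear();                       // homepage shrinks back down
        }
    }
}
```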
Upside? N-2 pages can be archived, even taken off your active database and served as flat files or out of a memcached layer (if a database is what you're using). If you don't mind a long homepage, you can manage the resizing just on the homepage and splice in a new page when the length of the homepage is 2x your archive page size. Downside? Somebody will file a bug that the main pages vary in size.
By the way, if you designed your Data API around Atom and RFC 5005, thus telling everyone "follow the next link" instead of "compute the next link's URL, then follow it", you could switch physical paging models after the fact - the user agent doesn't care about your notion of a linked list; it'll key off the "rel" paging attributes. Whereas if you exposed the counting model directly via URL parameters, all clients would need to be upgraded. It's basic Data API practice to tell clients to follow typed links instead of asking them to compute parameterised URLs.
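For illustration, here's a sketch of a client that pages by following the feed's typed links rather than computing page URLs. The feed URL is made up; the only assumption is an Atom document carrying RFC 5005 rel="next" paging links.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch of a client that follows typed links instead of computing ?page=N
// URLs. The server can change its physical paging model underneath it.
public class FollowLinks {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String next = "http://example.org/messages.atom"; // hypothetical entry point
        while (next != null) {
            HttpResponse<byte[]> resp = http.send(
                HttpRequest.newBuilder(URI.create(next)).build(),
                HttpResponse.BodyHandlers.ofByteArray());
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document feed = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(resp.body()));
            // ... process the entries in this page ...
            next = null;
            NodeList links = feed.getElementsByTagNameNS("http://www.w3.org/2005/Atom", "link");
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                if ("next".equals(link.getAttribute("rel"))) { // RFC 5005 paged-feed link
                    next = link.getAttribute("href");          // follow it, don't compute it
                    break;
                }
            }
        }
    }
}
```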