Java Buzz Forum - Dates in Atom

Bill de hÓra


Bill de hÓra is a technical architect with Propylon

Dates in Atom

Posted: Jun 17, 2008 4:29 PM

This post originated from an RSS feed registered with Java Buzz by Bill de hÓra.
Original Post: Dates in Atom
Feed Title: Bill de hÓra
Feed URL: http://www.dehora.net/journal/atom.xml
Feed Description: FD85 1117 1888 1681 7689 B5DF E696 885C 20D8 21F8

Atom feeds and AtomPub collections are time-ordered data. I think most people intuitively know that Atom feeds are time ordered, but perhaps not that they're ordered by update and edit times, or why time is the natural order for Atom-served content even when the domain content has other natural orders that make sense. Since it's not that commonly talked about, I figure it's worth at least one post to explain why.

Dates in Atom

There's a long (torrid) history of datestamping in the Atom standards and in feed syndication more generally. When the Atom format was being designed, some working group members felt you needed 3 dates - an edit date, a publish date and a creation date. Or maybe edited, updated and published. Or... you get the idea. And as prior art to Atom, Dublin Core had already settled on 3 dates. Anyway, the Atom working group couldn't agree on 3 (really), but we could identify and agree on 2 meaningful dates - updated and published. As a result, Atom Entries must have an updated date (atom:updated), and can have a published date (atom:published).

Why all the work to naturally order by time? Historically it's because feeds come from blogs, which are diaries, which are lists of entries ordered by date. Today it's increasingly for systems reasons, most importantly to support cheap synchronisation by clients. The combination of atom:id and atom:updated is enough information for clients to synchronise new or updated content: they start at the top of the feed and walk the entries, following the feed's previous links where needed, until they hit the first atom:id/atom:updated pair that matches their local Entry cache. At that point the sync is over. This lowers overall traffic and the cost of loading data out of persistent storage.
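To make the stop-at-first-match idea concrete, here's a minimal sketch in Python. The Entry class, the sync function and the urn:example ids are all made up for illustration; a real client would also be fetching pages over HTTP and following previous links, which I've left out. The feed is assumed to arrive newest first.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Entry:
    atom_id: str  # value of atom:id
    updated: str  # value of atom:updated (an RFC 3339 timestamp)


def sync(feed_entries: Iterable[Entry], cache: Dict[str, str]) -> List[Entry]:
    """Walk a newest-first feed and collect new or changed entries.

    `cache` maps atom:id to the atom:updated value the client last saw.
    The walk stops at the first id/updated pair that is already cached.
    """
    changed: List[Entry] = []
    for entry in feed_entries:
        if cache.get(entry.atom_id) == entry.updated:
            break  # everything past this point is assumed to be known already
        changed.append(entry)
    for entry in changed:
        cache[entry.atom_id] = entry.updated
    return changed


# Hypothetical feed, newest first.
feed = [
    Entry("urn:example:3", "2008-06-17T16:29:00Z"),  # new since last sync
    Entry("urn:example:1", "2008-06-16T09:00:00Z"),  # edited since last sync
    Entry("urn:example:2", "2008-06-10T12:00:00Z"),  # unchanged
]
cache = {
    "urn:example:1": "2008-06-01T08:00:00Z",
    "urn:example:2": "2008-06-10T12:00:00Z",
}
fetched = sync(feed, cache)
print([e.atom_id for e in fetched])  # ['urn:example:3', 'urn:example:1']
```

The client reads three entries and keeps two; anything older than the first match never gets looked at.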

Dates in AtomPub

AtomPub (RFC5023) added another date. The working group said that AtomPub collections (feeds you can post content to) should be ordered by a date called app:edited. Entries in AtomPub collections should contain one app:edited element, and must not contain more than one.
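As a rough illustration of what that looks like from the server side, here's a sketch that reads app:edited out of a small hand-written feed and orders the entries most recently edited first. The feed content and the edited_order function are invented for the example, and falling back to atom:updated when app:edited is missing is my own assumption, not something the spec tells you to do.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "app": "http://www.w3.org/2007/app",
}

# Hypothetical collection feed; entry "c" carries no app:edited.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:app="http://www.w3.org/2007/app">
  <entry>
    <id>urn:example:a</id>
    <updated>2008-06-01T10:00:00Z</updated>
    <app:edited>2008-06-15T10:00:00Z</app:edited>
  </entry>
  <entry>
    <id>urn:example:b</id>
    <updated>2008-06-10T10:00:00Z</updated>
    <app:edited>2008-06-10T10:00:00Z</app:edited>
  </entry>
  <entry>
    <id>urn:example:c</id>
    <updated>2008-06-12T10:00:00Z</updated>
  </entry>
</feed>"""


def parse_ts(text: str) -> datetime:
    # Handles the "Z" suffix so fromisoformat works on older Pythons too.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def edited_order(feed_xml: str) -> list:
    """Return atom:id values, most recently edited first."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("atom:entry", NS):
        edited = entry.findall("app:edited", NS)
        if len(edited) > 1:
            raise ValueError("an entry must not contain more than one app:edited")
        # Assumption: fall back to atom:updated when app:edited is absent.
        stamp = edited[0].text if edited else entry.findtext("atom:updated", namespaces=NS)
        rows.append((parse_ts(stamp), entry.findtext("atom:id", namespaces=NS)))
    rows.sort(reverse=True)
    return [atom_id for _, atom_id in rows]


order = edited_order(FEED)
print(order)  # ['urn:example:a', 'urn:example:c', 'urn:example:b']
```

Note that entry "a" comes first even though its atom:updated is the oldest of the three, which is exactly the difference between the two dates.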

Ideally this natural ordering would have been a must-level requirement, but RFC5023 couldn't mandate that app:edited be universally understood, as that would break Atom's versioning policy, which says that new elements are 'foreign markup' that can optionally be processed and must otherwise be ignored. In other words no-one can introduce a new must-understand datum into Atom (RFC4287) markup and retroactively break the planet's deployed Atom-aware systems - not even AtomPub (RFC5023). Unless you are unlucky, app:edited works well, even where the feed itself is latently updated.

[By the way in the "real world" feeds that can act as AtomPub collections will also appear as being ordered by atom:updated, even though app:edited is what the spec says you should expose. Some systems will update on every edit; that's just how they roll.]

Domain gnarliness

The AtomPub spec doesn't say why app:edited exists, but the following example should help explain why.

Not all domain content is naturally time ordered (there's more to digital life than blogging). Address and contact books, for example, will tend to be sorted and presented to a user by some other key, maybe last name. This is a gnarly case that came up on the Atom protocol list a while back.

So say my information store has a list of contacts - and a collection resource for managing those contacts. Generally I'm not interested in retrieving things by last edit/update; I want contacts alpha ordered, because my client is a useful application that happens to use Atom/AtomPub, not some kind of an entry cache. If I'm using Atom to represent an address book, ordering by atom:updated or app:edited seems to be the wrong approach for the UI.

The problem is, not ordering collection entries by update time will result in inefficient syncing (syncing is probably use case 2 or 3 for a network address book, hence you tend to see SyncML and address books go hand in hand).

For example, if I add a new contact with a last name of "Wordsworth", that will go to the back of the feed and not the front, where it could be picked up cheaply on the next sync. The client the edit came from could of course hold onto its recent additions/edits (essentially acting as a write-through cache) instead of paging back to "W". But my client just got a bit more complicated. And my other HTTP connected devices wanting the newest stuff will need to page all the way back to "W" in the book to sync up. In fact, to be sure, they'll have to pull the whole book every time they sync. The approach of stopping at the first matching id/updated pair won't work - algorithmically speaking, syncing will always be a worst case.
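Here's a small sketch of that cost difference, counting how many entries a syncing device has to read under each ordering. The thousand-contact address book, the ids and both functions are hypothetical; entries are plain (id, updated) tuples to keep it short.

```python
from typing import Dict, List, Tuple

Feed = List[Tuple[str, str]]  # (atom:id, atom:updated) pairs


def reads_when_time_ordered(feed: Feed, cache: Dict[str, str]) -> int:
    """Entries read when the feed is newest first: stop at the first match."""
    read = 0
    for atom_id, updated in feed:
        read += 1
        if cache.get(atom_id) == updated:
            break
    return read


def reads_when_alpha_ordered(feed: Feed, cache: Dict[str, str]) -> int:
    """Entries read when the feed is sorted by name.

    A cached entry says nothing about the entries after it, so the
    client has to read the whole book to be sure it missed nothing.
    """
    return len(feed)


# A device that has already synced 1000 contacts.
OLD = "2008-01-01T00:00:00Z"
existing = [(f"contact-{n:04d}", OLD) for n in range(1000)]
cache = dict(existing)

# Another device adds Wordsworth.
new_contact = ("wordsworth", "2008-06-17T16:29:00Z")

time_ordered = [new_contact] + existing       # newest at the front
alpha_ordered = sorted(existing + [new_contact])  # "w" sorts to the back

print(reads_when_time_ordered(time_ordered, cache))    # 2
print(reads_when_alpha_ordered(alpha_ordered, cache))  # 1001
```

Two entries against the whole book, for one new contact.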

Eventually something like the following will happen to deal with the UI being slow, or concurrent client refreshes pegging the server. A new "recently added" contacts feed will be added. Or the sort will be extended to allow by-added/by-updated. Either way, it'll be a reinvention of AtomPub's default app:edited sorting. In that case we'll want to move the order-by-last-name feature of the domain/UI into the implementation detail, perhaps by defining some query params that provide the user-optimised view of the data (i.e. the one that makes most sense for the user browsing the content), and keep the time-ordered feed as the protocol default.
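One way that split might look, sketched as a plain function rather than a real web handler. The "order" query param and its "edited" / "last-name" values are names I've made up; nothing in Atom or AtomPub defines them. Timestamps are assumed to be RFC 3339 strings in UTC with the same format, so comparing them as strings gives time order.

```python
from typing import Dict, List

Contact = Dict[str, str]


def collection_view(contacts: List[Contact], params: Dict[str, str]) -> List[Contact]:
    """Return the contacts in the order a request asked for.

    With no params the protocol default applies: most recently edited
    first. The hypothetical "order=last-name" param gives the UI view.
    """
    order = params.get("order", "edited")
    if order == "edited":
        return sorted(contacts, key=lambda c: c["edited"], reverse=True)
    if order == "last-name":
        return sorted(contacts, key=lambda c: c["last_name"].casefold())
    raise ValueError(f"unsupported order: {order!r}")


contacts = [
    {"last_name": "Wordsworth", "edited": "2008-06-17T16:29:00Z"},
    {"last_name": "Coleridge", "edited": "2008-03-02T11:00:00Z"},
    {"last_name": "Keats", "edited": "2008-05-20T08:30:00Z"},
]

sync_view = [c["last_name"] for c in collection_view(contacts, {})]
ui_view = [c["last_name"] for c in collection_view(contacts, {"order": "last-name"})]
print(sync_view)  # ['Wordsworth', 'Keats', 'Coleridge']
print(ui_view)    # ['Coleridge', 'Keats', 'Wordsworth']
```

Syncing clients never need to know the param exists, and the UI gets its alpha order without changing what the collection looks like by default.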

What's happening is that there are two use cases. One for viewing an address book in an application (sorted by alpha), and another for adding and syncing contacts to it, and probably the server needs to provide different views on the data for each.

Incidentally, an AtomPub client can work without app:edited sorting (it won't necessarily know the sort order, unless there's a private contract between client and server), but it will be inefficient on update. So it seems that in the general case, even for a domain like an address book, order by time is the best natural sort for an AtomPub collection.

Backend databases

Most people, I think, use databases to back web sites, and sometimes you'll want to just use the database primary key to sort the entries. Ordering on the pk is great because it's FaF (Fast as ****). And if the database is using autoincrementing keys we'll naturally sort by content creation date. But there are downsides. For example, this technique won't be optimal for updates, as they won't be captured in the order-by clause. At the system level it means that clients will have to page through more data to sync up content, which means more load against the DB. Non-auto-incrementing keys, and very possibly split/federated databases, won't support the implicit creation order. And a database wipeout potentially loses the order of actual creation (who knows how the data will be reimported and new keys assigned).
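The update problem is easy to see with an in-memory SQLite database. The entries table and its columns are invented for this sketch, and timestamps are stored as UTC text so that they sort as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " title TEXT NOT NULL,"
    " updated TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO entries (title, updated) VALUES (?, ?)",
    [
        ("first", "2008-06-01T00:00:00Z"),
        ("second", "2008-06-02T00:00:00Z"),
        ("third", "2008-06-03T00:00:00Z"),
    ],
)

# The oldest row gets edited after everything else was created.
conn.execute(
    "UPDATE entries SET title = ?, updated = ? WHERE id = 1",
    ("first (edited)", "2008-06-17T16:29:00Z"),
)

by_pk = [row[0] for row in conn.execute("SELECT id FROM entries ORDER BY id DESC")]
by_updated = [row[0] for row in conn.execute("SELECT id FROM entries ORDER BY updated DESC")]
print(by_pk)       # [3, 2, 1]
print(by_updated)  # [1, 3, 2]
```

Ordered by pk, the freshly edited row sits at the very end of the feed, so a syncing client has to page through everything to find it.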

Atom dates

What this means is that RDBMS-managed content being served up for feeds or managed using AtomPub (which will over time trend to being most web content) will have multiple date columns. An insert time (generally good for data management anyway) will be very common. But for content management they'll need an updated column that's indexed, to track recent changes. You might have a third published date, and maybe an edited one as well (if you need to distinguish between an update and an edit), but to let AtomPub clients use and manage the data, an updated date seems to be the minimum must-have.
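A sketch of what such a table might look like, again using SQLite. The table, column and index names are my own choices for illustration, as is the changed_since helper; published and edited are nullable because not every system will track them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entries (
    id        INTEGER PRIMARY KEY,
    atom_id   TEXT NOT NULL UNIQUE,  -- atom:id
    inserted  TEXT NOT NULL,         -- row insert time, for data management
    updated   TEXT NOT NULL,         -- atom:updated, the minimum must-have
    published TEXT,                  -- atom:published, optional
    edited    TEXT                   -- app:edited, optional
);
CREATE INDEX idx_entries_updated ON entries (updated);
"""


def changed_since(conn: sqlite3.Connection, since: str) -> list:
    """Return atom:id values updated after `since`, newest first.

    Timestamps are stored as UTC text in one format, so string
    comparison matches time order.
    """
    rows = conn.execute(
        "SELECT atom_id FROM entries WHERE updated > ? ORDER BY updated DESC",
        (since,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO entries (atom_id, inserted, updated, published, edited)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("urn:example:a", "2008-05-01T00:00:00Z", "2008-06-17T16:29:00Z",
         "2008-05-02T00:00:00Z", "2008-06-17T16:29:00Z"),
        ("urn:example:b", "2008-05-10T00:00:00Z", "2008-05-10T00:00:00Z",
         None, None),
        ("urn:example:c", "2008-06-05T00:00:00Z", "2008-06-05T00:00:00Z",
         "2008-06-05T00:00:00Z", "2008-06-06T00:00:00Z"),
    ],
)

recent = changed_since(conn, "2008-06-01T00:00:00Z")
print(recent)  # ['urn:example:a', 'urn:example:c']
```

Entry "a" was inserted first but shows up at the top, because the query runs off the indexed updated column rather than the insert time or the key.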
