In his RailsConf keynote, Rails creator David Heinemeier Hansson explained his reasoning for preferring numeric entity IDs in URLs as opposed to strings. What's your opinion?
In his RailsConf keynote address,
Ruby on Rails creator David Heinemeier Hansson discussed the issue of whether to
use numbers corresponding to database keys in URLs (as in /people/1) versus using string keys that may have
some semantic meaning to users of the web application (as in /people/bob).
We're expecting you'll want to use auto-incrementing IDs for things. And I expect that will be as
controversial a point as auto-incrementing IDs and saying that that's the way you should do it
was with databases. Because most people have this notion that you should have friendlier URLs.
It is funny how I have not heard too many people spot that obvious comparison.
People say we want to have the title of the blog posting in the URL.
You want real keys? You want those keys that burned on the stake in the database design back in your URLs? Why?
The exact same arguments apply. I fell under the exact same thing.
In basecamp, we used to have something like /client/37signals. Well what would happen when people changed their
firm name? They would now call themselves 38signals. It broke. You have broken URLs. You have exactly the same problems
that you had in the database world. Not very nice.
So what I've come to realize is that it's just not worth it. It's just not worth describing these
concrete logical things in the URLs because you get the same mess when we tried that in the database world.
The counter argument is that URIs are part of the user interface of a web application, at least if the application
is designed for human consumption. (Hansson's comments at RailsConf were in the context of describing
ideas about ActiveResource, a work-in-progress API intended to make it easy to write web service clients, but
the conventions he described are intended for human users as well.) Jakob Nielsen gives his take on this
issue in URL as UI, suggesting that a usable site requires:
a domain name that is easy to remember and easy to spell
URLs that visualize the site structure
URLs that are "hackable" to allow users to move to higher levels of the information architecture by hacking off the end of the URL
Bob is walking down a street, and he sees an advertisement on the side of a bus:
"For the best Chicago Weather information on the Web, visit http://example.org/weather/Chicago."
Bob goes home and types the URI into his browser, which does indeed display for him a Chicago weather forecast. Bob then
realizes that he'll be visiting Boston, and he guesses that a Boston weather page might be available at a similar
URI: http://example.org/weather/Boston. He types that into his browser and reads the response that comes back.
Bob is using the original URI for more than its intended purpose, which is to identify the Chicago weather page. Instead,
he's inferring from it information about the structure of a Web site that, he guesses, might use a uniform naming convention for
the weather in lots of cities. So, when Bob tries the Boston URI, he has to be prepared for the possibility that his guess will prove
wrong: Web architecture does not guarantee that the retrieved page, if there is one, has the weather for Boston, or indeed that it
contains any weather report at all. Even if it does, there is no assurance that it is current weather, that it is intended for reliable
use by consumers, etc. Bob has seen an advertisement listing just the Chicago URI, and that is the only one that the URI authority
has warranted will be a useful weather report.
Still, the ability to explore the Web informally and experimentally is very valuable, and Web users act on such guesses about URIs
all the time. Many authorities facilitate such flexible use of the Web by assigning URIs in an orderly and predictable manner. Nonetheless,
in the example above, Bob is responsible for determining whether the information returned is indeed what he needs.
Notably, Jakob Nielsen includes "persistent URLs that don't change" in his list of requirements for usable websites, which is the very
concern that David Heinemeier Hansson raised as a problem with using string keys in URLs. The importance of persistent URLs, and suggestions
for how to achieve them in URL design, are discussed in the W3C document, Cool URIs Don't Change.
It is indeed important to think hard about URL design before publishing a URL, because changing it later is difficult.
But the truth is that Real URIs Do Change, and the web is flexible enough to accommodate that change. It takes a little more work on the web site developer's part, because you have to
redirect requests from old versions of the URI to the new one. But that does work. If /client/37signals changes its name to 38signals, then
it could have its client page moved to /client/38signals, and /client/37signals could perform a 301 redirect to /client/38signals. In such
a situation, neither the web application nor the /client/37signals URL is broken—but it does make life more complicated for the provider
of the service. Ruby on Rails is very strong on making things simple for the developer, and that's a great thing, but the tradeoff in
this case may be a small subtraction in the usability of the site. It may be worth it, because gaining access to the application sooner
in the first place, which Rails may facilitate, is a big usability plus. But there is a tradeoff being made.
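The redirect approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables CURRENT_SLUGS and RETIRED_SLUGS, the function resolve_client, and the client ID 37 are all hypothetical, and a real application would keep this data in its database and let its web framework send the response.

```python
from typing import Dict, Optional, Tuple

# Hypothetical data: live slugs map to client IDs,
# and each retired slug maps to the slug that replaced it.
CURRENT_SLUGS: Dict[str, int] = {"38signals": 37}
RETIRED_SLUGS: Dict[str, str] = {"37signals": "38signals"}

PREFIX = "/client/"


def resolve_client(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location or None) for a /client/<slug> path."""
    if not path.startswith(PREFIX):
        return 404, None
    slug = path[len(PREFIX):]
    if slug in CURRENT_SLUGS:
        return 200, None
    # Follow the chain of renames so that a client renamed several
    # times still ends up at its live URL; 'seen' guards against loops.
    seen = set()
    while slug in RETIRED_SLUGS and slug not in seen:
        seen.add(slug)
        slug = RETIRED_SLUGS[slug]
    if slug in CURRENT_SLUGS:
        return 301, PREFIX + slug
    return 404, None
```

With this table, a request for /client/37signals is answered with a 301 pointing at /client/38signals, so old links and bookmarks keep working. The cost is that every rename adds a row that has to be kept for as long as the old URL should stay alive.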
In my interview with Ken Arnold, Perfection and Simplicity, he suggested thinking from the user in
rather than from the implementation out:
It takes a lot more work to understand the other person than it takes to understand you. You might say, "I have these two things together so I will
let the user—the person using the API—manipulate them." But that user didn't say, "Hey, I want to manipulate these two data structures!" Often
the user is saying, "I want to get this result." If users could get a result without manipulating the data structures, they'd be happy as clams. If you
can make it more natural for them to get that result, the fact that you have to go through 10 times as much work to access those data structures is good;
it means you are providing value. Many people are much more likely to think about what they have in hand and what they can do. They think from the
implementation out, instead of thinking from the user in.
What is your opinion on this issue? To what extent is a string key in a URL actually more user-friendly than a numeric one? To what extent
does that usability boost matter? In short, when do you think it is worth the extra effort on the part of developers, and when not?
What's more, lots of things don't change their string names, even though other things do. Most people don't change their names very often if at all. Country and language names mostly remain stable over decades, though there are exceptions.
As I like to point out in this connection, a certain city in Italy has been called "Roma" by the people who live there for the last 2759 years. No arbitrary label can match that for stability -- numbered houses are a few centuries old at most, and nothing else comes close.
Also, the original poster is a bit naive to think that a primary id will be stable. What happens in backups? Or when transitioning from one DB to another (say, you're moving off SQL Server to a real database :-)? How would you guarantee that the change worked there? What if the autoid was a different format, other than int?
The thing is, you have to choose an identifier that is unlikely to change frequently. Names work, because (whilst they may change) they don't change that frequently. E-mail addresses less so. But ultimately, a URL with names in it is going to be easier for someone to remember -- and therefore use or tell others -- than something with random numbers in it.
That's after all why IM and Skype work; you just remember a name, not numbers. (Heck, it goes back to DNS ... www.artima.com is much easier to remember than 188.8.131.52 , even though they both 'mean' the same)
1. Given the URL for one object, for example /people/1 it is easy to guess the URLs for objects created around it: /people/2, etc., and you may not want your competitors to be able to do that.
2. It is easy to write a script with a for loop that fetches every object from your site.
3. "URLs that include an identifier will let you down for three reasons. Witness the frequent hijacking and/or hacking of high-prestige low-digit ICQ ids."
For 1, I think you would not allow someone without authentication credentials to see an object even if they could guess its URL. Though they could probably detect that an object exists because you'd normally give them a not authorized message rather than page not found.
For 2, that's true, but it also isn't that hard to write a crawler, or configure an existing one, that downloads every page from your site. That happens to Artima all the time, and it sometimes ends up being a denial of service attack.
I'm curious about 3. I looked up ICQ IDs, because I didn't know what that meant. I found this:
I'm curious why low-digit ICQ ids are high prestige, and how these are hacked.
I had recently replaced Artima's profile page, and in the new version I used the user ID rather than the nickname, because I was going to make nicknames optional. It may not make sense to show a profile page if a user doesn't have a nickname, because I was going to require they get one just before they post content to Artima. But if someone posts once and then leaves and never comes back, I was going to eventually recycle that nickname and let someone else who is more active use it (like after a year or two of non-use). So a nickname isn't permanent, so it wouldn't work as the key in the URL.
That's why I put the ID in there, but now I realize that a better approach might be to generate an opaque token for each user, and use that in the profile URL. We already have an algorithm that generates opaque tokens for mailing list subscription confirmation and such.
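As a rough illustration of the opaque-token idea, here is a minimal Python sketch. It is not Artima's actual algorithm, which is not described here; the function new_profile_token and the in-memory dictionary are hypothetical, and the sketch simply relies on the standard library's secrets module to produce a random, URL-safe string.

```python
import secrets
from typing import Dict


def new_profile_token(tokens: Dict[str, int], user_id: int, nbytes: int = 16) -> str:
    """Create a random URL-safe token for user_id and record it in tokens."""
    while True:
        token = secrets.token_urlsafe(nbytes)
        # A collision is extremely unlikely, but checking costs almost nothing.
        if token not in tokens:
            tokens[token] = user_id
            return token
```

Unlike /people/1, /people/2, and so on, a token like this gives no hint about neighboring URLs, which addresses the guessing and for-loop concerns above. It does nothing for memorability, though, so it trades one of the complaints about numeric IDs for another.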
Another concern I have is how to implement the mapping of strings--be they user-friendly keys or opaque tokens--to the objects in the database, which I'll want to cache in memory. I haven't done that yet, but I imagine it is solvable. We use Hibernate, and I'm pretty sure its caching works off the primary key, which in our case is always simply a number. But it allows you to have non-numeric keys, of course, so I figured we'd have a separate entity whose sole purpose in life would be to map strings to IDs for that real entity, and its cache could be keyed off of the strings.
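The separate string-to-ID mapping with its own cache might look roughly like the following Python sketch. It is language-neutral pseudologic rather than Hibernate code: the class KeyToIdCache and the lookup callback are hypothetical stand-ins for the mapping entity and the database query.

```python
from typing import Callable, Dict, Optional


class KeyToIdCache:
    """Maps string keys (nicknames or opaque tokens) to numeric primary keys."""

    def __init__(self, lookup: Callable[[str], Optional[int]]) -> None:
        self._lookup = lookup          # hypothetical database query
        self._cache: Dict[str, int] = {}
        self.misses = 0                # counts trips to the backing store

    def get_id(self, key: str) -> Optional[int]:
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        found = self._lookup(key)
        if found is not None:
            # Only successful lookups are cached, so a key that is
            # created later is still found on the next request.
            self._cache[key] = found
        return found

    def invalidate(self, key: str) -> None:
        """Drop a key, for example when a nickname is recycled."""
        self._cache.pop(key, None)
```

Once the string has been turned into a number, the existing cache keyed on the primary key can be used unchanged. The part that needs care is invalidation: if a nickname is recycled, the old entry has to be dropped before the new owner can be served.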
DHH's strength is that he looks for the simplest solution that solves the 80% in the 80/20 guideline. His weakness is that he tends to oversimplify and jump to poorly-considered conclusions, as in this case. Notice his example: "What if a company changes its name?" Company name changes are almost universally a disaster, although some idiot is always convinced it's a good idea. URL stands for "Uniform Resource Locator," not "identifier that I'll want to change on a regular basis."
The real solution is the messy one: sometimes numerical identifiers will make sense. But only when it's not important that people be able to remember them. For me, I prefer that my URLs mean something, when possible. It makes it easier to edit my own site, for example, and when I cut and paste a URL it gives me a double check. On your site, you could improve the URLs and make them more readable, but at some point you'd have to mix in numbers.
From a user's perspective a URL is really a disguised database query and as such this isn't really an either-or decision. Really our web applications should support both naming schemes in all permutations.
That said I prefer not to see numeric IDs in URLs for the simple reason that they're an implementation detail and might break catastrophically if an underlying database ever has to be reconstructed from scratch without knowing what order the data was originally entered in.
I also have my qualms about immutable URLs. There is a good reason for the existence of the HTTP DELETE verb...
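Supporting both naming schemes, as suggested at the start of this comment, could be sketched in Python as follows. The tables and the function find_person_id are hypothetical, and the sketch assumes that names consisting only of digits are not allowed, since otherwise /people/1 would be ambiguous.

```python
from typing import Dict, Optional

# Hypothetical data: the same person is reachable by ID or by name.
PEOPLE_BY_ID: Dict[int, str] = {1: "bob"}
PEOPLE_BY_SLUG: Dict[str, int] = {"bob": 1}


def find_person_id(segment: str) -> Optional[int]:
    """Resolve the last path segment of /people/<segment> to a person ID."""
    # All-digit segments are treated as numeric IDs (assumption:
    # a name may never consist of digits only).
    if segment.isascii() and segment.isdigit():
        person_id = int(segment)
        return person_id if person_id in PEOPLE_BY_ID else None
    # Anything else is treated as a case-insensitive name.
    return PEOPLE_BY_SLUG.get(segment.lower())
```

Here /people/1 and /people/bob both resolve to the same record. An application doing this would probably still pick one form as the canonical URL and redirect the other to it.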
The one problem I see with string URLs is what you do when you have two 37signals clients. Do you go the route of /client/37signals and /client/37signals2? It's not much of a problem for company names, as they are usually pretty unique (although not when you come to restaurants and such). But with things like people's names you have a huge chance for collisions. How many John Smiths are out there? Or to take the W3C weather example... what about Portland? There are Portlands in many states, and two major ones. The only reason http://example.org/weather/Boston works is because there is only one Boston.
Now sure, we could go the route of disambiguation pages like Wikipedia does, but that seems like a poor solution for most sites since the colliding strings probably belong to different users and one should not be made aware of the other. Another solution would be to do something like put the state in the W3C example URL above. So we would have http://example.org/weather/MA/Boston. But I can think of examples where a user might have two resources with the same name (Friday's Records) that both live at http://example.org/someuser/recordsets/fridays-records. Yes, we can always try to add more and more layers of disambiguation to our URLs (such as adding the date to the record set example above) but our URLs become longer, less readable, and harder for the user to dissect. The user probably doesn't know what date they are looking for in most cases.
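The simplest of the options above, appending a number when a name is already taken, can be sketched in Python like this. The helpers slugify and unique_slug and the suffix format are hypothetical choices, not something any particular framework prescribes.

```python
import re
from typing import Set


def slugify(name: str) -> str:
    """Turn a display name into a lowercase, hyphen-separated URL segment."""
    lowered = name.lower().replace("'", "")
    slug = re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")
    return slug or "item"  # fallback for names with no usable characters


def unique_slug(name: str, taken: Set[str]) -> str:
    """Return a slug for name that is not in taken, and reserve it."""
    base = slugify(name)
    candidate = base
    n = 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    taken.add(candidate)
    return candidate
```

Two record sets both named "Friday's Records" would come out as fridays-records and fridays-records-2. That removes the collision, but the second URL is no longer guessable from the name alone, which is part of the objection raised here.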
All that being said, I'm not sure numeric ids are any better (though they don't have collision issues) for the reasons other people have stated. Just my two cents.