The Artima Developer Community

Weblogs Forum
The Great String versus Numeric URL Key Shootout

10 replies on 1 page. Most recent reply: Feb 7, 2007 10:12 AM by Edward Holets

Bill Venners

Posts: 2284
Nickname: bv
Registered: Jan, 2002

The Great String versus Numeric URL Key Shootout
Posted: Jan 23, 2007 7:09 PM
Summary
In his RailsConf keynote, Rails creator David Heinemeier Hansson explained his reasoning for preferring numeric entity IDs in URLs as opposed to strings. What's your opinion?

In his RailsConf keynote address, Ruby on Rails creator David Heinemeier Hansson discussed the issue of whether to use numbers corresponding to database keys in URLs (as in /people/1) versus using string keys that may have some semantic meaning to users of the web application (as in /people/bob).

We're expecting you'll want to use auto-incrementing IDs for things. And I expect that will be as controversial a point as auto-incrementing IDs and saying that that's the way you should do it was with databases. Because most people have this notion that you should have friendlier URLs. It is funny how I have not heard too many people spot that obvious comparison.

People say we want to have the title of the blog posting in the URL. You want real keys? You want those keys that were burned at the stake in the database design back in your URLs? Why? The exact same arguments apply. I fell under the exact same thing. In Basecamp, we used to have something like /client/37signals. Well, what would happen when people changed their firm name? They would now call themselves 38signals. It broke. You have broken URLs. You have exactly the same problems that you had in the database world. Not very nice.

So what I've come to realize is that it's just not worth it. It's just not worth describing these concrete logical things in the URLs, because you get the same mess we got when we tried that in the database world.

The counter argument is that URIs are part of the user interface of a web application, at least if the application is designed for human consumption. (Hansson's comments at RailsConf were in the context of describing ideas about ActiveResource, a work-in-progress API intended to make it easy to write web service clients, but the conventions he described are intended for human users as well.) Jakob Nielsen gives his take on this issue in URL as UI, suggesting that a usable site requires:

  • a domain name that is easy to remember and easy to spell
  • short URLs
  • easy-to-type URLs
  • URLs that visualize the site structure
  • URLs that are "hackable" to allow users to move to higher levels of the information architecture by hacking off the end of the URL
  • persistent URLs that don't change
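Nielsen's "hackable" criterion can be made concrete. The following is a minimal Python sketch, assuming that every path prefix of a URL is itself a meaningful page (many sites don't meet this assumption). It lists the URLs a user reaches by hacking off trailing segments. The example.org host is a placeholder, and `hacked_parents` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

def hacked_parents(url: str) -> list:
    """List the URLs reached by hacking off trailing path segments, deepest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time, finishing at the site root "/".
    for n in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:n])
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents
```

For `http://example.org/weather/MA/Boston` this yields `/weather/MA`, then `/weather`, then `/`. A site that wants its URLs to be hackable would need to serve a sensible page at each of those levels.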

Similarly, the W3C document The Use of Metadata in URIs contains a section on the usability benefit of guessable URLs.

Bob is walking down a street, and he sees an advertisement on the side of a bus:

"For the best Chicago Weather information on the Web, visit http://example.org/weather/Chicago."

Bob goes home and types the URI into his browser, which does indeed display for him a Chicago weather forecast. Bob then realizes that he'll be visiting Boston, and he guesses that a Boston weather page might be available at a similar URI: http://example.org/weather/Boston. He types that into his browser and reads the response that comes back.

Bob is using the original URI for more than its intended purpose, which is to identify the Chicago weather page. Instead, he's inferring from it information about the structure of a Web site that, he guesses, might use a uniform naming convention for the weather in lots of cities. So, when Bob tries the Boston URI, he has to be prepared for the possibility that his guess will prove wrong: Web architecture does not guarantee that the retrieved page, if there is one, has the weather for Boston, or indeed that it contains any weather report at all. Even if it does, there is no assurance that it is current weather, that it is intended for reliable use by consumers, etc. Bob has seen an advertisement listing just the Chicago URI, and that is the only one that the URI authority has warranted will be a useful weather report.

Still, the ability to explore the Web informally and experimentally is very valuable, and Web users act on such guesses about URIs all the time. Many authorities facilitate such flexible use of the Web by assigning URIs in an orderly and predictable manner. Nonetheless, in the example above, Bob is responsible for determining whether the information returned is indeed what he needs.

Notably, Jakob Nielsen includes "persistent URLs that don't change" in his list of requirements for usable websites, and URL persistence is the very problem David Heinemeier Hansson cited against string keys in URLs. The W3C document Cool URIs Don't Change explains why persistent URLs matter and suggests how to achieve them through URL design. It is indeed important to think hard about URL design before publishing a URL, because changing it later is difficult.

But the truth is that Real URIs Do Change, and the web is flexible enough to accommodate that change. It takes a little more work on the web site developer's part, because you have to redirect requests from old versions of the URI to the new one. But that does work. If the client 37signals changes its name to 38signals, its client page could be moved to /client/38signals, and requests for /client/37signals could receive a 301 (Moved Permanently) redirect to /client/38signals. In such a situation, neither the web application nor the /client/37signals URL is broken, but it does make life more complicated for the provider of the service. Ruby on Rails is very strong on making things simple for the developer, and that's a great thing, but the tradeoff in this case may be a small loss of usability for the site. It may be worth it, because getting the application into users' hands sooner, which Rails may facilitate, is a big usability plus. But there is a tradeoff being made.
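The redirect scheme described above needs only one extra piece of state: old slugs must keep pointing at their record after a rename. The following is a minimal Python sketch of that bookkeeping. `SlugRegistry` and its methods are hypothetical names, and the in-memory dictionaries stand in for database tables.

```python
from typing import Dict, Optional, Tuple

class SlugRegistry:
    """Hypothetical slug store that remembers retired slugs so old URLs can redirect."""

    def __init__(self) -> None:
        self.current: Dict[str, int] = {}   # live slug -> record ID
        self.retired: Dict[str, int] = {}   # old slug -> record ID
        self.slug_of: Dict[int, str] = {}   # record ID -> live slug

    def add(self, record_id: int, slug: str) -> None:
        self.current[slug] = record_id
        self.slug_of[record_id] = slug

    def rename(self, record_id: int, new_slug: str) -> None:
        old_slug = self.slug_of[record_id]
        del self.current[old_slug]
        self.retired[old_slug] = record_id  # keep the old name so its URL keeps working
        self.add(record_id, new_slug)

    def resolve(self, slug: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, path to serve or redirect to) for GET /client/<slug>."""
        if slug in self.current:
            return 200, f"/client/{slug}"
        if slug in self.retired:
            # Point straight at the latest slug, so repeated renames never chain redirects.
            return 301, f"/client/{self.slug_of[self.retired[slug]]}"
        return 404, None
```

After `add(1, "37signals")` and `rename(1, "38signals")`, a request for `37signals` resolves to a 301 pointing at `/client/38signals`. Because live slugs are checked first, this sketch lets a new client claim a retired slug, which would silently retarget the old URL. A real application would have to decide whether retired slugs may ever be reassigned.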

In my interview with Ken Arnold, Perfection and Simplicity, he suggested thinking from the user in rather than from the implementation out:

It takes a lot more work to understand the other person than it takes to understand you. You might say, "I have these two things together so I will let the user—the person using the API—manipulate them." But that user didn't say, "Hey, I want to manipulate these two data structures!" Often the user is saying, "I want to get this result." If users could get a result without manipulating the data structures, they'd be happy as clams. If you can make it more natural for them to get that result, the fact that you have to go through 10 times as much work to access those data structures is good; it means you are providing value. Many people are much more likely to think about what they have in hand and what they can do. They think from the implementation out, instead of thinking from the user in.

What is your opinion on this issue? To what extent is a string key in a URL actually more user-friendly than a numeric one? To what extent does that usability boost matter? In short, when do you think it is worth the extra effort on the part of developers, and when not?


John Cowan

Posts: 36
Nickname: johnwcowan
Registered: Jul, 2006

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 23, 2007 7:57 PM
What's more, lots of things don't change their string names, even though other things do. Most people don't change their names very often if at all. Country and language names mostly remain stable over decades, though there are exceptions.

As I like to point out in this connection, a certain city in Italy has been called "Roma" by the people who live there for the last 2759 years. No arbitrary label can match that for stability -- numbered houses are a few centuries old at most, and nothing else comes close.

Alex Blewitt

Posts: 44
Nickname: alblue
Registered: Apr, 2003

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 23, 2007 9:06 PM
Erm. In the West, at least, the majority of women change their name at least once in their lives (sometimes more than that).

Of course, transparent URI redirection (this page has moved, please update your bookmarks etc.) can be used to solve that kind of problem :-)

Countries do change names (Czechoslovakia, for example, split), as do cities (Istanbul/Constantinople, Mumbai/Bombay, etc.), though such changes aren't frequent, and they're solvable as well.

But can you imagine a Wiki using random keys instead of names?

http://en.wikipedia.org/wiki/42
http://en.wikipedia.org/wiki/The_Answer_to_Life,_the_Universe,_and_Everything

Also, the original poster is a bit naive to think that a primary id will be stable. What happens in backups? Or when transitioning from one DB to another (say, you're moving off SQL Server to a real database :-) How would you guarantee that the change worked there? What if the autoid was a different format, other than int?

The thing is, you have to choose an identifier that is unlikely to change frequently. Names work because, while they may change, they don't change that frequently. E-mail addresses, less so. But ultimately, a URL with names in it is going to be easier for someone to remember, and therefore to use or tell others about, than something with random numbers in it.

That's, after all, why IM and Skype work: you just remember a name, not numbers. (Heck, it goes back to DNS... www.artima.com is much easier to remember than 63.246.23.132, even though they both 'mean' the same thing.)

Alex.

Daniel James

Posts: 1
Nickname: dnljms
Registered: Jan, 2007

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 24, 2007 3:48 AM
Funny, Joshua Schachter from del.icio.us posted a convincing argument against using autoincremented keys in URLs just the other day.

http://joshua.schachter.org/2007/01/autoincrement.html

He's writing about some other issues there, though, and seems to be suggesting that the choice should be between a sensible name (such as http://del.icio.us/tag/artima) and a hash of a unique value (http://del.icio.us/url/8dc37d1556ef94eff485fa0887593639) where a short name isn't appropriate.
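The del.icio.us key above is 32 hexadecimal characters, which is the length of a 128-bit digest such as MD5. The thread doesn't say which hash del.icio.us actually used or how it normalized URLs before hashing. The following Python sketch shows only the general technique of deriving a stable, fixed-length key by hashing a value that is already unique, in this case the bookmarked URL itself. `url_key` is a hypothetical helper.

```python
import hashlib

def url_key(url: str) -> str:
    """Derive a fixed-length hex key from a value that is already unique."""
    # MD5 gives 32 hex characters; it is fine as an identifier, but if users could
    # deliberately craft colliding inputs, a SHA-256 digest would be the safer choice.
    return hashlib.md5(url.encode("utf-8")).hexdigest()
```

Because the key is computed from the value, the same URL always maps to the same page, and no lookup table is needed to assign keys. Different spellings of the same URL would produce different keys unless they are normalized first.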

Sean McCullough

Posts: 3
Nickname: banksean
Registered: Jul, 2004

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 24, 2007 7:42 AM
Why don't we use UUIDs or GUIDs instead of autoinc. keys in URLs? They're pretty hard to guess and pretty easy to index for quick lookups.
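Here is a minimal Python sketch of the UUID approach using the standard `uuid` module. The "hard to guess" property holds for random version-4 UUIDs, which carry 122 random bits. Time-based version-1 UUIDs are far more predictable. `new_public_id` and `parse_public_id` are hypothetical helper names.

```python
import uuid
from typing import Optional

def new_public_id() -> str:
    """Random (version 4) UUID in canonical 36-character form, for use as a URL key."""
    return str(uuid.uuid4())

def parse_public_id(text: str) -> Optional[uuid.UUID]:
    """Return the UUID for a well-formed key, or None so the caller can answer 404."""
    try:
        return uuid.UUID(text)
    except ValueError:
        return None
```

`uuid.UUID` also accepts variants such as unhyphenated hex or a `urn:uuid:` prefix. A site that wants a single canonical URL per object could redirect whenever `str(parsed)` differs from the text in the request. For indexing, UUIDs can be stored as 16-byte values rather than strings.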

Bill Venners

Posts: 2284
Nickname: bv
Registered: Jan, 2002

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 24, 2007 10:44 AM
> Funny, Joshua Schachter from del.icio.us posted a
> convincing argument against using autoincremented keys in
> URLs just the other day.
>
> http://joshua.schachter.org/2007/01/autoincrement.html
>
> Although he's writing about some other issues there, and I
> suppose suggesting that the choice should be between a
> sensible name (such as http://del.icio.us/tag/artima) and a
> hash of a unique value
> (http://del.icio.us/url/8dc37d1556ef94eff485fa0887593639)
> where a short name isn't appropriate.
>
That's a great link. Thanks for providing it. He says that URLs that include an identifier will let you down for three reasons:

1. Given the URL for one object, for example /people/1, it is easy to guess the URLs for objects created around it (/people/2, etc.), and you may not want your competitors to be able to do that.

2. It is easy to write a script with a for loop that fetches every object from your site.

3. "Witness the frequent hijacking and/or hacking of high-prestige low-digit ICQ ids."

For 1, I think you would not allow someone without authentication credentials to see an object even if they could guess its URL, though they could probably detect that the object exists, because you'd normally give them a "not authorized" message rather than "page not found."
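The existence leak described above can be closed by returning the same response for "no such object" and "not yours." The HTTP specification explicitly allows an origin server to answer 404 instead of 403 when it wants to hide that a forbidden resource exists. The following is a minimal Python sketch. The `ACL` table and `status_for` are hypothetical stand-ins for a real permission check.

```python
from typing import Optional

# Hypothetical permissions: object ID -> users allowed to see it.
ACL = {1: {"bob"}, 2: {"alice"}}

def status_for(object_id: int, user: Optional[str]) -> int:
    """HTTP status for GET /people/<object_id>, hiding whether the object exists."""
    allowed = ACL.get(object_id)
    if allowed is not None and user in allowed:
        return 200
    # Same answer for "missing" and "forbidden", so probing sequential IDs
    # reveals nothing about which objects exist.
    return 404
```

The cost is that a legitimate user who simply isn't signed in sees "not found" rather than a login prompt. Some sites accept that cost only for private objects.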

For 2, that's true, but it also isn't that hard to write, or configure an existing crawler, to download every page from your site. That happens to Artima all the time, and it sometimes ends up being effectively a denial-of-service attack.

I'm curious about 3. I looked up ICQ IDs, because I didn't know what that meant. I found this:

http://www.linuxsecurity.com/resource_files/documentation/hacking-dict.html#icq

I'm curious why low-digit ICQ ids are high prestige, and how these are hacked.

I had recently replaced Artima's profile page, and in the new version I used the user ID rather than the nickname in the URL, because I was going to make nicknames optional. It may not make sense to show a profile page for a user who doesn't have a nickname anyway, because I was going to require users to choose one just before they post content to Artima. But if someone posts once and then leaves and never comes back, I was going to eventually recycle that nickname and let someone more active use it (say, after a year or two of non-use). Because a nickname isn't permanent, it wouldn't work as the key in the URL.

That's why I put the ID in there, but now I realize that a better approach might be to generate an opaque token for each user, and use that in the profile URL. We already have an algorithm that generates opaque tokens for mailing list subscription confirmation and such.
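Artima's existing token algorithm isn't described here. As a stand-in, the following Python sketch generates opaque profile tokens with the standard `secrets` module. `new_profile_token` is a hypothetical helper, and the `taken` set stands in for a unique index in the database.

```python
import secrets

def new_profile_token(taken: set) -> str:
    """Return a fresh URL-safe token that is not already in `taken`, and reserve it."""
    while True:
        # 12 random bytes encode to exactly 16 base64url characters (A-Z, a-z, 0-9, '-', '_').
        token = secrets.token_urlsafe(12)
        if token not in taken:   # collisions are astronomically unlikely, but cheap to check
            taken.add(token)
            return token
```

Unlike a nickname, a token like this never needs to be recycled, so it can serve as the permanent key in `/profile/<token>`-style URLs. Unlike a sequential ID, it reveals nothing about neighboring users.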

Another concern I have is how to implement the mapping of strings--be they user-friendly keys or opaque tokens--to the objects in the database, which I'll want to cache in memory. I haven't done that yet, but I imagine it is solvable. We use Hibernate, and I'm pretty sure its caching works off the primary key, which in our case is always simply a number. But it allows you to have non-numeric keys, of course, so I figured we'd have a separate entity whose sole purpose in life would be to map strings to IDs for that real entity, and its cache could be keyed off of the strings.
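The separate mapping described above can be sketched in Python, even though Artima's stack is Java and Hibernate. The idea is that the ORM keeps caching entities by numeric ID, while a small index caches string-to-ID lookups keyed by the string. `KeyIndex` and the `lookup` callback are hypothetical. Later Hibernate releases also added natural-ID support (`@NaturalId`), which may be worth checking before building this by hand.

```python
from typing import Callable, Dict, Optional

class KeyIndex:
    """Hypothetical cache from public string keys to numeric primary keys."""

    def __init__(self, lookup: Callable[[str], Optional[int]]) -> None:
        self._lookup = lookup            # e.g. a query against a key -> ID mapping table
        self._cache: Dict[str, int] = {}

    def id_for(self, key: str) -> Optional[int]:
        if key in self._cache:
            return self._cache[key]
        record_id = self._lookup(key)
        if record_id is not None:        # don't cache misses; the key may be created later
            self._cache[key] = record_id
        return record_id

    def evict(self, key: str) -> None:
        """Call when a key is renamed or recycled, such as a freed nickname."""
        self._cache.pop(key, None)
```

With the ID in hand, the entity itself comes from the ORM's usual ID-keyed cache, so only the small string-to-integer map is duplicated.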

Michael Strasser

Posts: 2
Nickname: strasser
Registered: Mar, 2003

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 24, 2007 8:31 PM
I made exactly that choice just last week and decided on names instead of numeric IDs.

I started with IDs but decided that usernames (also unique) were simpler and more elegant. Usernames can be changed by an administrator but that will be an unusual event.

(It is not a public site so hacking isn't an issue.)

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: The Great String versus Numeric URL Key Shootout Posted: Jan 30, 2007 8:55 AM
DHH's strength is that he looks for the simplest solution that solves the 80% in the 80/20 guideline. His weakness is that he tends to oversimplify and jump to poorly-considered conclusions, as in this case. Notice his example: "What if a company changes its name?" Company name changes are almost universally a disaster, although some idiot is always convinced it's a good idea. URL stands for "Uniform Resource Locator," not "identifier that I'll want to change on a regular basis."

The real solution is the messy one: sometimes numerical identifiers will make sense, but only when it's not important that people be able to remember them. For me, I prefer that my URLs mean something, when possible. It makes it easier to edit my own site, for example, and when I cut and paste a URL it gives me a double check. On your site, you could improve the URLs and make them more readable, but at some point you'd have to mix in numbers.
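One common way to "mix in numbers" is not taken from this thread. In this scheme, the numeric ID is authoritative and a slug made from the title follows it purely for readability. A stale or missing slug then redirects to the current one, which also addresses the renaming concern. The following is a minimal Python sketch. The `/articles/<id>-<slug>` layout and all function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

def slugify(title: str) -> str:
    """Lowercase the title and join its runs of ASCII letters and digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def article_url(article_id: int, title: str) -> str:
    return f"/articles/{article_id}-{slugify(title)}"

def route(segment: str, titles: Dict[int, str]) -> Tuple[int, Optional[str]]:
    """Resolve /articles/<segment>; only the leading number is used for lookup."""
    match = re.fullmatch(r"(\d+)(?:-.*)?", segment)
    if match is None or int(match.group(1)) not in titles:
        return 404, None
    article_id = int(match.group(1))
    canonical = article_url(article_id, titles[article_id])
    if "/articles/" + segment != canonical:
        # Missing or outdated words: send the reader to the current slug.
        return 301, canonical
    return 200, canonical
```

A title change only changes the decorative part, so every URL ever published still resolves. The tradeoff is that the number remains in the URL, and the number is the part nobody remembers.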

Nicely written post.

Michael Geary

Posts: 1
Nickname: geary
Registered: Sep, 2003

Re: The Great String versus Numeric URL Key Shootout Posted: Feb 3, 2007 1:44 AM
I would take David's opinion more seriously if he offered his wares at addresses like http://72.32.62.117/.

Eleanor McHugh

Posts: 1
Nickname: feyeleanor
Registered: Nov, 2005

Re: The Great String versus Numeric URL Key Shootout Posted: Feb 5, 2007 6:19 AM
From a user's perspective a URL is really a disguised database query and as such this isn't really an either-or decision. Really our web applications should support both naming schemes in all permutations.

That said I prefer not to see numeric IDs in URLs for the simple reason that they're an implementation detail and might break catastrophically if an underlying database ever has to be reconstructed from scratch without knowing what order the data was originally entered in.

I also have my qualms about immutable URLs. There is a good reason for the existence of the HTTP DELETE verb...

Edward Holets

Posts: 1
Nickname: holetse
Registered: Feb, 2007

Re: The Great String versus Numeric URL Key Shootout Posted: Feb 7, 2007 10:12 AM
The one problem I see with string URLs is what you do when you have two 37signals clients. Do you go the route of /client/37signals and /client/37signals2? It's not much of a problem for company names, as they are usually pretty unique (although not when you get to restaurants and such). But with things like people's names you have a huge chance of collisions. How many John Smiths are out there? Or to take the W3C weather example... what about Portland? There are Portlands in many states, and two major ones. The only reason http://example.org/weather/Boston works is that there is only one Boston.

Now sure, we could go the route of disambiguation pages like Wikipedia does, but that seems like a poor solution for most sites, since the colliding strings probably belong to different users and one should not be made aware of the other. Another solution would be to put something like the state in the W3C example URL above, so we would have http://example.org/weather/MA/Boston. But I can think of examples where a user might have two resources with the same name (Friday's Records) that would both live at http://example.org/someuser/recordsets/fridays-records. Yes, we can always try to add more and more layers of disambiguation to our URLs (such as adding the date to the record set example above), but our URLs become longer, less readable, and harder for the user to dissect. The user probably doesn't know what date they are looking for in most cases.
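The two collision strategies mentioned above are numeric suffixes and scoping by state or user, and they can be combined. The following Python sketch makes slugs unique only within one scope and falls back to a suffix on collision. The poster wrote `37signals2`, and this sketch uses `37signals-2` instead. The names are hypothetical, and each `taken` set stands in for a unique database index on (scope, slug).

```python
import re

def slugify(name: str) -> str:
    """Drop apostrophes, lowercase, and join runs of ASCII letters/digits with hyphens."""
    cleaned = re.sub(r"['\u2019]", "", name.lower())
    return "-".join(re.findall(r"[a-z0-9]+", cleaned))

def unique_slug(name: str, taken: set) -> str:
    """Claim slugify(name), or the first free name-2, name-3, ... within one scope."""
    base = slugify(name) or "item"   # fall back for names with no usable characters
    slug, n = base, 2
    while slug in taken:
        slug = f"{base}-{n}"
        n += 1
    taken.add(slug)
    return slug
```

Keeping one `taken` set per state means Portland, Oregon and Portland, Maine can both be `portland`. Within a single scope, the suffix still tells the reader nothing about which "Friday's Records" is which, which is the readability cost the post describes.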

All that being said, I'm not sure numeric IDs are any better (though they don't have collision issues), for the reasons other people have stated. Just my two cents.


Copyright © 1996-2019 Artima, Inc. All Rights Reserved.