The Artima Developer Community

Weblogs Forum
RSS: The Wrong Solution to a Broken Internet

16 replies on 2 pages. Most recent reply: May 15, 2008 10:16 AM by David MacQuigg

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

RSS: The Wrong Solution to a Broken Internet
Posted: Sep 18, 2007 11:10 AM
Summary
RSS seems clever close-up, if you ignore the internet traffic increase issues. But if you look at the real problem, RSS is a workaround that just supports the existing problem: anonymity.

When RSS first appeared, I dutifully wrote an RSS feed for the simple weblog system I had created for myself. Almost immediately, my hosting provider contacted me and said that my usage was too high. Everyone had turned their RSS readers up to "hammer" and was checking every second or sooner to see whether I had put up a new posting. That was something I didn't do all that often, so checking every hour would probably have been just fine.

The result was that I punted, moved my weblog over to Artima, and let Bill handle it. He, I believe, uses third-party services to handle the RSS traffic. So this invention required several technologies and third-party businesses, all to solve a problem that shouldn't have existed in the first place.

What are you, the consumer, trying to accomplish? You want to be notified when something happens. We have a well-known pattern for that problem: publish-subscribe. The publisher keeps a pointer to each subscriber and, when something happens, tells the subscriber about it. Maximally efficient.
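For comparison, here is a minimal in-process sketch of publish-subscribe in Python. The names `Publisher`, `subscribe`, and `publish` are made up for illustration; the point is that the publisher holds a reference (the "pointer") to each subscriber and makes exactly one call per subscriber per new post, with no polling at all.

```python
from typing import Callable, List


class Publisher:
    """Keeps a pointer (here, a callback) to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # The subscriber hands over a way to reach it: the "pointer to
        # yourself" that the post says we can't risk giving out.
        self._subscribers.append(callback)

    def publish(self, post_title: str) -> None:
        # One notification per subscriber, sent only when something happens.
        for notify in self._subscribers:
            notify(post_title)


inbox: List[str] = []
blog = Publisher()
blog.subscribe(inbox.append)
blog.publish("RSS: The Wrong Solution to a Broken Internet")
print(inbox)  # ['RSS: The Wrong Solution to a Broken Internet']
```

Across a network, `subscribe` would have to give the publisher an address it can reach, which is exactly the step the rest of this post is about.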

Why doesn't it work? Because the internet is anonymous. People can behave badly because nobody knows who they really are, and enough people do behave badly that you can't risk giving out a pointer to yourself. So we don't. Instead, we need RSS where our readers are constantly, stupidly asking, "did it change yet?" "Did it change yet?" "Now has it changed?" "Now?"
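To put numbers on the "did it change yet?" loop, this hypothetical simulation counts how many polls actually find something new. The one-hour window and the single post at the 30-minute mark are made-up inputs.

```python
from typing import List, Tuple


def count_polls(poll_interval_s: int, window_s: int,
                post_times_s: List[int]) -> Tuple[int, int]:
    """Simulate a reader asking "did it change yet?" every poll_interval_s seconds.

    Returns (total_requests, requests_that_found_something_new).
    """
    total = useful = 0
    known = 0  # how many posts the reader has already seen
    for t in range(poll_interval_s, window_s + 1, poll_interval_s):
        total += 1
        published = sum(1 for p in post_times_s if p <= t)
        if published > known:
            useful += 1
            known = published
    return total, useful


# One new post, 30 minutes into a one-hour window.
print(count_polls(1, 3600, [1800]))     # (3600, 1)
print(count_polls(3600, 3600, [1800]))  # (1, 1)
```

Polling every second spends 3,600 requests to learn one fact; polling hourly learns the same fact with one request, at the price of up to an hour's delay.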

This is a really dumb solution, but it's the only way we can retain our privacy, because if we give out anything that can lead back to us then someone will use it to annoy or trick us.

What's the real problem? Anonymity. It's the same as graffiti: you can get away with something if no one knows who you are, or if no one cares enough. But if someone sees you spray-painting a wall and they say "I see you John Smith! I know your mother and I'm going to tell her what you're doing!" suddenly you're responsible for your bad behavior. You're much less likely to do it.

The internet allows people to be anonymous. That's very often a good thing, because some groups (governments, it usually seems) can target people if they know who they are. So anonymity can be important, and we need special cases for people who actually need to be anonymous.

But in most situations we don't need anonymity, we need the opposite. I don't really want to hear from someone anonymously because it's unlikely -- except in the above situation -- that they are going to offer something of value to me, nor I to them. On weblog discussions, anonymity is worse than useless because it's the anonymous posters or (on this site) the ones that sneak in under an assumed identity that put up blog spam.

Phishing is an anonymous activity, as is spamming. Would anybody miss them if they went away?

It's worse than that because we've all been forced to become anonymous so that the phishers and spammers can't get to us.

What's really aggravating about all this is that there appear to have been multiple solutions available for years. You can imagine that it wouldn't be that hard to make sure I can know for certain that an email came from where it says it did (email headers already say where a message comes from, but the bad guys just use them to lie). A bigger problem is figuring out how to transition our systems to the new way of doing things, but mailer daemons are upgraded on a regular basis, so this transition can surely be managed.

For many years there have been "digital identity" conferences. I have heard that there are something like 8 different approaches to solving this problem, and yet it all seems to just sit there.

Maybe someone can explain why nothing seems to be happening about this issue that, more than anything, plagues the internet and threatens to make it unusable. I mean, if you could choose one thing about the net to change, wouldn't it be "no more spam, phishing and general bad behavior that comes from anonymity?"


John Cowan

Posts: 36
Nickname: johnwcowan
Registered: Jul, 2006

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 18, 2007 12:44 PM
Anonymity is not at all the problem, nor is pushing the solution.

First of all, there's really no such thing as *non*-anonymity on the Web. The only way to get completely convincing evidence of who you are communicating with is to demand a tie-in between a hard (typically government-issued) ID and the electronic identity. I once (and only once) actually encountered this: in order to post to a certain site, you had to snail-mail them a photocopy of your driver's license; they then supposedly snail-mailed back a cryptographically strong identity token for you to use. Needless to say, I never posted there.

Digital certificate companies basically don't help either: they will send anyone who's willing to pay for it a digital identity after the most cursory checks or no checks at all. Well, at least that assures you that the person you talked to last time is the person you talked to this time? Not at all. Everything that exists on any machine connected in any way to the Internet is, in general, wide open to theft by the bad guys. So the fact that someone sends you an email signed with the PGP key you got from a keyserver, or even from the person's own hands, is no assurance at all.

So the first principle of Internet communication is that everyone is essentially anonymous. People can and will behave badly because you don't know who they are -- and you never will.

Now as for the notification problem, there are many reasons to communicate notification by pulling rather than by pushing. For one thing, a push can fail, and then what? The recipient has lost the chance of finding out about the new posting. Pulling puts the onus on the person who wants the new information: if for some reason one pull fails, the next pull will succeed. Making this happen with push involves a complex process of verification: either trying to pull the notification back (probably at odds with security), or sending out a unique ID with each push so that the client can state that he has received it, or a handshaking protocol in which the client pushes back.
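The claim that pull recovers on its own can be shown in a few lines. In this sketch (all names hypothetical), entries are numbered in publication order and the reader, not the publisher, remembers how far it has read; a failed pull is harmless because the next successful one returns everything newer.

```python
from typing import List


class Feed:
    """Publisher side: entries kept in publication order."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def add(self, title: str) -> None:
        self.entries.append(title)

    def since(self, last_seen: int) -> List[str]:
        # Everything after the reader's position, however many pulls it missed.
        return self.entries[last_seen:]


class PullingReader:
    """Subscriber side: the only state is how far this reader has read."""

    def __init__(self, feed: Feed) -> None:
        self.feed = feed
        self.last_seen = 0
        self.read: List[str] = []

    def pull(self, network_up: bool = True) -> None:
        if not network_up:
            return  # a failed pull loses nothing; the next one catches up
        new = self.feed.since(self.last_seen)
        self.read.extend(new)
        self.last_seen += len(new)


feed = Feed()
reader = PullingReader(feed)
feed.add("post 1")
reader.pull(network_up=False)  # this pull fails
feed.add("post 2")
reader.pull()                  # the next pull gets both posts
print(reader.read)             # ['post 1', 'post 2']
```

Making a push equally robust would need the acknowledgement and retry machinery described above, with the retry state kept on the publisher's side instead.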

Furthermore, push has all the problems of our one existing push application, email (which is pulled in the last stage to bypass some of these). How can you be sure that the pushee's system is online? (In email, we delegate this to our ISPs.) How can you be sure that the person who signed up to be pushed to actually wanted that? (In mailing-list email, we have to handshake by sending out probes and having people reply to them by another email or over HTTP, a crude sort of web service.) What is worse, if your system security is compromised, a huge database of pushees becomes available to spammers.
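The mailing-list probe handshake is usually called double opt-in. A minimal sketch, with a hypothetical `MailingList` class and the email step reduced to returning the token: an address joins the push list only after someone who can read that mailbox echoes back an unguessable token.

```python
import hmac
import secrets
from typing import Dict, Set


class MailingList:
    """Double opt-in: an address is pushed to only after its owner confirms."""

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}  # address -> confirmation token
        self.confirmed: Set[str] = set()

    def request_subscription(self, address: str) -> str:
        token = secrets.token_urlsafe(16)
        self.pending[address] = token
        # A real list would email this token to `address` as the probe.
        return token

    def confirm(self, address: str, token: str) -> bool:
        expected = self.pending.get(address)
        if expected is None or not hmac.compare_digest(expected, token):
            return False  # wrong token: maybe someone else typed this address in
        del self.pending[address]
        self.confirmed.add(address)
        return True


ml = MailingList()
token = ml.request_subscription("reader@example.com")
forged = ml.confirm("reader@example.com", "guessed-token")  # False
genuine = ml.confirm("reader@example.com", token)           # True
```

The `confirmed` set is the "huge database of pushees" the comment warns about; it exists only because push requires it.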

No, RSS/Atom is the best available solution to an extremely intractable problem.

--John Cowan (or am I?)

Anthony Williams

Posts: 3
Nickname: anthonyw
Registered: Aug, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 18, 2007 12:57 PM
In theory, the If-Modified-Since HTTP header should be used with an RSS feed to avoid actually transferring the data if nothing has changed, but that doesn't address the underlying issue; it just reduces the wasted bandwidth.
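On the server side, honoring If-Modified-Since is a small comparison. This is a framework-free sketch; `serve_feed` and its `(status, headers, body)` return shape are assumptions, not any particular web server's API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def serve_feed(feed_xml: bytes, last_modified: datetime,
               request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a feed request, honoring If-Modified-Since (a conditional GET).

    Header names are case-insensitive in HTTP; this sketch assumes they
    arrive already normalized to the spelling used below.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since: Optional[str] = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return 304, headers, b""  # nothing new: headers only, no body
        except (TypeError, ValueError):
            pass  # malformed or naive date: fall through to a full response
    return 200, headers, feed_xml


posted = datetime(2007, 9, 18, 11, 10, tzinfo=timezone.utc)
status1, hdrs, body1 = serve_feed(b"<rss/>", posted, {})
# The reader stores Last-Modified and sends it back on its next poll.
status2, _, body2 = serve_feed(b"<rss/>", posted,
                               {"If-Modified-Since": hdrs["Last-Modified"]})
print(status1, status2)  # 200 304
```

As the comment says, this only saves the body transfer; every reader still makes the request.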

When I first heard about RSS, it was described as a way of notifying readers of news. It is not --- it is just a standard way of retrieving pre-packaged news, with a time stamp, so you know whether it's new or not.

As you say, the problem with publish-subscribe is that in order to subscribe I have to tell the publisher who I am, and they may use the information for nefarious purposes. However, there's nothing to stop people using a dedicated email address for each subscription, just as people often use dedicated email addresses for mailing lists in order to avoid the same problem.

In fact, I don't know what all the hype about RSS is about. Yes, I use an RSS reader for reading blogs but that's because there's so much focus on RSS feeds at the moment, and it's certainly more convenient than manually checking websites.

Hmm. Maybe that's it --- when I surf the web, I'm essentially anonymous (apart from my IP address, and proxies can mask that). RSS is essentially just a structured way of checking web sites, so people don't want to give up the anonymity when reading stuff. Compare this to reading a newspaper --- I go to the news stand, and hand over some money in exchange for the paper, but nobody knows it's me unless I come day after day and the news-stand-person recognizes me (but even then he doesn't necessarily have a name to put to the face). Certainly, the journalist doesn't know who I am.

People like their anonymity. It makes them feel safe.

Deron Meranda

Posts: 2
Nickname: dmeranda
Registered: Sep, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 18, 2007 1:31 PM
A more important HTTP header is the Expires header. It is supposed to inform the user agent when it should check back again for updated content. This would allow the RSS server to indicate how frequently it wants clients to check, and so it can be tuned to the blog's posting frequency, bandwidth, etc. It's still pull, but at least the server has more of a say in how hard it gets hit.
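A sketch of how a feed could use Expires to pace its readers, assuming the publisher picks the interval from its own posting rate. The helper names are hypothetical; in HTTP/1.1, `Cache-Control: max-age` expresses the same idea and takes precedence over Expires when both are present.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now: datetime, check_every: timedelta) -> str:
    """Server side: tell clients not to come back before now + check_every."""
    return format_datetime(now + check_every, usegmt=True)


def next_poll(expires_value: str, now: datetime) -> datetime:
    """Client side: a well-behaved reader waits until the Expires time."""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        return now  # an unparseable Expires counts as already expired
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # a "-0000" zone parses as naive
    return max(expires, now)


now = datetime(2007, 9, 18, 12, 0, tzinfo=timezone.utc)
header = expires_header(now, timedelta(hours=1))  # e.g. a blog that posts a few times a week
print(header)                  # Tue, 18 Sep 2007 13:00:00 GMT
print(next_poll(header, now))  # 2007-09-18 13:00:00+00:00
```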

And as for letting the client determine whether a change actually happened, it would be even better to use ETags than the Last-Modified date.
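Part of why ETags help: the validator is derived from the content itself, so there are no clock or one-second-resolution issues. A minimal sketch, assuming the server hashes the feed body (the 16-hex-digit SHA-256 prefix is an arbitrary choice, and the function names are hypothetical):

```python
import hashlib
from typing import Dict, Tuple


def etag_for(body: bytes) -> str:
    # Entity tags are quoted strings in HTTP.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve_with_etag(body: bytes,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer with 304 if the client already holds this exact content."""
    tag = etag_for(body)
    headers = {"ETag": tag}
    # If-None-Match may carry several tags, or "*"; it uses weak comparison,
    # so a W/ prefix on the client's copy is ignored.
    raw = request_headers.get("If-None-Match", "")
    client_tags = [t.strip().removeprefix("W/") for t in raw.split(",")]
    if "*" in client_tags or tag in client_tags:
        return 304, headers, b""
    return 200, headers, body


status1, hdrs, _ = serve_with_etag(b"<rss>v1</rss>", {})
status2, _, body2 = serve_with_etag(b"<rss>v1</rss>", {"If-None-Match": hdrs["ETag"]})
status3, _, _ = serve_with_etag(b"<rss>v2</rss>", {"If-None-Match": hdrs["ETag"]})
print(status1, status2, status3)  # 200 304 200
```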

Unfortunately nothing uses the HTTP headers correctly. RSS was totally ill-conceived for not even considering how to deal with the rate of update requests, especially when it could have at least made use of the facilities already provided by HTTP. It seems that every time a new protocol is layered on top of HTTP by designers who don't understand it, they are doomed either to reinvent the same features in a more complex manner or to break the ones it has.

Dmitry Cheryasov

Posts: 16
Nickname: dch
Registered: Apr, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 18, 2007 1:50 PM
> I use an RSS reader for reading blogs but
> that's because there's so much focus on RSS feeds at the
> moment, and it's certainly more convenient than manually
> checking websites.

Not only blogs (and that matters). The ability to subscribe to, e.g., regularly updated search results is quite nice.

Dmitry Cheryasov

Posts: 16
Nickname: dch
Registered: Apr, 2007

Limited identity has existed for ages Posted: Sep 18, 2007 3:48 PM
> I mean, if you could choose one thing about the net to
> change, wouldn't it be "no more spam, phishing
> and general bad behavior that comes from anonymity?"

Certainly, I'd choose anonymity.

There are other ways to handle bad behaviour.

And there are ways to explicitly trade anonymity for better security on a case-by-case basis (e.g. subscribing to online banking services).

Compare this to the sheer size of the universal infrastructure needed to ensure everyone's identity, even though 99.999% of participants never try to cheat it. Also, imagine the havoc any minor hiccup in such a system would wreak, let alone its going down. I'd expect it to be much harsher than a comparable failure of the root DNS servers.

Thus, let interested parties create identity servers that suit their needs. These identities can be 'exportable' if need be. E.g. I have a Google identity which is good for email, calendaring, Jabber, etc.; I have a LiveJournal identity (OpenID) which is good on many OpenID-enabled sites (unfortunately, not here); I have a couple of online banking identities, and so on.

Multiple identities provide extra safety that a centralized identity does not. Should my Gmail identity be stolen, my online banking identities would most probably stay intact. Now imagine your one and only universal identity being stolen.

OTOH, managing multiple identities in a convenient way could be a good online business :)

Kim Sand

Posts: 1
Nickname: kimsnarf
Registered: Sep, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 19, 2007 12:39 AM
I believe the amount of traffic generated by RSS pulling would be greatly reduced if more people used online RSS readers. The big gain of using such "collective" tools (like Google Reader) is that a given source only has to be pulled once to serve all the subscribers to that source (using the same tool). Instead of every user pulling every source, the users just pull one source (the tool) and the tool (one "user") pulls all the sources in question.
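A sketch of the saving, with a hypothetical `Aggregator` whose `fetch` function stands in for the real HTTP request: a thousand followers of one feed still cost the publisher a single request per polling round.

```python
from typing import Callable, Dict, List


class Aggregator:
    """A shared poller: each source is fetched once per round, whatever its audience."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch  # e.g. an HTTP GET; injected so the sketch runs offline
        self.followers: Dict[str, List[str]] = {}  # feed URL -> user names
        self.latest: Dict[str, str] = {}

    def subscribe(self, user: str, url: str) -> None:
        self.followers.setdefault(url, []).append(user)

    def poll_round(self) -> None:
        for url in self.followers:  # one request per source, not per user
            self.latest[url] = self.fetch(url)

    def read(self, url: str) -> str:
        return self.latest.get(url, "")  # users read the aggregator's copy


fetched: List[str] = []


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return f"contents of {url}"


agg = Aggregator(fake_fetch)
for i in range(1000):
    agg.subscribe(f"user{i}", "https://example.com/feed.xml")
agg.poll_round()
print(len(fetched))  # 1
```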

Anthony Williams

Posts: 3
Nickname: anthonyw
Registered: Aug, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 19, 2007 3:54 AM
> I believe the amount of traffic generated by RSS pulling
> would be greatly reduced if more people used online RSS
> readers. The big gain of using such "collective" tools
> (like Google Reader) is that a given source only has to be
> pulled once to serve all the subscribers to that source
> (using the same tool). Instead of every user pulling every
> source, the users just pull one source (the tool) and the
> tool (one "user") pulls all the sources in question.

This is of benefit to the source, but not particularly of benefit overall --- users still have to check Google Reader to see if they've got new entries to read. However, there's nothing to stop an online RSS reader providing push semantics for its users --- they have to identify themselves to the reader in order to choose which feeds to aggregate, so they could provide enough information in order for the reader to notify them if anything changes, whether that's by email or by a direct network connection to an applet on the user's desktop, or whatever.
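A sketch of that suggestion, with hypothetical names throughout: after its own pull, the reader service pushes only genuinely new entries to the users it already knows. The `notify` callback is a placeholder for email, a desktop applet, or any other channel; the publishers never learn who the users are.

```python
from typing import Callable, Dict, List, Set, Tuple

Notify = Callable[[str, str], None]  # (user, message); stands in for email or an applet


def push_new_entries(latest: Dict[str, List[str]],
                     seen: Dict[str, Set[str]],
                     followers: Dict[str, List[str]],
                     notify: Notify) -> None:
    """Run after the aggregator's own pull: push only entries nobody was told about yet."""
    for url, entries in latest.items():
        already = seen.setdefault(url, set())
        fresh = [e for e in entries if e not in already]
        already.update(fresh)
        for user in followers.get(url, []):
            for entry in fresh:
                notify(user, f"{url}: {entry}")


FEED = "https://example.com/feed.xml"
outbox: List[Tuple[str, str]] = []
seen: Dict[str, Set[str]] = {}
followers = {FEED: ["alice", "bob"]}

push_new_entries({FEED: ["post 1"]}, seen, followers, lambda u, m: outbox.append((u, m)))
push_new_entries({FEED: ["post 1", "post 2"]}, seen, followers, lambda u, m: outbox.append((u, m)))
print(len(outbox))  # 4: each user hears about each post exactly once
```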

Steve Loughran

Posts: 9
Nickname: stevel
Registered: Feb, 2006

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 20, 2007 12:44 AM
> I believe the amount of traffic generated by RSS pulling
> would be greatly reduced if more people used online RSS
> readers. The big gain of using such "collective" tools
> (like Google Reader) is that a given source only has to be
> pulled once to serve all the subscribers to that source
> (using the same tool). Instead of every user pulling every
> source, the users just pull one source (the tool) and the
> tool (one "user") pulls all the sources in question.


-it doesn't have to be online readers; people like Google and Bloglines can act as aggregators for client systems too.

-the sites that suffer from lots of hits are those that are popular. For sites with only a few readers, polling works really well as a way of probing for changes. It's the sites with 500K readers that suffer the load.
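A back-of-the-envelope version of that point. The 500,000-reader figure comes from the comment; the polling intervals are hypothetical.

```python
def requests_per_second(readers: int, poll_interval_s: float) -> float:
    """Average load on a feed when every reader polls it independently."""
    return readers / poll_interval_s


# 500,000 readers is the comment's figure; the intervals are assumptions.
for label, interval in [("hourly", 3600), ("every 10 min", 600), ("every minute", 60)]:
    print(f"{label:>12}: {requests_per_second(500_000, interval):7.1f} requests/s")
```

That works out to roughly 139, 833, and 8,333 requests per second, whereas a 50-reader blog polled hourly sees less than one request a minute.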

Where anonymity is an issue is in trackbacks; I turned trackback support off on my blog a long time ago, as too much spam was getting in that way. It's funny; trackbacks are probably the most widely used web service API -an XMLRPC one at that- and their main goal appears to be to let people selling "OEM" software put links on other people's blogs.

-steve

Marek Pulka

Posts: 1
Nickname: dddelfin
Registered: Sep, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 20, 2007 7:41 AM
@Kim
I agree. But I wouldn't call them 'online readers' (Google is both online and offline thanks to Google Gears ;). But this Google Reader service, which sits between the feed subscriber and the feed publisher, changes pull to push. Let's say we have n subscribers and m publishers, and every subscriber subscribes to every publisher. With a standard direct feed aggregator we will have m*n pulls. With the Google Reader service pulling for us, we have only m pulls (one per publisher) for the same accuracy, plus n pushes (one per subscriber) whenever content is updated. (I think the Google Reader service makes one push with several content updates every 10 minutes or so; accuracy is reduced acceptably, but network traffic is reduced even further.) Each subscriber is not anonymous to Google Reader but stays anonymous to the m publishers he subscribes to.
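The request counts in this comment, written out per polling round under its assumption that every subscriber follows every publisher (the function names and the sample sizes are illustrative):

```python
def direct_requests(n_subscribers: int, m_publishers: int) -> int:
    # Every subscriber's own reader polls every publisher.
    return n_subscribers * m_publishers


def aggregated_requests(n_subscribers: int, m_publishers: int) -> int:
    # The aggregator polls each publisher once (m), then reaches each
    # subscriber once, whether by push or by that subscriber's single pull (n).
    return m_publishers + n_subscribers


# Hypothetical sizes: 1,000 subscribers who all follow the same 50 feeds.
print(direct_requests(1000, 50), aggregated_requests(1000, 50))  # 50000 1050
```

The product grows with both sides at once, while the sum grows with each side separately, which is why the saving becomes enormous for popular feeds.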

Topsy Kretts

Posts: 1
Nickname: topsy
Registered: Sep, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 20, 2007 11:22 AM
1. People need to have something to lose before they can be trusted. In real life, this can be something as simple as their reputation. If you have a pseudo-identity on the internet that can be discarded, then you have nothing to lose, unless you have worked hard on building up the pseudo-identity, such as getting lots of positives on eBay.

2. When posting on the internet, you are irrevocably broadcasting to the world, including those who are friendly and unfriendly. Simply posting an opinion could attract unfriendly actions from others. It's much safer to have millions of people disgusted with your pseudo-identity than with your actual identity.

I'll remain anonymous.

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 20, 2007 12:36 PM
> 1. People need to have something to loose before they
> can be trusted. In real life, this can be something
> as simple as their reputation. If you have a
> pseudo-identity on the internet that can be discarded,
> then you have nothing to loose, unless you have worked
> hard on building up the pseudo-identity, such as getting
> lots of positives on Ebay.
>
> 2. When posting on the internet, you are irretractably
> broadcasting to the world including those who are friendly
> and unfriendly. Simply posting an opinion could
> attract unfriendly actions from others. It much safer to
> have millions of people to be disgusted with your
> pseudo-identity than with your actual identity.
>
> I'll remain anonymous.

I'm not suggesting you put your real identity, phone number, medical records etc. in plain view on the internet. What's needed is a verifiable identity, so that, for example, spammers cannot fake where they're broadcasting from. That means that you can determine -- regardless of whoever this person "really" is -- that a message is really originating from a particular place, so that you can decide not to accept any more messages from that place.

As long as you are willing to accept faked information, spammers and the like have free rein. "Anonymity" sounds like a good thing in theory, but in practice a community cannot remain viable if no one is responsible for their behavior.
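A minimal sketch of "verify where a message came from, then refuse that place", assuming a hypothetical registry that maps each sending origin to a key. It uses an HMAC from Python's standard library to stay self-contained; a real scheme would use public-key signatures so that receivers never hold the sender's secret. Nothing here reveals who the sender "really" is, only that the message genuinely came from the claimed origin.

```python
import hashlib
import hmac
from typing import Dict, Set


def sign(key: bytes, body: bytes) -> str:
    return hmac.new(key, body, hashlib.sha256).hexdigest()


class Inbox:
    """Accept a message only if its claimed origin checks out, and allow blocking origins."""

    def __init__(self, origin_keys: Dict[str, bytes]) -> None:
        self.origin_keys = origin_keys  # hypothetical registry: origin -> verification key
        self.blocked: Set[str] = set()

    def accept(self, origin: str, body: bytes, signature: str) -> bool:
        key = self.origin_keys.get(origin)
        if key is None or origin in self.blocked:
            return False
        # A forged origin fails here: the sender can't produce a valid signature for it.
        return hmac.compare_digest(sign(key, body), signature)

    def block(self, origin: str) -> None:
        self.blocked.add(origin)


inbox = Inbox({"mail.example.org": b"origin-secret"})
msg = b"Buy now!"
genuine = inbox.accept("mail.example.org", msg, sign(b"origin-secret", msg))  # True
forged = inbox.accept("mail.example.org", msg, sign(b"spammer-guess", msg))   # False
inbox.block("mail.example.org")  # decide not to accept any more from that place
after_block = inbox.accept("mail.example.org", msg, sign(b"origin-secret", msg))  # False
```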

S. Fanchiotti

Posts: 10
Nickname: impatient
Registered: Nov, 2003

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 20, 2007 1:58 PM
Quick question: isn't this a problem that the old internet newsgroups solved a long time ago? [when bandwidth was really hard to get!]

The problem with RSS is not that it is pulled but that it is pulled from one location instead of distributing the data to many different servers that would synchronize with each other. The load in that case would be spread.

But then we would have to get away from the simple HTTP protocol at some point in the process and would require some sort of collaborative protocol somewhere. Too hard for an individual with a web site and little time to set up and maintain. I guess that is what the feed services do and justifies their existence.
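The newsgroup approach, reduced to its core: each server keeps every article it has seen, keyed by message ID, and forwards to its peers only the articles it did not already have, so one post spreads everywhere without looping, and readers fetch from a nearby server instead of the origin. This is a sketch with hypothetical names; real NNTP peers offer articles by Message-ID and the receiver declines duplicates, which the sketch collapses into one check on the receiving side.

```python
from typing import Dict, List


class NewsServer:
    """A peer that stores articles and floods new ones to its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["NewsServer"] = []
        self.articles: Dict[str, str] = {}  # message ID -> body

    def receive(self, message_id: str, body: str) -> None:
        if message_id in self.articles:
            return  # already have it: the flood stops here, even around cycles
        self.articles[message_id] = body
        for peer in self.peers:
            peer.receive(message_id, body)


def link(a: NewsServer, b: NewsServer) -> None:
    a.peers.append(b)
    b.peers.append(a)


origin, east, west, south = (NewsServer(n) for n in ("origin", "east", "west", "south"))
link(origin, east)
link(origin, west)
link(east, south)
link(west, south)  # closes a cycle: origin-east-south-west-origin
origin.receive("<post-1@example.com>", "new weblog entry")
print([s.name for s in (origin, east, west, south) if "<post-1@example.com>" in s.articles])
```

The publisher hands the post to its immediate peers once; the rest of the load is spread across the other servers.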

Will Hartung

Posts: 1
Nickname: whartung
Registered: Sep, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 20, 2007 2:30 PM
> Quick Question: isn't this a problem that the way old
> internet newsgroups solver long time ago? [when bandwidth
> was really hard to get!]
>
> The problem with RSS is not that it is pulled but that it
> is pulled from one location instead of distributing the
> data to many different servers that would synchronize with
> each other. The load in that case would be spread.

Well, there's your simple solution then.

As a small website, what you do is you simply limit who can subscribe to your RSS feed, and you limit the subscribers to being the large RSS aggregators.

You tell your users "If you'd like to subscribe to my RSS feed, head over to Google and subscribe there".

Back in the heyday, NNTP servers were "everywhere". Now, finding a news feed is a real challenge (notably because of USENET abuse from pirates).

Even when common, though, most folks didn't have lots of news feeds, rather a select few.

Now every RSS feed is a news feed, so you have several providers. Many folks have literally dozens of them.

If, however, you as an RSS provider can limit direct access to your RSS feed and promote aggregated access, then the large aggregators can eat the traffic while they ping your site every 10 minutes polling for changes.

And I'd like to think that if anyone would have a well-behaved RSS client (in terms of honoring HTTP headers and whatnot), it would be the large providers, as they have the most to gain.

Mind you, I don't know if this is possible, but it seems to me that it should be doable, and I don't think the large providers care one way or another, so I don't think it's an abuse of their service or infrastructure.

I don't use RSS, but this doesn't seem like a crippling burden to put on someone interested in your website.

David Clark

Posts: 3
Nickname: davidclark
Registered: Sep, 2007

Re: RSS: The Wrong Solution to a Broken Internet Posted: Sep 25, 2007 9:59 AM
RSS has nothing to do with anonymity. It has everything to do with a huge percentage of internet users sitting behind NAT-enabled routers for 1) better security and 2) conservation of IP addresses. All push solutions will fail because of this. Hence the only solutions that will work for 100% of internet traffic are pull-based ones.

Copyright © 1996-2019 Artima, Inc. All Rights Reserved.