We need a new model for sharing information, one that leverages the Internet in a way that scales.
I think that there are two ways that we locate information in a resource: search and structure. In print books, search has always been a difficult approach, and this has been dramatically improved with the Web and electronic documents. But structure is essential, especially when you don't really know what you're trying to find, or you are learning material for the first time. You could argue that you can use search to discover information in a newsgroup, but as the data increases, it becomes much harder to track down the information because it tends to become very scattered. If it were easy, it would cut way down on repeated questions and make the newsgroup a place that people would stay for much longer, rather than the usual response of fleeing when the noise level gets too high.
Structure is essential because it clumps information together, and presents it in a linear form so that a newcomer can absorb the information in the right order. It is also the lowest-entropy form of information, and thus requires the most effort to produce. Most information on the Internet lacks structure, and thus is difficult to use. For a number of years I have been pondering whether it would be possible to create a self-structuring way of sharing information -- one that would naturally tend towards decreasing entropy as people added more to it.
I am hunting for the middle ground. On one extreme is a newsgroup, which has all kinds of very good information but in an essentially linear form. It has very little structure (the best we've seen is "threads" but these are only slightly helpful), and so as we've seen again and again, it doesn't scale. Many people have talked about the phenomenon of a newsgroup being good at the beginning, but when you get too many people it loses its integrity and manageability. Many newsgroups are full of good stuff but I can't handle the volume and the low signal-to-noise ratio so I don't subscribe. If a newsgroup had more structure, however, it could be extremely useful. Information would aggregate in a structured fashion, so when people had questions or new ideas they would go where that information already was. Newbie questions could be answered by navigating a tree, and when a topic came up it would be placed alongside the existing material in the tree, rather than repeating something over and over in a linear fashion, as is done with newsgroups. Some newsgroups work better than others because they have a small group of mature and dedicated people that help keep things on track (comp.lang.python is an example of this). But in general, newsgroups don't scale because they are supposed to be a conversation among a relatively small group of people.
Weblogs scale because they simulate the "eyes forward" style of traditional lecturing. You go to a lecture to hear a particular speaker. With a weblog, lots of people can listen without the system breaking down, and there can be Q&A. The structure of the discussion is around each article, so it's less likely that people get off topic, and you can always just read the article and not the comments (in a newsgroup you never know when you'll find the real essence of the discussion, so you have to read endlessly).
The opposite extreme from the newsgroup is a zine. This requires a lot of effort by experts. Most experts already have too much to do, and can't simply add this task to their list. So in this case, the effort is what doesn't scale.
In pre-internet days, print magazines were the only outlet for less-than-book-sized ideas. Writing for magazines required a lot of hassle and didn't usually pay well, but it paid something and you did it because that was the way to publicize ideas. But with the net, you have many ways to publicize ideas. The weblog requires a lot less effort. "Less effort" is a form of payment. If I want my ideas published, I could do it through a zine but that's going to require a lot more time and effort than if I just publish it through a weblog. Maybe I don't know whether the idea is really worth publishing, and perhaps I just want to try it out. And if I do have an idea worth publishing and am willing to do the work of putting it into article form, why not start thinking of a book? (or, note what Joel Spolsky is doing by collecting the best weblog entries of the year from all over the web, and publishing it as a book -- that's a good idea because he acts as the filter so you don't have to do all the work of finding those pieces yourself).
The internet is busting through the model of "he who owns the presses decides the news." With the Internet, we have alternative incentives. However, the magazine model relies on three different forms of incentive:
1. You get your ideas out there. The internet keeps coming up with more alternative ways to do this, more easily. So this factor doesn't work with a zine.
2. You get paid. If you want high-caliber people to put in a lot of work, money is an important incentive.
3. You get an article reviewed and vetted by said high-caliber people. This is the value/notoriety of association. But without #2, those people aren't there.
That's where the model falls apart -- most high-caliber people are already too busy with their own stuff to take on another task, especially one that involves donating their time, and especially when they have #1 as an outlet. That's why I question the zine approach.
As an alternative, I keep imagining some kind of emergent way to produce structured information, where a lot of people can put in little bits of time, and everyone benefits from the result. I don't know the mechanism for such a thing, but the Internet is what would make it possible. We need to explore the possibilities that the Internet provides, rather than trying to resurrect something from the pre-Internet world.
The wiki clearly has some elements of this. It allows the participants to decide what the organization will be. A wiki has a different feel than a newsgroup because the information is clearly persistent and there is supposed to be structure. However, the wikis I've seen end up suffering because not everyone has the same vision for what that structure should be. Without something that keeps bringing the structure into focus, entropy results and it becomes harder and harder to find what you want. It would be interesting to see what the scaling factor for a wiki is vs. a newsgroup: how many active participants can a wiki support before it starts losing focus, and what is the corresponding number for a newsgroup?
One approach that shows promise is Wikipedia. They have a number of advantages on that project:
A clear, well-defined goal (create an encyclopedia) with a predefined structure
Many volunteers, and a number of people (if I understand correctly) dedicated full-time to the project
A vast number of consumers -- arguably everyone on the planet -- to justify the effort
Ironically, I think this shows that the wiki is not the ideal medium for a many-person discussion. It requires the tremendous effort seen in the Wikipedia in order to keep the information organized.
Many of the basic ideas of the wiki are valid, however. It allows a person to contribute a very small amount (such as a spelling or grammar correction) or a large amount (such as maintaining the structure of an entire wiki). It has relatively easy entry points. It's possible to create a structured document with automatic table of contents. To maintain the clarity of a wiki, however, requires one or more editors dedicated to the job, who have the vision of what the document should look like.
For a quality document, I think that editing is essential, but my hope is that we can develop a system that would allow one or more of the following:
The computer participates in the editing process (at least the structuring)
The amount of effort by the editor is reduced
The editing process is distributed among some of the volunteers
The system allows the structure to be an emergent property of the document
Although I have made attempts at some of these features (for example, I was largely responsible for the "Backtalk" feature that you see in the online version of the Zope book), I don't really know how all this could be accomplished. But I'm reasonably certain that the model for the "self-organizing book" would be one of the Next Big Things on the Internet.
One thing that comes to mind is self-balancing trees (e.g. red-black trees). However, relying on humans to do the accounting and incremental restructuring of arbitrary information when part of it is modified is unlikely to be successful.
The organization of a balanced tree is very specific, so that function can be automated. The organization of other kinds of information is arbitrary and unpredictable. Unless some kind of structure is defined, it would be difficult to automate the self-organization of information. If a structure were defined, it would likely not meet everyone's requirements. But maybe there exists some structure that would be good enough?
What about web forums? I guess they're mostly like newsgroups, but one difference I see is that you can subscribe to a topic on a web forum (e.g. a topic you started), and receive email notification when someone replies to it.
I don't know that web forums add anything to the discussion, I just wanted to mention them for completeness.
Very interesting topic!!! It should be one of the major concerns at this point in history. Imagine how many interest groups would be able to 'auto'-organize their ideas!
Is it possible, however? Or is this idea a utopia? A very interesting document organisation scheme for this envisioned info-sharing model could evolve around the model of structured writing. Take a look at information mapping to get a commercial idea. It allows for easily structured documents without the need to write elaborate prose. Documents can be very XML-centric, and they can also be skimmed easily!
A couple of links to toss into the mix on this topic.
When Rob Pike was interviewed on /., he had this comment:
"One of the big insights in the last few years, through work by the internet search engines but also tools like Udi Manber's glimpse, is that data with no meaningful structure can still be very powerful if the tools to help you search the data are good. In fact, structure can be bad if the structure you have doesn't fit the problem you're trying to solve today, regardless of how well it fit the problem you were solving yesterday." -- http://developers.slashdot.org/article.pl?sid=04/10/18/1153211
Clay Shirky recently posted an article on this problem as well:
"Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way you think about it, you're out of luck." -- http://www.shirky.com/writings/ontology_overrated.html
<p>For some time now, I have been trying to build a system that facilitates the implicit sharing of the structures of personal document collections between users. I regard documents as unmodifiable atoms, so it does not cover all of Bruce's vision. In short, my approach is to</p> <ul> <li>provide users with a tool to organize their personal document collections,</li> <li>collect the created structures in a central repository, and</li> <li>create a personalized structure of documents relevant to the active user by merging structures created by the users most similar to the active user.</li> </ul> <p>I do not assume that the users' goal is to create a common information space, but rather one that best fits their personal way of doing and understanding things.</p> <p>Now for everyone about to answer with the users-are-lazy-and-will-not-provide-you-much-data argument: I have not given up on them yet...</p>
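The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's actual system: it assumes each user's structure is a set of (folder, document) pairs, uses Jaccard overlap as the similarity measure, and the names `jaccard` and `personalized_structure` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of (folder, document) pairs (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def personalized_structure(active, structures, k=2):
    """Merge the structures of the k users most similar to `active`.

    `structures` maps a user name to that user's set of (folder, document)
    pairs, standing in for the central repository. Returns a dict of
    folder -> sorted documents that the active user has not filed yet.
    """
    mine = structures[active]
    scored = [(jaccard(mine, s), user)
              for user, s in structures.items() if user != active]
    # Keep only neighbours that share at least one filing decision.
    neighbours = [user for score, user in sorted(scored, reverse=True)[:k]
                  if score > 0]
    merged = defaultdict(set)
    for user in neighbours:
        for folder, doc in structures[user] - mine:
            merged[folder].add(doc)
    return {folder: sorted(docs) for folder, docs in merged.items()}
```

Given one user who files documents much like you do and another who shares nothing with you, only the first user's extra filings show up in your personalized view.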
Essentially, I want to get more structure into the standard threaded discussion forum by allowing users to add "types" to their responses: is it a counter-argument, a question, a contribution of corroborating evidence, etc.?
What I'm hoping is that meaningful typing of relations between postings will help later users find which of dozens of responses to a posting they are most interested in, and this will also encourage less repetition.
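A minimal sketch of what typed responses might look like as a data structure. The type names come from the examples above, but the `Post` class, its fields, and its methods are hypothetical, invented here only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ReplyType(Enum):
    COUNTER_ARGUMENT = "counter-argument"
    QUESTION = "question"
    EVIDENCE = "corroborating evidence"


@dataclass
class Post:
    author: str
    text: str
    reply_type: Optional[ReplyType] = None  # None for the root posting
    replies: List["Post"] = field(default_factory=list)

    def reply(self, author, text, reply_type):
        """Attach a typed response to this posting and return it."""
        child = Post(author, text, reply_type)
        self.replies.append(child)
        return child

    def replies_of_type(self, reply_type):
        """Let a later reader pull out only the kind of response they want."""
        return [r for r in self.replies if r.reply_type is reply_type]
```

A reader facing dozens of responses could then ask for, say, only the counter-arguments with `post.replies_of_type(ReplyType.COUNTER_ARGUMENT)`.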
Recently I've been wondering whether free-form tagging of responses is a better idea, and my next version will probably explore that direction.
and in particular, http://blumpy.org/tagwebs/, suggests thinking of this newer organizational strategy as a model of human memory. If the problem is information retrieval, then free form association is one good way to find something.
I used to work for a company called Autonomy that specialises in finding solutions to exactly this type of problem. The company was always trying to get over the point that Autonomy was not "search" but information delivery (or something like that anyway.)
Basically, the idea was that all documents get analysed and put in a database. Based on the analysis, models of features within the document are produced that give an indication of the actual contextual meaning of the content of the document. This is somewhat akin to the process of generating Hidden Markov models for speech recognition, or (I presume) the algorithms used in image recognition systems to recognise pictures of faces, etc.
So, when Joe Bloggs comes along and wants to find some information, he can ask a simple question in plain English (rather than guessing at keywords), and the system converts that question to the same models produced from analysing the original documents. A simple matching algorithm is then used to extract candidate documents from the database.
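To make the "same model for question and document" idea concrete, here is a toy sketch. It is emphatically not Autonomy's algorithm, which is not described here: it swaps in a plain bag-of-words model with cosine similarity, which captures no contextual meaning at all. The function names `model`, `cosine`, and `search` are made up for this example.

```python
import math
import re
from collections import Counter


def model(text):
    """Crude stand-in for a document model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count models."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(question, documents, top=3):
    """Convert the question to the same model as the documents, then match."""
    q = model(question)
    scored = [(cosine(q, model(text)), title)
              for title, text in documents.items()]
    ranked = sorted(scored, reverse=True)[:top]
    return [title for score, title in ranked if score > 0]
```

The point of the sketch is only the shape of the pipeline: analyse documents into models, analyse the question the same way, then rank by a simple match. A real system would need a far richer model to tell a black bird from a brown one.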
This is extremely powerful software: I loaded the Encyclopedia Britannica into a database and then typed "Tell me about black flightless birds." It pulled up articles about Penguins. No Dodos or Kiwis (they're brown) - just Penguins. You can then do all manner of clever things, like automatically refining the "search" by watching which documents the user clicks on and adding those documents' models to the search criteria. You can also do a "similar documents" list, which would relax the search criteria slightly and show you documents about Dodos and Kiwis (they're not black, but they are flightless birds).
If you want to see this system in action, look at http://news.bbc.co.uk/. When you click on a story, the related news stories down the right-hand side are not put there by a site editor, but by Autonomy software picking articles from the Beeb's database.
There's lots more really cool stuff that this system can do, like automatically classifying documents and sorting them based on pre-defined topics, analysing audio streams using speech recognition and classifying those (when I left, they were working on doing the same for video streams too), automatic translation between languages, etc.
The downside is that only Autonomy (as far as I know) has this technology. They are quite an... ahem.... arrogant company because of this and, predictably, the price is through the roof.... Anyone fancy starting an Open Source project...??!