The notion of recording every piece of useful knowledge known to man has been a constant theme throughout history. The works of Aristotle and Plato are attempts at a systematic and comprehensive description and categorization of all the useful knowledge of their time. In the 18th century, Diderot and the other editors of the French Encyclopédie had a similar aim. Not surprisingly, the results span a great number of volumes. As with the Encyclopaedia Britannica, no one expects to read such a work in its entirety. And these efforts necessarily limited themselves to essential information, as judged by what the editors and writers considered useful at the time.
What we consider important today might not be so important in the future, and vice versa. The idea that we should record and make widely available every piece of information that can be captured first surfaced at the conclusion of World War II. At the time, scientists were pondering the fate of the large body of research produced in support of the war effort. Vannevar Bush, director of the Office of Scientific Research and Development, whose task it was to coordinate the work of all American scientists involved in wartime research, summarized the problem in his article "As We May Think," published in the July 1945 issue of the Atlantic Monthly:
The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.
Bush suggests that scientific knowledge, or for that matter any knowledge, is valuable only insofar as it is shared with others. He therefore proposes that devices be constructed to record all scientific information:
One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record.
That information would then be available through a special type of tool that every person might possess:
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
As you probably recognize, the closest thing to the memex today is the Web. The Web is the universal information appliance and thus the most likely conduit to supply the large amounts of data needed to enable a "calm," unintrusive computing environment.
According to some estimates, the Web has been growing about tenfold each year. As I'm writing this, the Google search engine indicates that it knows about 1,346,966,000 Webpages. Assume that, on average, a Webpage contains 30 KB of information. Google then has access to roughly 40 terabytes of data (or about 2.5 times the information stored in the Library of Congress in text format). If the tenfold annual growth continues, two years from now Google will likely be a gateway to more than four petabytes of information (40 TB × 10 × 10). (One terabyte is 1,000 gigabytes, and one petabyte is 1,000 terabytes.)
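The arithmetic in this paragraph can be checked with a few lines of code. This is a minimal sketch, assuming decimal units (1 KB = 1,000 bytes) and treating the 30 KB page size and tenfold annual growth as the text does, as assumptions rather than measurements; the constant and function names are hypothetical.

```python
# Back-of-the-envelope estimate of the data reachable through a search engine.
# Decimal units assumed: 1 KB = 1,000 bytes, 1 TB = 10**9 KB, 1 PB = 1,000 TB.
PAGES_INDEXED = 1_346_966_000   # page count Google reported at the time of writing
AVG_PAGE_KB = 30                # assumed average page size, per the text
ANNUAL_GROWTH = 10              # assumed tenfold growth per year, per the text

KB_PER_TB = 1_000 ** 3          # KB -> MB -> GB -> TB


def indexed_terabytes(pages: int, avg_kb: int) -> float:
    """Total size in terabytes of `pages` pages averaging `avg_kb` kilobytes."""
    return pages * avg_kb / KB_PER_TB


def projected_terabytes(current_tb: float, growth: float, years: int) -> float:
    """Size after `years` of compound growth by a factor of `growth` per year."""
    return current_tb * growth ** years


today_tb = indexed_terabytes(PAGES_INDEXED, AVG_PAGE_KB)
in_two_years_pb = projected_terabytes(today_tb, ANNUAL_GROWTH, 2) / 1_000

print(f"today: about {today_tb:.1f} TB")
print(f"in two years: about {in_two_years_pb:.2f} PB")
```

Running it prints about 40.4 TB for today and about 4.04 PB for two years out, consistent with the "roughly 40 terabytes" and "more than four petabytes" figures. With binary units (1 KB = 1,024 bytes) the numbers shift slightly but the order of magnitude is the same.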
That is still not much, considering that about 300 petabytes' worth of magnetic disk was sold in 1999 alone. In "How Much Information Is There in the World?" Michael Lesk suggests that, if absolutely everything were recorded in digital media (all the books, movies, music, and conversations, as well as all human memory), it would amount to a few thousand petabytes. Counting just the magnetic disks sold over the past few years, there is already enough storage to record everything.
Computer scientist David Gelernter, the inventor of space-based computing, which gave rise to technologies such as JavaSpaces, suggests that individuals and organizations start building their digital histories by saving every useful piece of information in a chronologically arranged "stream." Each item in the stream is fully indexed when it is added, and information can be organized on the fly into "substreams," based on content or on meta-information. As new information flows into the stream, it is automatically associated with the appropriate substreams. This idea has already found its way into a commercial product, Scopeware. Figure 1 shows a browser-based window into a time-based stream via Scopeware's graphical user interface.
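To make the stream idea concrete, here is a minimal sketch of such a data structure. It is not Scopeware's implementation or API, and not necessarily Gelernter's design; the `Item` and `Stream` classes and the `define_substream`, `substream`, and `search` methods are hypothetical names chosen for illustration. It assumes items arrive in chronological order, indexes each item's words when the item is added, and represents a substream as a predicate over content or metadata, so new items are routed into matching substreams automatically.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Set


@dataclass
class Item:
    timestamp: datetime
    text: str
    meta: Dict[str, str] = field(default_factory=dict)


class Stream:
    """Hypothetical time-ordered stream with full-text indexing and live substreams.

    Assumes items are added in chronological order (append-only).
    """

    def __init__(self) -> None:
        self.items: List[Item] = []
        self.index: Dict[str, Set[int]] = {}                  # word -> item positions
        self.filters: Dict[str, Callable[[Item], bool]] = {}  # substream name -> predicate
        self.members: Dict[str, List[int]] = {}               # substream name -> positions

    def add(self, item: Item) -> None:
        pos = len(self.items)
        self.items.append(item)
        # Index every word of the item at the moment it enters the stream.
        for word in set(re.findall(r"\w+", item.text.lower())):
            self.index.setdefault(word, set()).add(pos)
        # Route the new item into every substream whose predicate it satisfies.
        for name, keep in self.filters.items():
            if keep(item):
                self.members[name].append(pos)

    def define_substream(self, name: str, keep: Callable[[Item], bool]) -> None:
        # A substream is built on the fly from items already in the stream,
        # and then grows as matching items are added.
        self.filters[name] = keep
        self.members[name] = [i for i, it in enumerate(self.items) if keep(it)]

    def substream(self, name: str) -> List[Item]:
        return [self.items[i] for i in self.members[name]]

    def search(self, word: str) -> List[Item]:
        """Items containing `word`, in chronological order."""
        return [self.items[i] for i in sorted(self.index.get(word.lower(), set()))]


# Usage with made-up sample data: one content-based and one metadata-based substream.
s = Stream()
s.define_substream("jini", lambda it: "jini" in it.text.lower())
s.add(Item(datetime(2002, 3, 1), "Notes on Jini lookup", {"sender": "alice"}))
s.add(Item(datetime(2002, 3, 2), "Lunch plans", {"sender": "bob"}))
s.add(Item(datetime(2002, 3, 3), "Jini leasing question", {"sender": "bob"}))
s.define_substream("from-alice", lambda it: it.meta.get("sender") == "alice")
print([it.text for it in s.substream("jini")])
```

The "jini" substream is defined before any items exist and picks up both Jini-related notes as they arrive; the "from-alice" substream is defined afterward and is populated from the items already in the stream. A real system would persist the stream and use a proper inverted index, but the division of labor is the same: index on arrival, and organize by predicates rather than by fixed folders.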
Figure 1: Graphical user interface of Scopeware
As organizations and individuals start building their digital "trails" over time, they will likely start relying on the availability of the digitally recorded information. But this information will be so vast that humans won't be able to directly use large portions of it.
This situation is fundamentally different from the categorization, or indexing, of Webpages. Web search engines such as Google categorize the Web by keywords or categories. But when a search returns a list of URLs, we still click on those links and read the Webpages ourselves. Those creating Webpages today still have humans primarily in mind as their audience.
As storing everything becomes a practical possibility, the overwhelming portion of the information digitized and made available on the network will not be intended for human consumption. In any case, so much information will be available that we will have to either ignore very large portions of it or use tools that let us interact with more of it. The latter implies that the information will be ubiquitously available to these tools, that the tools themselves will be readily available to us, and that they will operate calmly, in the background, to our advantage.
Therefore, the adoption and widespread use of service-oriented software architectures, such as Jini, will be fueled not by marketing hype or any one company's market dominance, but by the need to access the vast amounts of information on the Web, most of which only machines can process.