Weblogs Forum - Why Distributed Computing?


8 replies on 1 page. Most recent reply: Dec 22, 2008 1:55 AM by Danielle Schofield


Jim Waldo

Posts: 34
Nickname: waldo
Registered: Sep, 2002

Why Distributed Computing?

Posted: Mar 31, 2003 5:50 AM

Summary
If processors and networks are getting faster, why do distributed computing at all? If we just wait, servers will be fast enough to do it all. I don't think so, and give some reasons why...

In response to my last posting, Berco Beute asked whether faster processors, faster networks, and larger computer capacity would allow all clients to become essentially terminals, with all processing done on the server. Berco was thinking that this might lessen the need for mobile code (which it would), but the stronger conclusion is that we would not really need to do distributed computing at all. All our computing could be done in one place, if we are just patient and wait for the machines to get large enough and fast enough, and the networks good enough, to allow that sort of concentration.

If this were possible, it would certainly make programming easier...no more messy partial failures to deal with, for example. We get rid of the 7 (or 8) fallacies of distributed computing by simply getting rid of the distributed computing, or at least limiting it to the channel between the client (which becomes essentially a very smart terminal) and the server.

This is the sort of design center that Plan 9 (the Bell Labs system) had. Users would interact via terminals (that looked a lot like Blits) with servers that were stuck away someplace else. This is also a lot like the Sun strategy with SunRays and servers. It does simplify administration, and it makes programming easier.

But it isn't going to make the need for distributed computing go away. At best, it is a way of putting the problem off for a short period of time; at worst it is just pushing the problem back a level and giving us all an illusion which will bite us soon. The mathematics is simply wrong; looking at the trends reinforces the need for distributed computing.

The trends to look at are the one described by Moore's law, which concerns processors, and the trend in network traffic (not speed). Moore's law, as commonly stated, says that the performance of a processor doubles every 18 months (or that the price is cut in half for the same performance). The trend in network traffic, however, is that it doubles every 12 months (or less). So the increase in network traffic is outpacing the increase in processor performance, at the same time that competent processors are becoming cheaper (and are therefore being placed out on the edges of the network cloud). It's just math, folks--the processors can't keep up.
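To make the arithmetic concrete, here is a minimal sketch in Python. It assumes only the two doubling periods quoted above (18 months for processor performance, 12 months for network traffic); the function names are made up for illustration and the figures are not measurements.

```python
# Illustrative arithmetic only. The doubling periods are the ones quoted
# in the post; the function names are hypothetical.
PROCESSOR_DOUBLING_MONTHS = 18
TRAFFIC_DOUBLING_MONTHS = 12


def growth_factor(months, doubling_months):
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_months)


def traffic_per_unit_of_processing(months):
    """How much traffic has grown relative to processor performance."""
    traffic = growth_factor(months, TRAFFIC_DOUBLING_MONTHS)
    processing = growth_factor(months, PROCESSOR_DOUBLING_MONTHS)
    return traffic / processing


if __name__ == "__main__":
    for years in (3, 6, 9):
        months = years * 12
        ratio = traffic_per_unit_of_processing(months)
        print(f"after {years} years: traffic/processing ratio = {ratio:.1f}")
```

Under these assumptions the ratio works out to 2^(t/36) with t in months, so the gap between traffic and the processing power available to handle it doubles roughly every three years.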

This means that the need for distributed computing is going to increase, not decrease. And part of this need is that more and more different kinds of computing devices, from servers to cell phones to automobiles to refrigerators, will be on the network. Humans won't be part of most loops (which is why I worry more about program-to-program distribution), and mobile code is going to be key (an assertion without proof in this log; that will be a subject for later).

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Re: Why distributed computing?

Posted: Mar 31, 2003 11:42 AM

There is an interesting paper that supports Jim's view that the growth in data flows outpaces the growth in processors' capability to process that data. The report is based on experience gleaned from the Sloan Digital Sky Server project, and it establishes the need for distributed data mining (as opposed to mining data located at a single location). The authors claim that no single data warehouse will contain more than 12% of the world's astronomy research data. Thus, to come up with interesting discoveries, you need to do distributed computing. Similar situations exist in other areas of scientific computing (e.g., particle physics, genetics, and drug discovery).

Web Services for the Virtual Observatory
Alexander S. Szalay, Tamás Budavári, Tanu Malik, Jim Gray, Ani Thakar
(Microsoft Technical Report 2002-85, Aug, 2002)
http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2002-85

Here are a few excerpts:

"Astronomical data is growing at an exponential rate: it is doubling approximately every year. The main reason for this trend is Moore's Law, since our data collection hardware consists of computers and detectors based on the same technology as the CPUs and memory. However, it is interesting to note how the exponential trend emerges. Once a new instrument is built, the size of the detector is frozen and from that point on the data just keeps accumulating at a constant rate. Thus, the exponential growth arises from the continuous construction of new facilities with ever better detectors. New instruments emerge ever more frequently, so the growth of data is faster than just the Moore's Law prediction. Therefore while every instrument produces a steady data stream, there is an ever more complex network of facilities with large output data sets over the world.

"How can we cope with this trend? First of all, since the basic pipelines processing and storage are linearly proportional to the amount of data, the same technology that gives us the larger detectors, will also give us the computers to process and the disks to save the data. On a per project basis the task will probably get easier and easier, the first year will be the most expensive, later it becomes increasingly trivial. On the community level however, the trend is not so clear, as we show below. More and more projects will engage in data intensive projects, and they will have to do much of the data archiving themselves. The integrated costs of hardware and storage over the community will probably increase as time goes on, but only slightly.

(...)

"The exponential growth in the data sources and the exponential growth in the individual data sets put a particular burden on the projects. It only makes sense to spend 6 years to build an instrument, if one is ready to use the instrument for at least the same amount of time. This means, that during the lifetime of a 6-year project, the data growing at a linear rate, the mean time the data spends in the project archive before moving to the centralized facility is about 3 years. Turning this around, the data that makes it into the national facilities will be typically 3 years old. As the amount of data is doubling in the world, every year, in 3 years the data grows by 8-fold, thus no central archive will contain more than about 12% of the world's data at any one time. The vast majority of the data and almost all the current data will be decentralized among the data sources. This is a direct consequence of the patterns of data intensive science. These numbers were of course taken from astronomy, the rates may be different for other areas of science, but the main conclusions remain the same."
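The "about 12%" figure in the excerpt can be reproduced with a few lines of Python. This is only a sketch of one reading of the quoted argument: it assumes data volume doubles every year, that a project accumulates data at a constant rate (so its average age is half the project lifetime), and that a central archive therefore holds at most what existed that many years ago. The function names are made up for illustration.

```python
# A sketch of the arithmetic in the quoted excerpt, under its stated
# assumptions. Function names are hypothetical.


def mean_data_age(project_years):
    """Average age of data collected at a constant rate over a project."""
    return project_years / 2


def central_archive_share(lag_years, doubling_years=1.0):
    """Upper bound on the fraction of today's data that existed `lag_years` ago."""
    return 1 / 2 ** (lag_years / doubling_years)


if __name__ == "__main__":
    lag = mean_data_age(6)              # 6-year project -> 3-year average lag
    share = central_archive_share(lag)  # 1 / 2**3
    print(f"lag = {lag} years, central archive share <= {share:.1%}")
```

With a 6-year project the lag is 3 years, the world's data grows 8-fold in that time, and the bound is 1/8, or 12.5%, which matches the excerpt's "about 12%". A slower doubling rate or a shorter lag would raise the bound accordingly.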

Petr Prikryl

Posts: 2
Nickname: petr
Registered: Apr, 2003

Re: Why distributed computing?

Posted: Mar 31, 2003 10:50 PM

I would like to add another point of view (following Jim Waldo's and Frank Sommers' reasoning). It is not only the amount of data that leads to distributed computing. We are also searching for new ways of reaching solutions.

Think about how object-oriented programming started to change the ways of programming. It brought the possibility of breaking the monolithic functionality of applications into communicating islands of smaller functionality. The objects also hide the complexity behind them. In other words, one can think about the problem more naturally and hierarchically, and we are able to solve more complex problems.

From the hardware point of view, there are some physical limits for classical processors. Processors have to follow the path of concurrency (parallelism), and they already do. Still, new processors try to pretend that they are simple (mono) processors, because programmers require this kind of behaviour.

The truth is that the pure Von Neumann architecture is still the basis of even the most modern processors. But they are developing towards something that is more natural for applications composed of objects. In fact, an object-oriented solution can naturally be mapped to a distributed system -- in the ideal case. So the question is not "Why distributed computing?", but rather "How to make it simpler for a programmer?". Recall the time when people struggled for simpler programming in autocode, assembler, higher languages... Think about how higher data abstractions emerged in many languages. The languages, their features, and the abstractions they use were developed much earlier, but only now can you observe their massive usage in programmers' everyday work -- like containers and iterators, or the list and dictionary data types (just the fragments that came to my mind). They were not widely known in the past. Similarly, new abstractions that simplify the building of distributed applications will emerge.

I personally believe that distributed programming need not be extremely difficult. We only have to find ways to do it correctly and reliably. The hardware must also mature (and become cheap enough) in that sense. In my opinion, many "distributed people" are desperately waiting for new hardware on which to implement their new ideas.

geoffrey hendrey

Posts: 2
Nickname: geoff
Registered: Apr, 2003