The Artima Developer Community
Java Community News
Why Processes Scale Better Than Threads

14 replies on 1 page. Most recent reply: Sep 6, 2006 10:08 PM by Gregg Wonderly

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Why Processes Scale Better Than Threads Posted: Aug 30, 2006 10:03 AM
Summary
In a recent blog post, Assaf Arkin compares threads and independent processes, and suggests that most Java developers turn to threads to scale their application, whereas those working with PHP, Ruby, or other LAMP languages, use processes. He argues that processes scale better.
Assaf Arkin's recent blog post, Why Processes Scale Better Than Threads, contrasts the ways in which LAMP developers and Java developers build complex applications:

In the LAMP world, processes are everything. If you want to pull out data from a file, sort it, and e-mail the result, you pipe several programs together. You’re building a solution by assembling processes.

And for more complex tasks you add even more processes. Want to do things on a schedule? Fire them up with cron. Need to improve throughput? Start up a cache process. Monitor uptime? That’s another process for you.

By contrast, Java developers would run just one JVM process, and call into various APIs to accomplish those same tasks:

In Java you don’t scan files with grep, you use a library. You don’t pipe e-mails to sendmail, you use a library. All the features you need are folded into the VM.

Which turned a snappy VM into a huge behemoth that takes a couple of minutes to boot, as it’s setting up libraries, frameworks and containers. You don’t want to startup the JVM more than once.

To accomplish multiple concurrent tasks, Java developers would use threads, not independent processes. Arkin believes that these approaches result in different scalability characteristics of an application:

Multi-threaded developers tend to scale through objects, libraries and frameworks. When you focus on the components around you, you don’t pay much attention to anything outside the sandbox. The level of abstraction is the API.

Multi-process developers scale by assembling programs together, chaining them or running them in parallel. If it’s not in the framework, you look for a program (or combination of) that does what you need. The level of abstraction is the task...

The more independent processes you have, the easier they are to combine into new and interesting uses.

Because processes can easily be distributed across multiple servers, Arkin believes that solutions that center around the multi-process approach scale better horizontally (incorporating more servers), whereas the multithreading solution scales better vertically, and is able to take better advantage of a more powerful server.

Arkin's concluding point is that horizontal scaling—distributing workload across many less powerful servers—can result in more overall scale than distributing load to more threads in a single process on one powerful server. Potentially, the horizontal scaling approach is also more economical.

Do you agree with Arkin's conclusion that the multi-process approach scales better? And, if so, how do you architect Java applications to distribute workload among multiple processes?


Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Why Processes Scale Better Than Threads Posted: Aug 30, 2006 5:43 PM
I talked about some of the pros and cons related to this model:

http://www.jroller.com/page/cpurdy?entry=fastcgi_not_so_fast

It's certainly not black and white. I think the author has some good points, and I believe he knows a number of use cases in which doing it any other way (than as he suggested) would be painful over-kill.

Peace,

Cameron Purdy
http://www.tangosol.com/

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Why Processes Scale Better Than Threads Posted: Aug 31, 2006 2:21 AM
> To accomplish multiple concurrent tasks, Java
> developers would use threads, not independent processes.
> Arkin believes that these approaches result in different
> scalability characteristics of an application:
>
> Multi-threaded developers tend to scale through
> objects, libraries and frameworks. When you focus on the
> components around you, you don’t pay much attention to
> anything outside the sandbox. The level of abstraction is
> the API.

An application server nowadays plays the same role as an O/S: it manages processes.

Essentially all the effort that has gone into application servers simply repeats what has been done in previous decades regarding operating systems.

> Multi-process developers scale by assembling programs
> together, chaining them or running them in parallel. If
> it’s not in the framework, you look for a program (or
> combination of) that does what you need. The level of
> abstraction is the task...

But there is no typed interface between tasks, and that is a great problem. Using tasks is like using a dynamically typed programming language: you never know what is going to work, until you execute it.

> The more independent processes you have, the easier
> they are to combine into new and interesting uses.

Low coupling provides better reuse; that is common sense. There was a discussion a few moons ago here on Artima about whether frameworks are better than libraries. The outcome was that libraries are better due to lower coupling.

> Arkin's concluding point is that horizontal
> scaling—distributing workload across many less powerful
> servers—can result in more overall scale than distributing
> load to more threads in a single process on one powerful
> server. Potentially, the horizontal scaling approach is
> also more economical.

Since processes can be distributed better than threads, it goes without saying that processes scale better.

But what if threads could be distributed as well? That would certainly turn the case in favor of threads.

Eivind Eklund

Posts: 49
Nickname: eeklund2
Registered: Jan, 2006

Re: Why Processes Scale Better Than Threads Posted: Aug 31, 2006 3:50 AM
> But there is no typed interface between tasks, and that is
> a great problem. Using tasks is like using a dynamically
> typed programming language: you never know what is going
> to work, until you execute it.

Is it a great problem in your actual practical experience?

'cause in mine, it isn't. I have had very occasional problems with tasks created from pipes etc breaking down with OS and software updates because there wasn't a defined API to work with. This is a very very small part of my day to day issues, though, and a much greater problem with APIs - including APIs with type checking.

Eivind.

Mark Thornton

Posts: 275
Nickname: mthornton
Registered: Oct, 2005

Re: Why Processes Scale Better Than Threads Posted: Aug 31, 2006 4:17 AM
> > But there is no typed interface between tasks, and that
>
> Is it a great problem in your actual practical
> experience?

More an irritation; the most frequent problem is variations in separators (e.g. CSV with , or ;, quoting differences). Another is one process writing values (dates or decimals) in a national-language form while the next process expects a different form.
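The formatting mismatch described here can be sketched in Java. This is only an illustration under assumptions: the class and method names are hypothetical, and it assumes the producing process is free to choose its output format. The idea is to write decimals with `Locale.ROOT` and dates in ISO-8601 so the consuming process does not depend on the producer's national settings.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: names are illustrative, not taken from the thread.
final class LocaleSafeOutput {
    // Locale-dependent: the decimal separator follows the given locale.
    static String localized(double value, Locale locale) {
        return String.format(locale, "%.2f", value);
    }

    // Locale-independent: Locale.ROOT always uses '.' as the decimal separator.
    static String portable(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // ISO-8601 (yyyy-MM-dd) does not vary with national date conventions.
    static String portableDate(LocalDate date) {
        return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
    }

    public static void main(String[] args) {
        System.out.println(localized(1234.5, Locale.GERMANY)); // comma as separator
        System.out.println(localized(1234.5, Locale.US));      // dot as separator
        System.out.println(portable(1234.5));
        System.out.println(portableDate(LocalDate.of(2006, 8, 31)));
    }
}
```

A consumer that splits on commas would misread the German-formatted value, which is the kind of silent mismatch the post describes.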

As for processes vs threads, I think many applications will need to use both and the best mix is likely to depend on the operating system(s) involved.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Why Processes Scale Better Than Threads Posted: Aug 31, 2006 7:10 AM
> By contrast, Java developers would run just one JVM
> process, and call into various APIs to accomplish those
> same tasks:

False Dichotomy. There's nothing stopping a Java developer from scaling with multiple processes that are multi-threaded. I just wrote such an application about a month ago.

> In Java you don’t scan files with grep, you use a
> library. You don’t pipe e-mails to sendmail, you use a
> library. All the features you need are folded into the
> VM.

This isn't true.

> Which turned a snappy VM into a huge behemoth that
> takes a couple of minutes to boot, as it’s setting up
> libraries, frameworks and containers. You don’t want to
> startup the JVM more than once.

I write Java apps all the time that are as fast as any other program. This is just nonsense.

> Multi-threaded developers tend to scale through
> objects, libraries and frameworks. When you focus on the
> components around you, you don’t pay much attention to
> anything outside the sandbox. The level of abstraction is
> the API.
>
> Multi-process developers scale by assembling programs
> together, chaining them or running them in parallel. If
> it’s not in the framework, you look for a program (or
> combination of) that does what you need. The level of
> abstraction is the task...

He's ignoring the problem of resource sharing and synchronization. These problems are simple to solve in a single multi-threaded app in Java but require IO in multi-process architectures.
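As a rough illustration of the in-process sharing mentioned here, the sketch below has several threads share one cache through `ConcurrentHashMap.computeIfAbsent`, which applies the loader at most once per key. The class name, the key, and the stand-in "expensive load" are all hypothetical; separate processes would each hold their own copy or need some form of IPC.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: one cache shared by all threads in the JVM.
final class SharedCacheDemo {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    // computeIfAbsent is atomic per key, so the loader runs at most once per key.
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return "value-for-" + k; // stand-in for an expensive load
        });
    }

    // Starts `threads` workers that all ask for the same key and
    // returns how many times the loader actually ran.
    static int run(int threads) {
        SharedCacheDemo demo = new SharedCacheDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> demo.get("config"));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loads: " + run(5));
    }
}
```

The sketch covers only the easy case of a read-mostly cache; it says nothing about harder coordination problems such as invalidation or multi-step updates.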

> The more independent processes you have, the easier
> they are to combine into new and interesting uses.

Sorry, but I think this is total BS. Compared to well-designed OO (of which there is admittedly little), processes are monolithic. How do I reuse just a portion of a process' logic?

> Because processes can easily be distributed across
> multiple servers, Arkin believes that solutions that
> center around the multi-process approach scale better
> horizontally (incorporating more servers), whereas the
> multithreading solution scales better vertically, and is
> able to take better advantage of a more powerful
> server.

You can do both in Java, and there are good reasons to do so. For example, a Java application can run 5 threads on each of 5 machines. A comparable multi-process architecture would have 5 processes on each of 5 machines. The Java application needs only 5 caches. The multi-process architecture requires 25 if it caches at all. In addition, the Java architecture can actually have a smaller footprint on each machine, depending on a number of factors.

> Do you agree with Arkin's conclusion that the
> multi-process approach scales better? And, if so, how do
> you architect Java applications to distribute workload
> among multiple processes?

Sure, in a lot of cases it's much better. If you want high availability, you need the multi-process model. One of the easiest ways to do this is to use a messaging architecture such as what any JMS provider sells.

nes

Posts: 137
Nickname: nn
Registered: Jul, 2004

Re: Why Processes Scale Better Than Threads Posted: Aug 31, 2006 8:47 AM
For optimum scalability you want both. I am using an ETL tool that works with the dataflow paradigm. The tool breaks every step of the flow into a separate process and then spawns several threads for each process. I would call the pipelining approach depth parallelism and the superscalar approach breadth parallelism. The former is easier to use and improves throughput but often makes latency worse. The latter requires that the data for each thread be independent of the others, and that is often tricky. Witness also this paper http://www.e.u-tokyo.ac.jp/cirje/research/dp/2006/2006cf397.pdf about the Japanese industry moving some processes from conveyor belt to work-cell assemblies.

Since I went through the fields of electronic engineering, computer science and business management already I will stop now :-).

Todd Blanchard

Posts: 316
Nickname: tblanchard
Registered: May, 2003

Re: Why Processes Scale Better Than Threads Posted: Aug 31, 2006 3:37 PM
> False Dichotomy. There's nothing stopping a Java
> developer from scaling with multiple processes

But it's not typical.

> He's ignoring the problem of resource sharing and
> synchronization. These problems are simple to solve
> in a single multi-threaded app in Java but require IO in
> multi-process architectures.

These problems are hard - in Java or any other language. There's a reason Erlang has processes (separate memory spaces communicating via queues) rather than threads. It's not at all hard to argue that Erlang's model is vastly superior to Java's for concurrent processing.
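For readers who have not seen the style, the "communicating via queues" idea can be approximated in Java with a `BlockingQueue` used as a mailbox. This is only an analogy under stated assumptions: the worker below is still a thread in the same JVM, not an Erlang process with its own memory space, and the class name and the `STOP` sentinel are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a worker that owns its state and is reached only via a mailbox.
final class MailboxDemo {
    private static final int STOP = -1; // sentinel message; real messages must not be -1

    static long sumViaMailbox(int... messages) {
        BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
        BlockingQueue<Long> reply = new LinkedBlockingQueue<>();

        // `sum` is local to the worker; no other thread reads or writes it directly.
        Thread worker = new Thread(() -> {
            long sum = 0;
            try {
                for (int msg = mailbox.take(); msg != STOP; msg = mailbox.take()) {
                    sum += msg;
                }
                reply.put(sum);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (int m : messages) {
                mailbox.put(m);   // send a message instead of touching shared state
            }
            mailbox.put(STOP);
            return reply.take();  // wait for the worker's answer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaMailbox(1, 2, 3));
    }
}
```

Because the only shared objects are the two queues, there are no explicit locks in the sketch; whether that makes the model "vastly superior" is the point under debate in the thread.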

Here's a timely article on the topic that was on digg today.
http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl

> processes are monolithic. How do I reuse just a portion of
> a process' logic?

Your process is too big then. Unix is the counter example.

> For example, a Java application can run
> 5 threads on 5 machines. A comparable multi-process
> architecture would have 5 processes on 5 machines. The
> Java process needs only 5 caches.

With much more complex cache coordination logic...

> The multi-process
> architecture requires 25 if it caches at all.

Bogus - caching should be at the service interface level.

> If you want
> high availability, you need the multi-process model.

Yep - it's generally faster overall - thread context switches aren't free, after all. Process context switches you're going to get anyhow.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Why Processes Scale Better Than Threads Posted: Sep 1, 2006 7:45 AM
> > False Dichotomy. There's nothing stopping a Java
> > developer from scaling with multiple processes
>
> But it's not typical.

That still doesn't explain how Java having the ability to do both makes it inferior.

> > He's ignoring the problem of resource sharing and
> > synchronization. These problems are simple to solve
> > in a single multi-threaded app in Java but require IO in
> > multi-process architectures.
>
> These problems are hard - in Java or any other language.
> There's a reason Erlang has processes (separate memory
> spaces communicating via queues) rather than threads.
> It's not at all hard to argue that Erlang's model is
> vastly superior to Java's for concurrent processing.

In terms of what? Performance? Usability? In what way?

> Here's a timely article on the topic that was on digg
> today.
> http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl
>
> > processes are monolithic. How do I reuse just a portion of
> > a process' logic?
>
> Your process is too big then. Unix is the counter
> example.

There are Unix services that I would like to use pieces of. I understand the concept. It's just that well-designed classes are more reusable than processes.

> > For example, a Java application can run
> > 5 threads on 5 machines. A comparable multi-process
> > architecture would have 5 processes on 5 machines. The
> > Java process needs only 5 caches.
>
> With much more complex cache coordination logic...

What's complex about it? I don't even have to think about it when I write my application.

> > The multi-process
> > architecture requires 25 if it caches at all.
>
> Bogus - caching should be at the service interface level.

OK. I agree that it could be, but is it? Suppose there is a local file used as a resource. In order to avoid loading it into memory for every process, you need to use a shared memory model. Are you suggesting this is easier than using threads in Java?

> > If you want
> > high availability, you need the multi-process model.
>
> Yep - it's generally faster overall - thread context
> switches aren't free, after all.

I assume you mean on a single CPU machine.

> Process context switches
> you're going to get anyhow.

Not if you only run one process per machine.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Why Processes Scale Better Than Threads Posted: Sep 1, 2006 8:15 AM
> Here's a timely article on the topic that was on digg
> today.
> http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl

His basic argument here is that threading is non-deterministic and therefore too hard.

I love this part:

To offer another analogy, a folk definition of insanity is
to do the same thing over and over again and expect the
results to be different. By this definition, we in fact
require that programmers of multithreaded systems be insane.
Were they sane, they could not understand their programs.


Someone needs to invite Mr. Lee into the real world. First of all, the folk definition is nonsense. There's a whole branch of physics called quantum mechanics that's based in part on this exact expectation. It also implies an extremely ignorant idea that one can expect the same results based on what has been seen before. More to the point, very few real-world applications (i.e. anything that deals with IO) are fully deterministic, whether they are threaded or not. Lee's use of blatant rhetoric here puts him on very shaky ground with me. Such arguments are usually used by those with preconceived notions that they wish to rationalize.

The other thing is that the assumption that running parallel processes solves these problems in itself is also wrong. You still need to deal with communication between processes and resource sharing.

Perhaps if you think that threads are too difficult for you or the developers you work with, then maybe they are not for you. I don't find them to be all that difficult. Programming with multiple threads is not fundamentally different from writing multiple processes, and threads provide many more options.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Why Processes Scale Better Than Threads Posted: Sep 1, 2006 8:23 AM
> > If you want
> > high availability, you need the multi-process model.
>
> Yep - it's generally faster overall - thread context
> switches aren't free, after all.

When I said this, I meant that you need more than one machine. Really you need more than one machine at more than one geographic location. I'm not sure if that was clear.

> Process context switches
> you're going to get anyhow.

I think I misinterpreted this before. You mean that the machine the process is running on will have context switches regardless of how many instances of your app are running. Assuming the CPUs are fewer than processes (and maybe even otherwise) this is granted. But in an IO-bound application, multi-process systems should generally have more process switches than an equivalent multi-threaded application.

Jeff Ratcliff

Posts: 242
Nickname: jr1
Registered: Feb, 2006

Re: Why Processes Scale Better Than Threads Posted: Sep 1, 2006 12:22 PM
I think a better title for the article would be "Why LAMP is better than Java".

It seems the author is mostly talking about piping data sequentially through a set of standard Linux commands vs. the use of Java threads. This certainly is not the scenario I think of when comparing multi-process programming with multi-threaded programming.

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Why Processes Scale Better Than Threads Posted: Sep 4, 2006 9:00 AM
> I think a better title for the article would be "Why LAMP
> is better than Java".

Or even "Why L is better than Java" ;-)

In some light, the "AMP" part is just a nuisance that he happens to use to string Linux command line invocations together.

All kidding aside, there are _jobs_ that need exactly what he's describing. However, it seems a relatively poor way to build _applications_.

Peace,

Cameron Purdy
http://www.tangosol.com/

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Why Processes Scale Better Than Threads Posted: Sep 4, 2006 9:07 AM
> Is it a great problem in your actual practical
> experience?
>
> 'cause in mine, it isn't. I have had very occasional
> problems with tasks created from pipes etc breaking down
> with OS and software updates because there wasn't a
> defined API to work with. This is a very very small part
> of my day to day issues, though, and a much greater
> problem with APIs - including APIs with type checking.
>
> Eivind.

My experience is different; on many occasions, errors were silently ignored and only revealed much later.

Gregg Wonderly

Posts: 317
Nickname: greggwon
Registered: Apr, 2003

Re: Why Processes Scale Better Than Threads Posted: Sep 6, 2006 10:08 PM
> > But there is no typed interface between tasks, and that is
> > a great problem. Using tasks is like using a dynamically
> > typed programming language: you never know what is going
> > to work, until you execute it.
>
> Is it a great problem in your actual practical
> experience?
>
> 'cause in mine, it isn't. I have had very occasional
> problems with tasks created from pipes etc breaking down
> with OS and software updates because there wasn't a
> defined API to work with.

The biggest issue I have with pipes is when something besides the last process in the pipe has a problem. You can have a very hard time trying to find out why a process is not getting any output from a pipeline, especially when it is okay for there to be no output.
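One way to make a mid-pipeline failure visible is to collect the exit code of every stage, not just the last. The sketch below assumes Java 9 or later (for `ProcessBuilder.startPipeline`, which did not exist when this thread was written) and a POSIX system with `sh` and `sort` on the PATH; the class name, method names, and example commands are hypothetical.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: report which stage of a pipeline failed.
final class PipelineCheck {
    // Returns the index of the first stage with a non-zero exit code, or -1 if all succeeded.
    static int firstFailingStage(int[] exitCodes) {
        for (int i = 0; i < exitCodes.length; i++) {
            if (exitCodes[i] != 0) {
                return i;
            }
        }
        return -1;
    }

    // Starts the stages as one pipeline and waits for every stage, keeping each exit code.
    static int[] runPipeline(List<ProcessBuilder> stages) throws IOException, InterruptedException {
        // Let the last stage write to our stdout; earlier stages are piped to the next one.
        stages.get(stages.size() - 1).redirectOutput(ProcessBuilder.Redirect.INHERIT);
        List<Process> processes = ProcessBuilder.startPipeline(stages);
        int[] codes = new int[processes.size()];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = processes.get(i).waitFor();
        }
        return codes;
    }

    public static void main(String[] args) throws Exception {
        // The first stage deliberately exits with a failure code after producing output.
        int[] codes = runPipeline(List.of(
                new ProcessBuilder("sh", "-c", "echo data; exit 3"),
                new ProcessBuilder("sort")));
        int failed = firstFailingStage(codes);
        System.out.println(failed < 0
                ? "all stages ok"
                : "stage " + failed + " failed with exit code " + codes[failed]);
    }
}
```

This only helps when a failing stage actually returns a non-zero exit code; a stage that exits cleanly with no output remains as hard to diagnose as the post says.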

Another issue is that I've noticed more recently that

ulimit -c 0

has suddenly become popular, so you won't even see the telltale sign of a core file for a really poorly behaving process.

Many people can write reliable software. It's the ones who can't, or don't, that I worry about when using pipes and processes.


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use