Almost every program we write today will execute in a concurrent computing environment. But to what degree do developers really have to be aware that their programs run on concurrent hardware?
In a recent pair of SD Times articles, Alan Zeichick and Larry O'Brien explore two sides of a "threading" maturity model. Zeichick (Presenting the Threading Maturity Model) takes an organizational approach, while O'Brien (Following the Maturity Model Thread) approaches the issue from an individual developer's vantage point. Their different approaches notwithstanding, the articles share a common theme: a call for a more principled approach to dealing with the coming age of concurrency, allegedly a sea change as significant as the advent of object-orientation was two decades ago.
Zeichick and O'Brien each invoke the Capability Maturity Model (CMM), developed at Carnegie Mellon's Software Engineering Institute in the mid-1980s to chart a path for organizations to embrace the best software development practices. Starting from a state of tabula rasa, a thread maturity model assumes an increasing awareness of concurrency on the part of a development organization and the individual developer, respectively. Beginning with only a cursory awareness of concurrency issues, the fullest maturity level assumes that developers are capable of producing code that takes advantage of the full complement of cores available on modern CPUs.
Knowing as much as possible about threading and concurrent programming cannot harm a developer or a project team. And as I write this blog post on hardware with a multi-core CPU, the arrival of concurrent computing environments in every area of software can hardly be denied. But Zeichick and O'Brien's articles bring up an interesting question: To what degree do developers really have to be aware of the fact that their programs run on concurrent hardware?
Enterprise developers have been writing for concurrent environments all their professional lives—few application servers limit execution to a single thread when handling incoming requests. Similarly, a developer writing code with a database back-end is working in a highly concurrent environment, too: database servers were the first class of software to explicitly allow multiple concurrent requests (indeed, much of the concurrent programming field, such as the theory of locking, is rooted in the work of early database researchers).
Yet, most enterprise developers would consider themselves in the initial stage of a threading maturity model à la Zeichick and O'Brien: they are aware that threads exist, know that threads are important to learn about and that threads, if not properly used, can cause trouble, but don't think of themselves as concurrency experts.
Nor do they have to: Most developers don't have to explicitly program with concurrency in mind to benefit from highly concurrent environments. Concurrency in lower-level software, such as database servers and application servers, has allowed developers to continue writing essentially serial software—Web application controllers, for instance—many "instances" or threads of which are then executed in parallel by the server framework.
In his ground-breaking work on cluster computing, In Search of Clusters, IBM Distinguished Engineer Greg Pfister called that sort of parallelism Serial Program, Parallel Subsystem, or SPPS, parallelism. SPPS parallelism allows a developer to feed a serial program—a Web controller, a database query, or a data mining algorithm implementation—to a parallel subsystem—such as a Web application server, a database server, or a massively parallel supercomputer—and that parallel subsystem will ensure the maximum concurrency for the serial program.
Pfister, who had worked on huge clusters prior to writing his book, claimed that the vast majority of massively parallel computation was performed in the SPPS manner. One reason for that was the great practicality of SPPS parallelism: The parallel subsystem is likely much smarter about concurrency than most developers would be, and is able to thus take better advantage of available resources. Equally important, developers can keep writing the simplest sequential program that gets the job done, and delegate parallelism to a specialized component.
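The SPPS idea can be sketched in a few lines of Java. This is a hypothetical illustration, not code from Pfister's book or the articles: the handle() method stands in for a serial "program" (a Web controller, say), and a thread pool stands in for the parallel subsystem. Note that the handler itself contains no threads, locks, or shared mutable state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SppsDemo {
    // The developer's side: a purely serial "program" with no
    // thread code whatsoever, like a typical Web controller.
    static String handle(int requestId) {
        return "response-" + requestId;
    }

    public static void main(String[] args) throws Exception {
        // The parallel subsystem's side: a pool that runs many
        // instances of the serial program concurrently, the way an
        // application server dispatches incoming requests.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int id = i;
            results.add(pool.submit(() -> handle(id)));
        }
        for (Future<String> r : results) {
            System.out.println(r.get());
        }
        pool.shutdown();
    }
}
```

All the concurrency lives in the pool; the developer's code stays sequential, which is exactly the division of labor Pfister describes.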
In the SPPS world, developers don't need much of a threading maturity model. At most, an awareness of concurrency suffices, along with trust in the underlying parallel subsystem. Because the underlying subsystem is most likely a black box accessed only through a well-defined interface—few developers would want to hack their database's source code, if that code is available at all—there is no choice but to trust that the parallel subsystem does the right thing in terms of concurrency.
Contrast that with systems that require explicit awareness of concurrency. My favorite example is the Swing threading API, something O'Brien alludes to in his article: Even in the simplest application, you need to be keenly aware of what thread your code executes in, and mistakes lead to amateurish application errors. Yet even seasoned Swing developers don't always do the right thing: How many developers are aware, for instance, that they should not create and show UI components in a main() method? Instead, Swing wants all GUI updates to be scheduled on the event-handling thread: even as simple an operation as textField.setText() must be explicitly pushed onto the event-handling queue.
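A minimal sketch of what that pushing looks like in practice (an illustrative example, not from O'Brien's article): the update is wrapped in a Runnable and handed to SwingUtilities.invokeLater(), which queues it for the event dispatch thread. The CountDownLatch here is only so the example can wait for the update before printing.

```java
import java.util.concurrent.CountDownLatch;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

public class EdtDemo {
    public static void main(String[] args) throws InterruptedException {
        JTextField textField = new JTextField();
        CountDownLatch done = new CountDownLatch(1);

        // Wrong, from the main thread: textField.setText("updated");
        // Right: push the update onto the event-handling queue.
        SwingUtilities.invokeLater(() -> {
            textField.setText("updated"); // runs on the event dispatch thread
            done.countDown();
        });

        done.await();
        System.out.println(textField.getText()); // prints "updated"
    }
}
```

Every component mutation in a real Swing application must follow this pattern, which is precisely the tedium the next paragraph asks about.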
And what do we get in return for that tedium? Ajax applications, which for the most part execute in a single thread, don't seem to require developers to know much about threading: you register listeners, while the browser acts as the parallel subsystem that dispatches requests and notifies a sequential Ajax application of the results. That deceptively simple threading model has nevertheless allowed some fairly sophisticated Ajax applications with excellent usability. Flex (and Flash) similarly follows the SPPS model by facilitating event-handler registrations from a sequential program. Judging from the vast array of Ajax and Flex/Flash applications, the SPPS model clearly permits highly usable applications.
Contrasting Swing's parallelism with, say, Ajax and Flex, may not be fair, since Swing exposes the full power of the JVM to a developer. But for the vast majority of applications, wouldn't an Ajax-style SPPS model be more convenient? More generally, instead of developers striving for a high level of threading maturity, should we strive for more SPPS-style concurrency?
To be sure, the latest JDK concurrency features already point in the direction of delegating concurrency to an executor framework. But mastering the concurrency APIs is not the same as fully understanding how to design and architect highly concurrent applications. Leaving concurrency to specialists in each application domain, such as databases, application servers, or UI toolkits, is likely a better path to benefiting from the abundance of concurrent hardware than anything developers building higher-level applications could achieve on their own. Instead of pursuing a threading maturity model, wouldn't enterprise developers be more effective relying on such parallel subsystems and continuing to write essentially sequential programs?
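As one sketch of what that delegation looks like in the JDK itself (using the CompletableFuture API, which arrived in a later JDK release than the features the articles discuss): the developer composes a pipeline that reads sequentially, while scheduling is left entirely to the framework's default executor.

```java
import java.util.concurrent.CompletableFuture;

public class DelegatedConcurrency {
    public static void main(String[] args) {
        // The pipeline reads like sequential code; which threads run
        // each stage is delegated to the framework's common pool.
        String result = CompletableFuture
                .supplyAsync(() -> "hello")
                .thenApply(s -> s + ", world")
                .join();
        System.out.println(result); // prints "hello, world"
    }
}
```

The developer never creates a thread or acquires a lock; the executor framework plays the role of the parallel subsystem.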
Zeichick and O'Brien seem to think that a high level of threading wisdom is the way to concurrency bliss. Do you agree with their thesis? Or do you think that delegating concurrency to increasingly sophisticated parallel subsystems, while allowing developers to stay with mainly sequential programs, is the way of the future?
Frank Sommers is a Senior Editor with Artima Developer. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld. Frank also serves as chief editor of the Web zine ClusterComputing.org, the IEEE Technical Committee on Scalable Computing's newsletter. Prior to that, he edited the Newsletter of the IEEE Task Force on Cluster Computing. Frank is also founder and president of Autospaces, a company dedicated to bringing service-oriented computing to the automotive software market.
Prior to Autospaces, Frank was vice president of technology and chief software architect at a Los Angeles system integration firm. In that capacity, he designed and developed that company's two main products: A financial underwriting system, and an insurance claims management expert system. Before assuming that position, he was a research fellow at the Center for Multiethnic and Transnational Studies at the University of Southern California, where he participated in a geographic information systems (GIS) project mapping the ethnic populations of the world and the diverse demography of southern California. Frank's interests include parallel and distributed computing, data management, programming languages, cluster and grid computing, and the theoretic foundations of computation. He is a member of the ACM and IEEE, and the American Musicological Society.