Efforts to make non-Java languages perform well on the JVM have accelerated in recent years. The benefits of turning the JVM into a highly-optimized, general-purpose execution environment are many, but so are the challenges.
This week's most talked-about Java news was the decision by Sun to hire the two key figures behind the JRuby project, with the ostensible goal of creating a first-class Ruby implementation on the JVM (see the Artima interview with the JRuby project leads, Sun's JRuby Move). In an email to Artima, Tim Bray, Sun's director of Web technologies, noted that:
Looking back ten years, it might have been really smart, at the birth of Java, to brand the platform and the language separately. But during Java's early years, the technology was hitting such a big sweet spot that it was easy to see the whole thing—VM, libraries, and language—as a single engineering triumph. Microsoft was smart to get out there and evangelize that a virtual machine and API repertoire aren't necessarily tied to a language. On the engineering front, we've been pretty serious about going multi-language for some time now, [for example,] with the work on Rhino and the proposed new dynamic-method-dispatch bytecode.
As Tim Bray alludes to, not only has the number of languages targeting the JVM been increasing, but so has the quality of many of those implementations. Indeed, in some cases the JVM may be able to execute code written in a non-Java language better than a native implementation of that language can. Here, again, is Tim Bray:
Currently, native Ruby runs mostly in interpreted mode. If we arrange for JRuby to be compiled into Java bytecodes, it'll be running on the JVM, which is one of the world's most heavily-optimized pieces of software. So JRuby might end up having a general performance advantage.
More specifically, Sun is leading the charge toward highly-parallel multicore computing with the T1000 and T2000 "Coolthreads" chips, which are really well-suited to server-side Web apps. The native Ruby implementation of threads is fairly limited and may not take good advantage of this kind of CPU. JRuby uses native Java threads, which are very highly tuned; so in the particular case of highly-threaded parallel code, there's a pretty good [chance] that JRuby will be a performance winner on modern silicon.
In addition to improved performance, another advantage of executing non-Java code on the JVM is that such code can, in most cases, take advantage of the huge array of Java class libraries. Thomas Enebo, one of the JRuby project leads, noted that,
Java has a huge corpus of libraries... In some semblance most libraries you can think of have already been implemented in Java, usually as an open source package. JRuby allows Ruby to access any Java class and interact with it as if it was written in Ruby. This means a Ruby programmer has a much larger toolbox at their disposal.
Interaction between Java and non-Java languages inside the JVM works both ways. When Tor Norbye demonstrated Project Semplice, which allows Visual Basic code to run on the JVM, he also pointed out the benefit of having a JSF page invoke a VB component running inside the JVM:
This compiles the BASIC file down to a Java bytecode class, which is located and instantiated by the JSF managed beans machinery at runtime. As a result, the application works and the JSF framework has no idea it's talking to BASIC code.
The ability to execute many types of non-Java applications on the JVM can give developers a measure of freedom: a primarily Java-centric enterprise IT shop may let you write your app in, say, Ruby on Rails, and then simply run that app inside a highly available, clustered, and possibly even virtualized JVM environment.
With the recent activity around supporting non-Java languages on the JVM, the talk about the productivity benefits of dynamically typed languages, and the fatigue that naturally sets in with almost any technology or language over time, it may be an opportune moment for the Java powers-that-be to tweak the JVM and position it as a high-performance, general-purpose execution engine. In other words, it may be time, as Tim Bray said, to "brand the language and the platform separately."
Such a move would pit the JVM against Microsoft's Common Language Runtime (CLR). Both platforms could converge on being about execution, not primarily about language (Microsoft has already positioned the CLR that way). The two would then compete in providing a sophisticated array of execution facilities to code written in various languages.
Given that both the JVM and the .NET CLR are Turing-complete, any program that runs on the CLR should, in principle, be executable on the JVM as well. It is apparent, however, that making many non-Java languages perform well on the JVM (or on the CLR, for that matter) is no trivial pursuit. That's partly the result of a mismatch between key constructs in the JVM and in non-Java languages. About implementing Ruby on the JVM, Charles Nutter noted that,
For much of Ruby we've had to implement a "VM on top of a VM" that bridges that gap. We do not have control over the Java stack...so we maintain our own. We do not have a dynamic invocation bytecode in the JVM...so we use our own method. We don't have support for closures...so we simulate them with movable scopes and command implementations. However our recent efforts have aimed toward componentizing these pieces; as the JVM evolves to support them, we'll be able to toss them out one by one.
Thomas Enebo, the other JRuby project lead, added that,
We have to emulate a set of language semantics which do not map well with the JVM's underlying design. Those language semantics are sometimes quirky and reflected some evolutionary set of changes which need to be properly reflected in our implementation. We are getting pretty close to matching parity with the C implementation, but some of the last cases will be a challenge.
In a blog post earlier this year, Non Java Languages on the JVM, Debasish Ghosh summarized some of the technical challenges of implementing a language on either the CLR or the JVM, noting that,
There are still confusions regarding what should be parts of the language and what should be supported at the VM level... However, ... it is just a question of the ease of implementation and use and the speed of execution on the VM platforms.
Ghosh specifically highlighted four challenging areas in implementing dynamic languages on the JVM. The following are excerpts from his blog post:
- invokedynamic in the JVM: Dynamically typed languages like Python, Ruby, etc., perform method dispatch by name, and not by type. An invokedynamic-enabled JVM will ensure "that the verifier won't insist that the type of the target of the method invocation (the receiver, in Smalltalk speak) be known to support the method being invoked, or that the types of the arguments be known to match the signature of that method. Instead, these checks will be done dynamically."
- Hotswapping: The main idea is to allow code changes on the fly, while programs are running. The full capability of hotswapping implies support for any kind of change: addition, modification, or removal of methods and attributes, including changes in the inheritance hierarchy.
- Tail Calls: Functional-language programmers use recursion to implement loops; however elegant that may look, recursive calls consume lots of stack space. Hence these languages employ all sorts of optimizations to make efficient loop implementation possible with constant stack space. Tail-call optimization is one such technique, which replaces calls in tail position with jump statements. A call is said to be in a tail position if it is the last statement of a function... Language implementers want tail-call support in the JVM. This is not as simple as it may seem... Various techniques have been proposed and used in the last decade or so for generic tail-call optimization, but none of them has been found suitable for an efficient implementation on the JVM.
- Continuations: Just think of [a continuation] as yet another semantics of function-call implementation, where instead of returning a value to the calling function, the parent function tells the child function which function to pass the result to. No function call ever returns. The child function does this with an object named Continuation, which it gets from the parent. A continuation is an object that takes a snapshot of the current function's lexicals and control stack; when invoked, the complete state is restored from the calling chain...
I think the biggest challenge of implementing continuations support in the JVM is to follow the principle of "pay for it only if you need it", since not many languages actually need them... Once again, the real challenge is stack management.
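The dispatch-by-name problem behind the invokedynamic proposal can be illustrated in plain Java. Without such a bytecode, a dynamic-language runtime must resolve methods reflectively at run time, because the receiver's type is not known until the call actually happens. This is only a minimal sketch (the helper and its lookup-by-name-and-arity strategy are illustrative, not JRuby's actual implementation):

```java
import java.lang.reflect.Method;

public class DispatchByName {
    // Resolve a method purely by name and argument count at run time,
    // the way a dynamic-language runtime must: the verifier never sees
    // a statically typed receiver for this call.
    static Object call(Object receiver, String name, Object... args) throws Exception {
        for (Method m : receiver.getClass().getMethods()) {
            if (m.getName().equals(name)
                    && m.getParameterTypes().length == args.length) {
                return m.invoke(receiver, args);
            }
        }
        throw new NoSuchMethodException(name);
    }

    public static void main(String[] args) throws Exception {
        // The same call site handles unrelated receiver types.
        System.out.println(call("hello", "length"));                 // 5
        System.out.println(call(new StringBuilder("hi"), "length")); // 2
    }
}
```

The reflective loop is exactly the kind of per-call overhead an invokedynamic instruction would let the JVM optimize away.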
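The tail-call point is easy to see in Java terms: a recursive call in tail position can be mechanically rewritten as a loop, which is the jump-instead-of-call transformation a TCO-capable runtime would perform. A small sketch using factorial (the method names here are illustrative):

```java
public class TailCalls {
    // Tail-recursive factorial: the recursive call is the last action,
    // so nothing remains to be done in this frame after it returns.
    static long factTail(long n, long acc) {
        if (n <= 1) return acc;
        return factTail(n - 1, n * acc);   // call in tail position
    }

    // The optimization by hand: the tail call becomes a jump (a loop),
    // so stack usage is constant regardless of n.
    static long factLoop(long n) {
        long acc = 1;
        while (n > 1) {
            acc *= n;
            n -= 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(factTail(10, 1)); // 3628800
        System.out.println(factLoop(10));    // 3628800
    }
}
```

A Java compiler applies no such rewrite, so `factTail` still consumes a stack frame per call; functional languages on the JVM must either perform this transformation themselves or wait for VM support.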
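The continuation-passing idea Ghosh describes can likewise be sketched in Java: each call receives an explicit continuation and hands its result forward instead of returning it. This toy version models a continuation as a one-method interface; a real continuation object would also capture the control stack, which is precisely the hard part on the JVM:

```java
public class CpsFactorial {
    // A continuation: the "rest of the computation", passed explicitly.
    interface Cont {
        void resume(long result);
    }

    // Continuation-passing style: no call ever returns a value to its
    // caller; each call hands its result to the continuation k instead.
    static void fact(final long n, final Cont k) {
        if (n <= 1) {
            k.resume(1);
        } else {
            fact(n - 1, new Cont() {
                public void resume(long result) {
                    k.resume(n * result);
                }
            });
        }
    }

    public static void main(String[] args) {
        fact(5, new Cont() {
            public void resume(long result) {
                System.out.println(result); // prints 120
            }
        });
    }
}
```

Note that this sketch still burns a Java stack frame per step; first-class continuations need the ability to capture and restore that stack, which is what the JVM does not currently expose.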
Overcoming these technical challenges may be possible with sufficient resources. Indeed, under the hood, some JVM implementations are better prepared than others to handle some of these features. For instance, IBM's JVM seems to handle recursive tail calls fairly well (see the IBM DeveloperWorks article, Improve the performance of your Java code).
Do you believe the effort to make the JVM a general-purpose runtime is worth the investment? If so, what will that mean for non-JVM implementations of languages such as Ruby? And if both the JVM and Microsoft's CLR evolve in the coming years into truly high-performance, secure execution environments for commonly used languages, what will that mean to developers?
Frank Sommers is a Senior Editor with Artima Developer. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld. Frank also serves as chief editor of the Web zine ClusterComputing.org, the IEEE Technical Committee on Scalable Computing's newsletter. Prior to that, he edited the Newsletter of the IEEE Task Force on Cluster Computing. Frank is also founder and president of Autospaces, a company dedicated to bringing service-oriented computing to the automotive software market.
Prior to Autospaces, Frank was vice president of technology and chief software architect at a Los Angeles system integration firm. In that capacity, he designed and developed that company's two main products: A financial underwriting system, and an insurance claims management expert system. Before assuming that position, he was a research fellow at the Center for Multiethnic and Transnational Studies at the University of Southern California, where he participated in a geographic information systems (GIS) project mapping the ethnic populations of the world and the diverse demography of southern California. Frank's interests include parallel and distributed computing, data management, programming languages, cluster and grid computing, and the theoretic foundations of computation. He is a member of the ACM and IEEE, and the American Musicological Society.