If object models are tied to languages, then there are some real implications for how we do distributed computing (and how we make objects persistent). If this is indeed the case, then why are we so stuck on language-independent systems?
Returning to the subject of my last (substantive) blog...
If object models really are tied to a particular language, they are also
tied to a running process. Objects, taken as a combination of data (at
best hidden and abstracted) and code that manipulates the data
(generally hidden behind an interface), can only exist in a running
process. This is the only place that we can have this connection between
code and data and enforce the abstraction boundaries that are important to object-oriented programming.
If you buy into this, then there are some real implications for those of
us doing distributed computing but want to utilize the mechanisms that
are the foundation of object-oriented programming. In particular, when
we pass information from one process to another, we would like to think
of this as passing an object from one process to another. But that means
that there is going to be some point at which the object is no longer an
object-- it goes out of one process (where it exists as an object), goes
across some transmission fabric (where it is not an object), and has to
be re-constituted as an object on the other side (when it gets back into a process).
This is, not surprisingly, really hard to do. In the simplest case,
where you know that both sides of the transmission are written in the
same language (and thus really share an object model), it is really
hard. Those of you that have used Java RMI-JRMP (or what I think of as
real RMI) know that this is non-trivial. There are some objects that
refer to local state that just can't be passed from one place to another
sensibly (those are the classes that aren't serializable). Static fields don't make sense to pass. Making sure that you have the right code when you end up in the destination process is pretty complicated (everyone has trouble with codebase).
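A small sketch of the first two pitfalls can be seen with default Java serialization, the same transformation RMI applies when an argument crosses the wire (the class names here are hypothetical, for illustration only):

```java
import java.io.*;

// Hypothetical sketch: which fields survive default Java serialization.
public class SerializationSketch {
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;                              // serialized: ordinary instance state
        transient Object scratch = new Object();  // NOT serialized: local state is dropped
        static int openSessions;                  // NOT serialized: class-level state stays put
        Session(String user) { this.user = user; }
    }

    // Round-trip an object through a byte stream -- out of the process,
    // across "the wire", and back into an object on the other side.
    static Session roundTrip(Session s) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new ObjectOutputStream(bytes).writeObject(s);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return (Session) in.readObject();
    }
}
```

After the round trip, the plain instance state survives but the transient field comes back null: the process-local part of the object simply did not make the journey.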
If you want to do this in a language-independent fashion, it is even
harder. Now you have no other alternative than deciding on an
object-model for the network, and then a translation function from that
object model to the object models of all of the languages you want to
support. You then have stubs that translate from the object-model of the
sender to the network object model, and skeletons that translate from
the object model of the network to the object model of the
receiver. Since something is always changed in the translation, this
means that there are two stages of change. Two stages where things can
go wrong. Two stages where you have to whack the model.
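The two translation stages can be sketched in miniature. In this hypothetical example (all names invented for illustration), the network object model is nothing but named string fields; a stub flattens the sender's object into it, and a skeleton rebuilds something on the other side that is already not quite the same:

```java
import java.util.*;

// Hypothetical sketch of the two translation stages described above:
// sender object model -> network object model -> receiver object model.
public class TwoStageSketch {
    // The on-the-wire "object model": just a type tag and named string fields.
    record WireRecord(String type, Map<String, String> fields) {}

    static class SenderPoint {        // the sender language's object model
        int x, y;
        SenderPoint(int x, int y) { this.x = x; this.y = y; }
    }
    static class ReceiverPoint {      // the receiver language's object model
        long x, y;                    // note: the types already differ slightly
    }

    // Stage 1 (the stub): sender model -> network model.
    // The sender's type information is flattened away here.
    static WireRecord marshal(SenderPoint p) {
        return new WireRecord("Point",
            Map.of("x", Integer.toString(p.x), "y", Integer.toString(p.y)));
    }

    // Stage 2 (the skeleton): network model -> receiver model.
    // A second place where the translation can lose or distort something.
    static ReceiverPoint unmarshal(WireRecord r) {
        ReceiverPoint p = new ReceiverPoint();
        p.x = Long.parseLong(r.fields().get("x"));
        p.y = Long.parseLong(r.fields().get("y"));
        return p;
    }
}
```

Even in this toy, something changed at each stage: an `int` became a string, then a `long`. Multiply that by every construct in every supported language and the "whacking" becomes clear.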
By the way, the network is not the only place where this problem
occurs. You get the same problem when you try to persist an object. Bits
on a disk are much like bits on the wire, in that neither are
objects. Persistence is just transmission over time, rather than space;
as Einstein taught us these two are pretty much the same.
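The symmetry is visible in Java's own API: exactly the same serialization code serves whether the destination stream leads to a socket (transmission over space) or to a file (transmission over time). A minimal sketch:

```java
import java.io.*;

// The same serialization logic serves both "transmission" targets:
// a socket stream (across space) or a file stream (across time).
public class PersistIsTransmit {
    static void send(Serializable obj, OutputStream dest) throws IOException {
        new ObjectOutputStream(dest).writeObject(obj);
    }

    public static void main(String[] args) throws Exception {
        // Across time: send(obj, new FileOutputStream("state.bin"));
        // Across space: send(obj, socket.getOutputStream());
        // Either way, what leaves the process is bits, not an object.
        ByteArrayOutputStream wireOrDisk = new ByteArrayOutputStream();
        send("hello", wireOrDisk);
        System.out.println(wireOrDisk.size() + " bytes, zero objects");
    }
}
```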
So, if all of this makes any sense at all, why is it that everyone wants language-independent systems?
> If you want to do this in a language-independent fashion, it is even harder. Now you have ... Two stages where you have to whack the model.
> So, if all of this makes any sense at all, why is it that everyone wants language-independent systems?
Good question. Always enjoy reading your posts & articles.
I think it's a question of trade-offs. People will trade the simplicity/purity of a one-language environment for the flexibility/freedom of a multi-language environment, despite the extra wrinkles. If you expose a CORBA or (I know you're gonna love this one, based on comments you've made before) XML interface, people have the choice of implementing their part of the system in whatever (supported) language they want.
Much of the time, I think it's just the raw data that really matters, and needs to be persisted/transmitted. I'm a huge fan of object-orientation, abstraction, encapsulation, etc. I'm just willing to accept that I have to go from object to raw data and vice-versa. With a neutral data format, each party in the system can have their independent OO (or other) view of the world, provided they're willing to do this translation. And everyone seems to be willing.
That said, I would always advocate sticking to one language when you can - e.g. your company or group controls the whole system. I would only advocate language-independence when you have something like multiple companies using the system - e.g. I think it would be dumb for Amazon to expose a Java API rather than an XML one. Here, I'm talking about distributed systems.
For persistence, I'd advocate just persisting the data and not the greater object. This is because I had a terrible experience with (default Java) serialized objects and the code base evolving out of synch. It would've been MUCH easier had I used something like XML (or plain text even) to store the data, and explicitly marshalled/unmarshalled them.
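The approach the comment describes might be sketched like this (a hypothetical `Customer` class and a made-up line format, purely for illustration): persist only the raw data as plain text, with explicit marshal/unmarshal, so stored records can survive class evolution:

```java
import java.util.*;

// Hypothetical sketch: persist just the data, not the object, in a
// plain-text format with explicit marshalling in both directions.
public class PlainTextStore {
    static class Customer {
        String name;
        int orders;
        Customer(String name, int orders) { this.name = name; this.orders = orders; }
    }

    // A trivially versioned line format; fields this version doesn't
    // know about are simply ignored, instead of breaking deserialization.
    static String marshal(Customer c) {
        return "v1|name=" + c.name + "|orders=" + c.orders;
    }

    static Customer unmarshal(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String part : line.split("\\|")) {
            String[] kv = part.split("=", 2);
            if (kv.length == 2) fields.put(kv[0], kv[1]);
        }
        return new Customer(fields.get("name"),
                            Integer.parseInt(fields.getOrDefault("orders", "0")));
    }
}
```

The design choice here is that evolution is handled by the explicit code, not by the opaque machinery of default serialization: adding a field changes one marshal line and one unmarshal line, and old records still load.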
Not sure if the last 2 paragraphs contradict each other...
"In particular, when we pass information from one process to another, we would like to think of this as passing an object from one process to another."
Why would one prefer this model to the notion of passing a message to some other process, one that may or may not have objects?
One argument or rationale for using OOP is that it models, in some plausible abstraction, the real world. This mapping helps one think more clearly and better organize code. But nowhere in real life does one magically clone an object across time and space.
It's just a mind tool, so one is not bound to all literal interpretations of objects and models, but if a particular variation on a concept is causing a problem, then perhaps the variation or the concept is a poor fit for the task.
Rather than think of sending objects across the wire, think of the wire as an object, with all the features of encapsulation, etc. Then code doesn't care what's on the other end, just what that wire-object returns, and how fast, and how reliably.
I'm in general agreement with the other respondents, but I want to extend the thought about this a bit.
The whole idea of encapsulation is to hide local internal state. If we follow this principle, the rest of the world only needs to care about the observable behaviour of a given object.
So no wonder it is hard to transfer objects from one system to another: the very idea requires that you violate encapsulation! By definition, the encapsulated state is supposed to be local to the implementation on its home system.
If we stick to this "black box" rule of objects, the only thing that should EVER transfer between systems is externalized messages representing the same observable behaviour that passes between different objects within the system.
If you do that, there is no need to have the same language or object model on both distributed sites, and you don't have any impedance mismatches or modelling fuzziness either.
Sometimes I think we take the expression "object-oriented" too literally, and forget that it is supposed to be about certain design principles, not just "objects everywhere".
Jim Waldo said, "If object models are tied to languages, then there are some real implications..."
One of the implications not discussed in the original article is its effect on object-relational mapping.
When two different (relational database) data modellers look at a business and design a data model in 3rd normal form, they end up with substantially similar logical models. Jim Waldo's statement implies that using two different OO languages one would end up with two significantly different object models.
Given these facts, we can deduce that it makes more sense to develop the data model and object model independently, and use an ORM tool or some other device to map from a relational data model to the language specific object model. This would give us the greatest benefit since, for the same set of requirements, the data model would not change but the object model does change depending on the language chosen.
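The mapping device the comment proposes might look like this in miniature (all names here are invented for illustration): one stable relational row shape, mapped by a thin layer into whatever object model the chosen language favors:

```java
import java.util.*;

// Hypothetical sketch of the mapping layer described above: a stable,
// language-neutral relational row, translated into a language-specific object.
public class RowMapper {
    // A row from the relational data model; this shape stays fixed
    // regardless of which OO language sits on top of it.
    record Row(Map<String, Object> columns) {}

    // The language-specific object model; a different language (or team)
    // might shape this class quite differently from the same row.
    static class Employee {
        String name;
        double salary;
    }

    // The ORM-like mapping: only this function changes if the object
    // model changes, while the data model stays put.
    static Employee toObject(Row row) {
        Employee e = new Employee();
        e.name = (String) row.columns().get("emp_name");
        e.salary = ((Number) row.columns().get("salary")).doubleValue();
        return e;
    }
}
```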
Another point that arises from reading the article, and also mentioned in most of the comments, is this - why must we use objects for distributed computing?
I would like to take a step back and ask - What is the task you are trying to accomplish using distributed computing? In short, you expect to pass some information to another server over the network and expect to get back some information in an agreed upon format. This format can be text, XML, objects, ..., but the key is that both the requester of the information and the receiver must agree upon the format.
Jim's focus on using objects on either end of the network reminds us of the old saying, "To a man with a hammer, everything looks like a nail."
Unlike Object Oriented fanatics, I do not believe that OO is the only way to develop software. Like Alex Stepanov of C++ STL fame, I believe that algorithms are more important than any particular viewpoint, such as procedural, OO, functional, etc. For example, when dealing with sorting, the natural consideration is the selection of the correct algorithm for sorting, not the correct Object for sorting. That OO-only languages end up putting each sorting algorithm into a class of its own is a limitation of OO languages, not a feature of the problem domain.
"As to it being hard, I don't know for sure because I've never done a cross-language object model, but it's certainly doable. MS did it with .NET."
No they did not. .NET supports multiple syntaxes, not multiple languages. They really only have one object model. There are a number of language constructs that are not cleanly representable in the .NET runtime.
> > So, if all of this makes any sense at all, why is it that everyone wants language-independent systems?
>
> For the same reason people want more than one programming language. Some tasks are easier in different languages... the same reason the whole DSL thing is ascendant today.

I can think of two reasons that people want language-independent systems. One is that they have lots of different pieces in their system already, and they aren't all in one language. It is very difficult to construct an enterprise on just one language. I made the decision to build Artima out of Java, but then I liked SugarCRM, and that's written in PHP. If I want to connect my ad service with my CRM system, well, I've got to deal with multiple languages. I think that such heterogeneity is pretty much impossible to avoid in the real world.
Nevertheless, I think that one of the big misperceptions about Jini is that it assumes Java is everywhere, and therefore isn't a good fit for a heterogeneous environment. When in fact, what Jini really does is project services as Java proxies, services that could be implemented in any language. For example, I could easily create a Jini service for SugarCRM that our ad service could use to make updates to the CRM system, even though the CRM is PHP. Jini uses Java as a connecting technology, similar to how web services use XML as a connecting technology.
So I think the first reason people prefer language-independent systems is basically that they don't believe a single-language system will work to connect the pieces of a heterogeneous environment, and that isn't really true. If the single-language system is just a glue layer, a connecting layer, then it is just as viable as a language-independent system.
The other reason, though, is that people don't want to be married to one vendor. If you decided to use .NET to connect things, you marry yourself to Microsoft. If you use Java, the perception is at least that you marry yourself to Sun. I can get Java technology from many other sources as well, but it is to a great extent coming from Sun. And people don't like to lock themselves in to one vendor.
To Michael's comment about different languages being good for different things, I would just suggest that Java is a good language to use for connecting things on the network. Java is well-suited to that task. For internal systems, where the organization pretty much has control over all parts of the system, I think Java/Jini is a better fit than web services, and that's what I'm planning to use to connect things in Artima's new architecture.
> "As to it being hard, I don't know for sure because I've > never done a cross-language object model, but it's > certainly doable. MS did it with .NET." > > No they did not. .NET supports multiple syntaxes, not > multiple languages. They really only have one object > model. There are a number of language constructs that are > not cleanly representable in the .NET runtime.
Well, syntax is part of language. C# and VB.NET are not the same language. And, thank goodness. If I had to type "MustOverride" in C#, well, let's not go there.
> By the way, the network is not the only place where this problem occurs. You get the same problem when you try to persist an object. Bits on a disk are much like bits on the wire, in that neither are objects. Persistence is just transmission over time, rather than space; as Einstein taught us these two are pretty much the same.
Imagine a high-bandwidth, near-zero-latency, and reliable network (with a unique address space), together with an infinity-like storage system (maybe implemented as a global Google-like indexer with a garbage collector :-)
Is this hypothetical scenario intrinsically theoretical or not? In this ideal scenario, don't transmission and persistence become obsolete processes? Are time and space constraints unavoidable? So we are just talking about current (and pretty real) technological imperfections.
Nicolas, when you say, "Together with an infinity-like storage system... " and then also state, "doesn't transmission and persistence become obsolete ", what do you mean by the word storage in the first part quoted?
Do you mean that everything will always be in memory and "live"? That there will be no off-line storage ever?
Interesting idea. What happens to the data when the system/network crashes?
If everything is always in memory, how does one application access the data of another application if they are written in different languages? For example, a data warehousing application is different from a transactional business application. Are you proposing that one application convert data from another application on the fly every time one needs data?
> > No they did not. .NET supports multiple syntaxes, not multiple languages. They really only have one object model. There are a number of language constructs that are not cleanly representable in the .NET runtime.
>
> Well, syntax is part of language. C# and VB.NET are not the same language. And, thank goodness. If I had to type "MustOverride" in C#, well, let's not go there.
Yes, but it is not all of the language. One might very well enable Smalltalk syntax in C++, but with the C++ runtime, the language will never be Smalltalk. The runtime cannot support it.
> Nicolas, when you say, "Together with an infinity-like storage system..." and then also state, "doesn't transmission and persistence become obsolete", what do you mean by the word storage in the first part quoted?
>
> Do you mean that everything will always be in memory and "live"? That there will be no off-line storage ever?
>
> Interesting idea. What happens to the data when the system/network crashes?
If you maintain multiple copies, the network/system crash needn't matter. This doesn't imply server-side replication which we are used to deploying as the solution to this problem.
Having multiple copies can permit additional flexibility like keeping these copies entirely in memory. Normally you'd have to commit to disk and use that as your backup in case of failure. But, if you have sufficient in-memory copies across sufficient machines you can achieve similar availability/recovery guarantees.
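The idea can be sketched in a few lines (a toy simulation, with all replicas living in one process; a real deployment would spread them over the network): writes go to every replica, and a read succeeds as long as a majority of replicas still agree:

```java
import java.util.*;

// Toy sketch of in-memory replication: N replicas, majority reads.
// Losing any minority of machines loses no data.
public class InMemoryReplicas {
    private final List<Map<String, String>> replicas = new ArrayList<>();

    InMemoryReplicas(int n) {
        for (int i = 0; i < n; i++) replicas.add(new HashMap<>());
    }

    // Write to every replica (in a real system, over the network).
    void put(String key, String value) {
        for (Map<String, String> r : replicas) r.put(key, value);
    }

    // Simulate a crash: one machine's memory is simply gone.
    void crash(int i) { replicas.get(i).clear(); }

    // A read succeeds only if a strict majority still holds the value.
    Optional<String> get(String key) {
        Map<String, Integer> votes = new HashMap<>();
        for (Map<String, String> r : replicas) {
            String v = r.get(key);
            if (v != null) votes.merge(v, 1, Integer::sum);
        }
        return votes.entrySet().stream()
            .filter(e -> e.getValue() > replicas.size() / 2)
            .map(Map.Entry::getKey).findFirst();
    }
}
```

With three replicas, any single crash leaves two agreeing copies, so reads still succeed; that is the availability the disk would otherwise have provided.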
These other approaches can be seen in some of the research work on epidemic computing and the likes of the OceanStore system.
> In particular, when we pass information from one process to another, we would like to think of this as passing an object from one process to another.

From your own admission, an object has both data and code, and much of what really gets distributed is actually data (object state), not code. The situations that actually warrant behavior being transmitted across the network are very rare.
So if the problem in distributed computing is relating (reading or writing) object state to the network stream, then shouldn't OO languages have a well-defined model (representation) for object state? This model should support reflective, dynamic, and uniform access to data. When this is absent you transfer the problem to the application developer and thus end up with vague questions about language-independent systems.
So, OO languages should have a core data representation. All user-defined types then get support for extraction to and from this core representation of object state. Once this is available, it finds use wherever processing depends on the state, and not on the behavior, of types.
Specifically, network distribution, persistence, and handling stream data formats are problems related not to individual type definitions but to the internal state representation in types. When you do not have a model to represent state, you cannot describe relationships, constraints, and operations on state.
The question then is - how and when should you expose this state representation of objects. Introducing this state abstraction shouldn't change normal domain related programming tasks.
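In Java, the closest built-in approximation of such a core representation is reflection. A hypothetical sketch (the `Order` class is invented for illustration) that pulls any object's instance state into a uniform map, independent of its type:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.*;

// Hypothetical sketch of a "core data representation": reflectively
// extract an object's instance state into a uniform name->value map.
public class CoreState {
    static Map<String, Object> stateOf(Object obj) throws IllegalAccessException {
        Map<String, Object> state = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue; // class state stays put
            f.setAccessible(true); // deliberately pierces encapsulation
            state.put(f.getName(), f.get(obj));
        }
        return state;
    }

    static class Order {
        String item = "book";
        int quantity = 2;
    }
}
```

Note that `setAccessible(true)` makes the tension in this thread concrete: a uniform state representation is exactly a sanctioned, controlled violation of encapsulation, which is why the question of how and when to expose it matters.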