For the first time since 1989, I'm attending OOPSLA. Some things have changed, others have not. Here are a couple of reflections...
I'm back at OOPSLA for the first time
since 1989. I stopped going in 1989 because the conference had gotten too big;
it has since expanded, contracted, and now seems to be growing again. It is
somewhat larger than it was in 1989, but not unmanageably so.
The venue is Portland, a very nice city. We are in the Oregon conference
center, which is a bit sterile for my tastes. When I come to a conference like
this, I gain an additional appreciation for the Jini community's decision that
we would not meet in "plastic" places. So Jini community meetings are in
places like Brussels or downtown Chicago (at a hotel, not a conference center)
or a brewery in London. One of the advantages of a smaller community.
I'm here to give a couple of talks. The one I gave today was part of
the company talks track, which consists of half-hour sessions given to those
companies that have given enough money to be conference sponsors. I guess that
I was expected to give a sort of overview of the projects
in Sun Labs, but neither I nor the
director of the lab thought that would be a good idea. Instead, I took the
opportunity to talk about something that has been bothering me for a long time.
I started off with the thought that object technology has been pretty
successful. When I came to OOPSLA in 1989, the conference was partly a support
group for those of us who thought that this object way of doing things might
be useful. Now objects are just about everywhere-- designs are about objects;
implementations are about objects; industry does objects; academics do
objects. It has been a very successful idea.
What I find interesting is that the one place that objects don't seem to have
taken hold is in distributed systems, where we still organize things around
wire protocols. We've tried objects in everything from active networking to
agent systems to Jini technology. Some people have found the use of objects in
distributed systems persuasive, but the acceptance has been nowhere near that
of objects within the address space. The challenge I presented to those in the
OOPSLA audience was either to explain why objects weren't used in distributed
computing, or to say what about distributed computing limited the applicability of
the object paradigm.
I had my own theories. In part, I think this is because we have not freed
objects from the expressions of those objects in a particular language-- even
though we talk about objects in Java or Smalltalk or some other language, the
notion of an object in each of these is significantly different. Smalltalk
objects can't talk to Java objects, and vice versa. The Java type system and
the ML type system are very different. So rather than unifying our notion of
object, we have decided that we will translate into bits on the wire and let
the two sides agree to disagree.
I then argued that part of the reason that we have agreed to disagree is that
most of us in the object world don't actually believe what we say when we talk
about objects. Oh, we are willing to talk about objects that are characterized
by interfaces and which consist of the pairing of data and the code that
manipulates that data. But deep down we know that code is code, and data is
data, and that the two are really different. Objects are an interesting and
convenient abstraction, but they aren't the real deep truth.
Of course, bytes aren't the real, deep truth either. But we have all agreed that
bytes are the right level of abstraction, so we don't generally worry about
going any lower. But we can't come to the same agreement about objects. Maybe
this is something that will change when a new group of programmers and
designers who have never known anything but object-oriented programming takes
over. I'd hate to think that it will take this long, but maybe that is the case.
Or maybe there is a limit to the use of objects as a design principle. I
actually think there might be, but I don't think it is the address space. But
where the limits are is an interesting question, and I thought it would be
interesting for someone to ask it.
> I then argued that part of the reason that we have agreed to disagree is that most of us in the object world don't actually believe what we say when we talk about objects. Oh, we are willing to talk about objects that are characterized by interfaces and which consist of the pairing of data and the code that manipulates that data. But deep down we know that code is code, and data is data, and that the two are really different. Objects are an interesting and convenient abstraction, but they aren't the real deep truth.
> [...]
> Or maybe there is a limit to the use of objects as a design principle. I actually think there might be, but I don't think it is the address space. But where the limits are is an interesting question, and I thought it would be interesting for someone to ask it.
I've given this some thought, as well. Recently, I've been revisiting assembly programming (having got an ARM-based computer, the Iyonix, with an instruction set that is so much nicer than x86), and at that level, code and data are really unified: A word may be interpreted as either an instruction, or as data. It doesn't matter to the processor.
It's similar with a high-level language like Lisp, where code and data are again unified: the program is just a list.
With OO, an object kind of unifies code and data... However, I don't really have any more comments on this part, so this posting was mostly to note these various approaches to code and data unification.
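To make the Lisp point a little more concrete, here is a minimal sketch of "the program is just a list". It is written in Python rather than Lisp purely for illustration; the `evaluate` and `swap_operator` functions and the tiny operator table are hypothetical names invented for this example, not part of any real Lisp.

```python
import operator

# Hypothetical operator table for this sketch; not taken from any real Lisp.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a program represented as nested lists, e.g. ["+", 1, ["*", 2, 3]]."""
    if not isinstance(expr, list):
        return expr  # a literal number evaluates to itself
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result


def swap_operator(expr, old, new):
    """Treat the program as data: return a copy with operator `old` replaced by `new`."""
    if not isinstance(expr, list):
        return expr
    head = new if expr[0] == old else expr[0]
    return [head] + [swap_operator(arg, old, new) for arg in expr[1:]]


program = ["+", 1, ["*", 2, 3]]               # the program is just a list
rewritten = swap_operator(program, "*", "+")  # so ordinary code can rewrite it
```

Here `evaluate(program)` runs the list as code, while `swap_operator` handles the very same list as plain data. Nothing in the representation distinguishes the two roles.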
I'm not 100% sold either way but I'm kind of leaning towards OO not being compatible with distributed systems. I think perhaps it's related to the "Fallacies of Distributed Computing" - http://weblogs.java.net/jag/Fallacies.html.
OO abstracts things out to a high degree, hiding all the gritty details of bits and bytes (ideally). It may be that this is only tenable in an environment that is very 'smooth', for lack of a better word. When I create an Object in a Java program, it's not going to just disappear from memory for no apparent reason. If it did, my whole program would crash, and that's OK because it shouldn't happen except in the rarest of circumstances. In distributed programming, you can't make any such assumptions, and it's not OK to crash just because something that was there a second ago is no longer there now.
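A rough sketch of that difference, in Python and with no real network involved: `FlakyRemoteCounter` is a hypothetical stand-in that simulates a remote object vanishing, and `increment_with_retry` shows the caller-side handling that purely local code never needs. The sketch assumes a failed call had no effect on the remote side, which a real distributed system generally cannot guarantee.

```python
class RemoteUnavailable(Exception):
    """Raised by this sketch when the (simulated) remote side has gone away."""


class Counter:
    """A plain local object: once created, it stays put until we drop it."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


class FlakyRemoteCounter:
    """Simulated remote object. `fail_on` lists the call numbers that fail;
    it is a hypothetical knob for reproducing partial failure deterministically."""

    def __init__(self, fail_on=()):
        self._target = Counter()
        self._calls = 0
        self._fail_on = set(fail_on)

    def increment(self):
        self._calls += 1
        if self._calls in self._fail_on:
            raise RemoteUnavailable("peer vanished")
        return self._target.increment()


def increment_with_retry(counter, attempts=3):
    """Retry a few times, then report failure instead of crashing the program."""
    for _ in range(attempts):
        try:
            return counter.increment()
        except RemoteUnavailable:
            continue
    return None  # the caller must now decide what "no answer" means
```

With a local `Counter` the retry loop is dead weight. With the simulated remote one it is the whole point, and the `None` case has to be designed for rather than treated as a should-never-happen crash.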
> one place that objects don't seem to have taken hold is in distributed systems, where we still organize things around wire protocols.
> ...
> In part, I think this is because we have not freed objects from the expressions of those objects in a particular language.
We have not freed objects from the _implementation_ of those objects in a particular language.
Objects are programming language constructs. The ability of a programming language construct to interact with another programming language construct is created by the compiler; therefore, the scope of an object is limited to the domain of its compiler, which in turn is limited by the compiler user's decision-making authority. That authority rarely extends beyond the individual system.
> So rather than unifying our notion of object, we ... translate into bits on the wire and let the two sides agree to disagree.
>
> Objects are an interesting and convenient abstraction, but they aren't the real deep truth. Of course, bytes aren't the real, deep truth either. But we have all agreed that bytes are the right level of abstraction, so we don't generally worry about going any lower.
It has nothing to do with what is "real" or what is the "right level" -- but rather with what diverse systems today have in common. We deal in bytes because virtually all systems today deal in bytes at some level internally -- bytes are the greatest common denominator.
I don't know that the universal use of bytes as a low-level abstraction within individual systems was necessitated by the needs of computer users; it may be an accident of history -- perhaps useful computer architectures with no concept of bytes could have been developed. Had that happened, our communications protocols might now be based on bit streams.
So I guess the rule is to go to as low an abstraction level as necessary for commonality, but no lower.
There are technology issues. But another big issue is that these big distributed systems ultimately talk to Relational Databases. And, in my experience, good database programmers are lousy OO programmers. They think in terms of data and procedures, not objects.
Quoting Allen Holub, "It's rare that a good object model will make a good database schema (or vice versa)."
You seem to be turning a blind eye to the other abstraction available for distributed computing: functional programming. With a functional paradigm you no longer have to worry about hidden state within objects and how that state may be mutated by some distributed process. Instead, there's a set of inputs to a function and a return value.
I've spent some time working with Jini and I've spent some time with Erlang, one of the functional approaches to distributed computing. And so far, I'd say things are much cleaner and easier to reason about using the functional approach.
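A very small sketch of that contrast, written in Python rather than Erlang and not meant to say anything about how Erlang or Jini actually work. The `Account` class and `deposit` function are hypothetical names made up for the example.

```python
class Account:
    """Object style: the balance is hidden state that any caller may mutate."""

    def __init__(self, balance):
        self._balance = balance

    def deposit(self, amount):
        self._balance += amount  # the result depends on every earlier call
        return self._balance


def deposit(balance, amount):
    """Functional style: all inputs are explicit and nothing is mutated."""
    return balance + amount


account = Account(100)
first = account.deposit(10)   # answer depends on the object's history
second = account.deposit(10)  # same call, different answer

pure_first = deposit(100, 10)   # answer depends only on the arguments
pure_second = deposit(100, 10)  # same call, same answer
```

The same method call on `account` gives two different answers because of state the caller cannot see, whereas the function can be reasoned about from its arguments alone, whichever process happens to run it.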
My takeaway after reviewing most of the topics is that most languages are simply implemented at the wrong levels of abstraction. Truly powerful languages are implemented in terms of themselves. This class of languages is profound. Witness the success of Squeak's self-implemented VM and copycat projects like PyPy and RubyInRuby. Everything is data - even code. Manipulatable, discoverable, debuggable in terms of themselves. There is vast power here and COLA takes that kind of self-referential power even further.
Meanwhile, languages like Java and C# are not implemented in terms of themselves, but are implemented in a relatively large number of inflexible and coarse abstractions that impair their flexibility and evolution. They are dead tongues handed down from alien gods. They are not profound.
They are sterile simulations of OO. Animatronic facsimiles of living systems with all of the limitations that implies.
The real OOPSLA where the big ideas were shared and explored was at the DSL and the tables near the E meeting rooms. It wasn't in the big halls.
> There are technology issues. But another big issue is that these big distributed systems ultimately talk to Relational Databases. And, in my experience, good database programmers are lousy OO programmers. They think in terms of data and procedures, not objects.
>
> Quoting Allen Holub, "It's rare that a good object model will make a good database schema (or vice versa)."
I think this is totally wrong and it is quite obvious for anyone who ever had to deal with information integration issues.
Distributed systems are not just physically distributed over a network of nodes. Distribution also means distribution of responsibility, distribution of control, distribution of goals and purpose, etc. Distributed systems are very often distributed not for scalability reasons but simply because different people/organisations want to do different things in a different organisational context with that same information.
The principle of encapsulation and data hiding is useful if the context can be clearly defined. You design an interface by thinking about what the purpose of a particular type is, how it interacts with other types of objects, etc. You ask, what do I want to do with that kind of thing? But that is in stark contrast to the requirements posed by distributed systems as described above, because you cannot know what other people need to do with the information buried in your wonderfully designed API.
Objects have a dual role. They are gate keepers that define how data can be modified. But they also define a view on that data. And that's the problem. It makes sense in many cases to centralise write access to data, but it doesn't make sense to centralise the definition of views on data. Objects claim that there is exactly one useful view on a particular type of object and that is simply a flawed concept.
So at this point encapsulation and data hiding start to break down. Sure you can use any of the available externalisation/serialisation patterns. But to avoid leakage of the inner structure of the object, the designer of the type itself has to define what is externalised, so it's once again a centralisation of view definition. There is no right way to tell what information is part of one particular object because objects are almost always part of an object graph. So what do you externalise? You make a decision based on the contexts you know, and I can tell you from experience, it's not the incompetence of OO programmers that makes it so hard to use OO APIs for integration purposes, it's the logical impossibility to anticipate all contexts of use. With a RDBMS, the user defines the views, not the designer of the base tables, and that is a logical requirement, not some kind of technical limitation or feature.
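As a minimal sketch of "the user defines the views", here is an example using Python's standard `sqlite3` module with an in-memory database. The `orders` table, its columns, the sample rows and the `revenue_by_region` view are all hypothetical, invented only to illustrate the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table, as its designer might have defined it (hypothetical schema).
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("acme", "EU", 120.0), ("acme", "US", 80.0), ("globex", "EU", 50.0)],
)

# A view defined later by a *user* of the data, for a purpose the table's
# designer did not have to anticipate.
conn.execute(
    "CREATE VIEW revenue_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

rows = conn.execute(
    "SELECT region, total FROM revenue_by_region ORDER BY region"
).fetchall()
```

The view needs no cooperation from whoever designed `orders`. An object API over the same data would have to expose a method for this particular aggregation before anyone could use it.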
OO is strong where the purpose is to define the interactions between a closely knit graph of objects, but it's naive to assume that any kind of design effort will be able to anticipate all contexts of information use. And that's why the data has to be set free. People who work in the field of information integration, application integration, data warehousing, etc. are highly paid to fight the kinds of silos that object APIs are. What we really need is unrestricted access to a rich representation of data and metadata independent of all APIs.
OO people have to accept that not every system is an OLTP system where it's important to restrict access methods. In decision support and BI, we need to analyse data in unanticipated ways and we need a lot of metadata to do that. If that metadata is buried in procedural code, it is of no use. It has to be duplicated elsewhere and that's a lot of effort.
I consider the inextricable mangling of code and data an anti-pattern comparable to the mingling of user interface and business logic.
> With a RDBMS, the user defines the views, not the designer of the base tables, and that is a logical requirement, not some kind of technical limitation or feature.
The designer of the base tables still has to anticipate the views that may be required. Otherwise you can end up with a fantastically normalised database, but users will struggle and perhaps fail to create the views that they want. You may resolve this by transforming the source data to another form which is easier to query or by redesigning the base tables.
Since each programming language provides different levels of abstraction, each object implementation is different: C++ objects are nothing more than hardcoded records of data and pointers, whereas Ruby objects are maps of fields and methods which are changeable at run-time; there can never be a co-operation between those objects in a distributed environment beyond data exchange, because the purpose of each language is different.
Object-oriented programming is overrated. OOP is nothing more than pattern matching on the type of data: "if the data type is this, then do this, else if the data type is that, then do that, etc". OOP is still procedural programming, and OOP projects exhibit the same problems as non-OOP projects. Real progress will come when languages can manipulate themselves, i.e. where code is data. LISP was a good idea made bad due to its lack of syntax, weird terminology and complex non-standard libraries...but Ruby proves that, when done correctly, the treatment of code as data is invaluable.
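For readers who want to see the "pattern matching on the type of data" claim spelled out, here is a small Python sketch that writes the same decision twice: once as an explicit type test and once as a method call. The `Circle` and `Square` classes and both function names are hypothetical, and the sketch only illustrates the claim without settling whether it is the whole story about OOP.

```python
import math


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def area_by_type_test(shape):
    """Procedural style: 'if the data type is this, then do this, else ...'."""
    if isinstance(shape, Circle):
        return math.pi * shape.r ** 2
    elif isinstance(shape, Square):
        return shape.side ** 2
    raise TypeError(f"unknown shape: {shape!r}")


def area_by_dispatch(shape):
    """OO style: the same choice, made by method lookup on the object's type."""
    return shape.area()
```

Both functions return the same value for the same shape. What differs is where the type-based branching lives: in one central function, or spread across the classes.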
> > With a RDBMS, the user defines the views, not the designer of the base tables, and that is a logical requirement, not some kind of technical limitation or feature.
>
> The designer of the base tables still has to anticipate the views that may be required. Otherwise you can end up with a fantastically normalised database, but user will struggle and perhaps fail to create the views that they want. You may resolve this by transforming the source data to another form which is easier to query or by redesigning the base tables.
Sure, but this is really just a matter of expertise. The provider of the base tables can provide views of his own to simplify certain recurring needs, but that doesn't make it impossible for the user to access the base tables. At the end of the day, with a data representation (not necessarily relational) you get all the information that exists, with APIs you get only what the API designer anticipated you might need.
> I had my own theories. In part, I think this is because we have not freed objects from the expressions of those objects in a particular language-- even though we talk about objects in Java or Smalltalk or some other language, the notion of an object in each of these is significantly different. Smalltalk objects can't talk to Java objects, and vice versa.
Jim, I remember you writing about this before and you write about it as if it's impossible and as if it hasn't been done.
> Sure, but this is really just a matter of expertise. The provider of the base tables can provide views of his own to simplify certain recurring needs, but that doesn't make it impossible for the user to access the base tables. At the end of the day, with a data representation (not necessarily relational) you get all the information that exists, with APIs you get only what the API designer anticipated you might need.
And when you have data in the table that must be modified in very specific ways or should not be depended upon for any reason, how do you handle that?
You write about this as if there is no downside to many independent parties having unfettered access to a system's data.
> > Sure, but this is really just a matter of expertise. The provider of the base tables can provide views of his own to simplify certain recurring needs, but that doesn't make it impossible for the user to access the base tables. At the end of the day, with a data representation (not necessarily relational) you get all the information that exists, with APIs you get only what the API designer anticipated you might need.
>
> And when you have data in the table that must be modified in very specific ways or should not be depended upon for any reason, how do you handle that?
>
> You write about this as if there is no downside to many independent parties having unfettered access to a systems data.
No, not at all. Of course there is a downside. I said in my first post that centralisation of write access is useful and I think OO is very suitable for that. I take issue with not having a representation of data and metadata that is separate from the operations that work on the data.
Of course there are situations where read access may have to be restricted as well (security,...) but the point is that it should be a deliberate decision to restrict access, not a consequence of bad design. Separation of code and data is just good design, because it allows for both restricted write access AND flexible definitions of new views by the user. Read access cannot be fully anticipated because there are many useful views depending on the context. Write access has to be anticipated because otherwise you get data corruption.
Now, sure you could design object types in a way that allows access to every piece of information, but this runs contrary to OO design principles. You wouldn't want to see accessor methods for each and every field of an object. And yes there are query interfaces on top of objects that allow for ad-hoc views. But they are not in widespread use, probably because it's hard to optimise a system when each object has its own hard coded access paths to its particular piece of data.
And then there is the issue of normalisation. A properly normalised data model is easier to understand than an object model because it is complete and free of redundancies. An object model is not supposed to be either complete or free of redundancies. Most methods in an object model are useless or even dangerous when all you want to do is create new views and transformations for analysis/reporting purposes.
OO APIs are nice for manipulating data, i.e. OLTP, but they are a pain for data integration and BI. And since most growth in the software industry is on the BI side of things, API silos are being torn down or expressed in a data-oriented way, as with SOAP or REST.