Steve Loughran: "Hadoop uses RPC to chat between nodes; everything has custom
serialization and the heartbeats include data -tricks that work on
a LAN. But they have a hard time keeping clients and clusters in
sync: what makes a workable and efficient protocol for the gigabit
LAN in the datacentre is not appropriate for client-cluster comms,
not when the clients aren't under control, or when they are a
distance away. I'd like to work on a good REST api there -something
like S3FS for storage, a pure REST model for Jobs too."
Me too. This issue (RPC) comes up for me from time to time: how to have a tight call model for what are essentially control messages, across a cluster of servers, without getting into:
Binary lockstep (e.g. fine-grained RMI or shunting Python pickles around) and the risk of a cyclic dependency so large you can barely see it.
The inability to examine problems with standard tools like Wireshark without a decoder. This applies to any binary wire protocol, including base64'd XML.
Designing some cross-language type system.
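To make the lockstep and inspection points concrete, here is a minimal Python sketch using only the standard `pickle` and `json` modules. The heartbeat contents and field names are made up for illustration; nothing here is taken from Hadoop or any real protocol.

```python
import json
import pickle
from datetime import timedelta

# Hypothetical heartbeat contents; the field names are made up.
interval = timedelta(seconds=30)

# Pickle: binary, and it names the Python module and class it was built
# from, so the reader needs a compatible Python type to decode it.
pickled = pickle.dumps({"node": "node-7", "interval": interval})

# JSON: the same facts as text, using only a dict, a string and a number.
as_json = json.dumps({"node": "node-7", "interval_s": interval.total_seconds()})

print(pickled)   # opaque bytes in a packet capture
print(as_json)   # readable as-is
```

The pickled payload carries the names `datetime` and `timedelta` inside it, which is the lockstep in miniature: both ends have to agree on code, not just on data.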
I'm down to 3 options:
Forget RPC, use HTTP calls to post state.
Use XMPP messaging and pub/sub.
Use RPC but with JSON as your wire format.
All three abide by the notion that if you must ship non-documents, you should ship them using a handful of data structures (list, dict) and a limited number of scalar types (Unicode strings, numbers, ISO dates, booleans). In other words, JSON is the sweet spot of type-driven interop.
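As a rough sketch of what "a handful of data structures" could mean in practice, the Python below reduces a control message to dicts, lists, strings, numbers and booleans before serializing it, with dates sent as ISO 8601 strings. The `to_wire` helper and the `method`/`params` message shape are assumptions for illustration, not an existing API or standard.

```python
import json
from datetime import datetime, timezone


def to_wire(value):
    """Reduce a value to dicts, lists, strings, numbers and booleans."""
    if isinstance(value, datetime):
        return value.isoformat()  # dates travel as ISO 8601 strings
    if isinstance(value, (str, bool, int, float)):
        return value
    if isinstance(value, (list, tuple)):
        return [to_wire(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_wire(v) for k, v in value.items()}
    # Anything else (custom classes, sets, bytes) is refused outright.
    raise TypeError(f"not allowed on the wire: {type(value).__name__}")


# Hypothetical control message; "method" and "params" are illustrative names.
call = {
    "method": "job.status",
    "params": {
        "job_ids": [17, 42],
        "since": datetime(2008, 5, 1, tzinfo=timezone.utc),
        "verbose": False,
    },
}
body = json.dumps(to_wire(call))
print(body)
```

The same `body` could be POSTed over HTTP, dropped into an XMPP message, or used as the payload of an RPC call, which is why the three options end up looking so alike on the wire.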
Steve Vinoski: "For years we’ve known RPC and its descendants to be fundamentally
flawed, yet many still willingly use the approach. Why? I believe the
reason is simply convenience. Regardless of RPC’s
well-understood problems, many developers continue to go down the
RPC-oriented path because it conveniently fits the abstractions of the
popular general-purpose programming languages they limit themselves to
using. Making a function or method call to a remote or distributed
function, object, or service appear just like any other function or
method call allows such developers to stay within the comfortable
confines of their language. Those who choose this approach essentially
decide that developer convenience and comfort is more important than
dealing with hard distribution issues like latency, concurrency,
reliability, scalability, and partial failure."
All important, but binary on-the-wire messages and lockstepped upgrades are a massive problem as well. IOW, a core practical issue with RPC is sending non-text around.
It's interesting, then, that Facebook Thrift has gone into the Apache Incubator. It looks sort of like JSON but has so-90s stuff like signed integer types.