I have been working for a while on extprot, a tool that allows you to create compact, efficient and extensible binary protocols that can be used for cross-language communication and long-term data serialization. extprot supports protocols with rich, composable types, whose definition can evolve while keeping both forward and backward compatibility.
The protocols created using extprot are:
extensible: types can be extended in several ways without breaking compatibility with existing producers/consumers.
self-delimited: each message indicates its own length. This allows you to send sequences of messages (streaming) without having to add message delimiters.
self-describing: a message can be decoded even without the protocol definition. What you get is roughly equivalent to XML without the DTD.
compact: messages take 2 to more than 6 times less space than XML, and typically 2 to 4 times less space than individually compressed XML messages.
fast: can be deserialized one to two orders of magnitude faster than XML, and faster than it’d take to merely uncompress XML data.
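To illustrate the self-delimiting property, here is a minimal Python sketch (not part of extprot) that splits a stream of concatenated messages using only their length fields. It assumes each message starts with a one-byte prefix followed by the payload length as a base-128 varint; that layout is an assumption inferred from the example message later in this post, and `read_vint` and `split_messages` are hypothetical helper names.

```python
def read_vint(buf, pos):
    """Decode an unsigned base-128 varint starting at buf[pos].

    Returns (value, next_pos). Assumed encoding: 7 payload bits per byte,
    least significant group first, high bit set on all but the last byte.
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def split_messages(stream):
    """Split back-to-back messages using only the length each one carries."""
    messages = []
    pos = 0
    while pos < len(stream):
        start = pos
        pos += 1                              # skip the one-byte prefix (assumed)
        length, pos = read_vint(stream, pos)  # payload length in bytes
        pos += length                         # jump over the payload unread
        messages.append(stream[start:pos])
    return messages


# Two hand-built messages: prefix byte 1, a length, then that many payload bytes.
first = bytes([1, 3]) + b"abc"
second = bytes([1, 200, 1]) + b"x" * 200     # a length of 200 needs two varint bytes
assert split_messages(first + second) == [first, second]
```

The reader never inspects the payload, which is why no delimiter or escaping scheme is needed between messages.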
The extprot compiler (extprotc) takes a protocol description and generates code in any of the supported languages to serialize and deserialize the associated data structures. It is accompanied by a runtime library for each target language which is used to read and write the structures defined by the protocol.
At this point, you'll be thinking, "what, yet another Protocol Buffers/Thrift/ASN.1 DER/XDR/IIOP/IFF?"... Not quite: extprot differentiates itself in that it allows for more extensions and supports richer types (mainly tuples and disjoint union types, a.k.a. sum types) than Protocol Buffers or Thrift, without approaching the complexity of ASN.1 DER. (Note that XDR does not define self-describing protocols, making protocol changes hard at best.)
The improved modifiability and richer data types show in (2) and (3). (1) is similar to Protocol Buffers', but differs in significant ways.
Example
Here's a trivial protocol definition:
(* this is a comment (* and this a nested comment *) *)
message user = {
  id : int;
  name : string;
}
The value
{ id = 1; name = "J.R.R. Tolkien" }
is serialized as this 21-byte message (bytes in decimal notation, ASCII characters between quotes):
001 019 002 000 002 003 014 "J.R.R. Tolkien"
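These 21 bytes can be reproduced by hand. The sketch below is not the code extprotc generates; it is a small Python reading of the dump. The interpretation of each byte is an assumption inferred from the dump itself rather than taken from the format specification: a prefix holding the tag in the high bits and a wire type in the low four (0 for variable-length integers, 1 for tuples, 3 for byte strings), lengths as base-128 varints, and signed integers zigzag-mapped so that 1 becomes 2. All function names are hypothetical.

```python
# Wire types as inferred from the byte dump (assumptions, not the official spec).
VINT, TUPLE, BYTES = 0, 1, 3


def vint(n):
    """Unsigned base-128 varint: 7 bits per byte, low group first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def prefix(tag, wire_type):
    """Prefix: tag in the high bits, wire type in the low four bits."""
    return vint((tag << 4) | wire_type)


def encode_int(n):
    # Zigzag mapping keeps small negative numbers short: 1 -> 2, -1 -> 1.
    zigzag = (n << 1) if n >= 0 else ((-n << 1) - 1)
    return prefix(0, VINT) + vint(zigzag)


def encode_string(s):
    data = s.encode("utf-8")
    return prefix(0, BYTES) + vint(len(data)) + data


def encode_user(user_id, name):
    # Payload: element count, then each field in declaration order.
    body = vint(2) + encode_int(user_id) + encode_string(name)
    return prefix(0, TUPLE) + vint(len(body)) + body


msg = encode_user(1, "J.R.R. Tolkien")
print(" ".join(f"{b:03d}" for b in msg[:7]), msg[7:])
assert msg == bytes([1, 19, 2, 0, 2, 3, 14]) + b"J.R.R. Tolkien"
assert len(msg) == 21
```

Read this way, the second byte (019) is the length of everything after it, which is what makes the message self-delimited, and the seventh byte (014) is the length of the name.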
The code generated by extprotc allows you to manipulate such messages as any normal value. For instance, in the Ruby target (in progress as of 2009-01-14), you'd do: