Any application that relies on external data needs some kind of serialization mechanism to read and write that data. Many serialization formats exist, the most popular being XML. XML has the benefits of being programming-language-agnostic and, to some extent, self-describing. However, transmitting and processing XML incurs a great deal of overhead.
Google has been using a serialization format called Protocol Buffers for that purpose instead. Today, the company released this tool under an open-source license. According to the project's documentation:
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
Google's Kenton Varda provides additional details in a blog post:
Protocol Buffers allow you to define simple data structures in a special definition language, then compile them to produce classes to represent those structures in the language of your choice. These classes come complete with heavily-optimized code to parse and serialize your message in an extremely compact format. Best of all, the classes are easy to use: each field has simple "get" and "set" methods, and once you're ready, serializing the whole thing to – or parsing it from – a byte array or an I/O stream just takes a single method call...
One of Protocol Buffers' major design goals is simplicity. By sticking to a simple lists-and-records model that solves the majority of problems and resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated. And, yes, it is very fast – at least an order of magnitude faster than XML.
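To make the workflow described in the quotes above more concrete, here is a minimal Python sketch of a hand-written stand-in for a generated message class. It is not the code Google's compiler produces: the names Person, serialize, parse, and person_to_xml are hypothetical, and the encoding is a simplified version of the tag-and-varint layout from the Protocol Buffers encoding documentation, covering only integer and string fields.

```python
import xml.etree.ElementTree as ET


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


class Person:
    """Hypothetical stand-in for a generated class: field 1 = id, field 2 = name."""

    def __init__(self, id=0, name=""):
        self.id = id
        self.name = name

    def serialize(self):
        """Serialize the whole record in a single call."""
        name_bytes = self.name.encode("utf-8")
        return (
            encode_varint((1 << 3) | 0) + encode_varint(self.id)  # varint field
            + encode_varint((2 << 3) | 2)                         # length-delimited field
            + encode_varint(len(name_bytes)) + name_bytes
        )

    @classmethod
    def parse(cls, data):
        """Parse a record, skipping field numbers this class does not know."""
        person = cls()
        pos = 0
        while pos < len(data):
            key, pos = decode_varint(data, pos)
            field_number, wire_type = key >> 3, key & 0x7
            if wire_type == 0:
                value, pos = decode_varint(data, pos)
                if field_number == 1:
                    person.id = value
            elif wire_type == 2:
                length, pos = decode_varint(data, pos)
                chunk = data[pos:pos + length]
                pos += length
                if field_number == 2:
                    person.name = chunk.decode("utf-8")
            else:
                raise ValueError("unsupported wire type %d" % wire_type)
        return person


def person_to_xml(person):
    """Equivalent XML rendering, for a rough size comparison only."""
    root = ET.Element("person")
    ET.SubElement(root, "id").text = str(person.id)
    ET.SubElement(root, "name").text = person.name
    return ET.tostring(root)
```

Calling Person(150, "Ada").serialize() yields a handful of bytes, noticeably fewer than the XML rendering of the same record. Because parse skips field numbers it does not recognize, a reader built against an older definition can still load data written with newer fields, which is the compatibility property the documentation describes.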
APIs for reading and writing Protocol Buffers are available for C++, Java, and Python.
What do you think of Protocol Buffers as a data serialization tool? What are your preferred ways to serialize data?
Surprising that the code author states in the response that he's unaware of ASN.1. Even more surprising would be that no one at GOOG pointed this out to him earlier.
Granted, ASN.1 is probably too complex for mere mortals...
Looks pretty interesting, but maybe not that surprising. The Google documentation itself invites comparisons to IDL.
Since this has a simpler job to do (data transfer) than a complete middleware solution, I can see value in it when you want to avoid some complexity. But, basically, the CORBA model is at heart a really good way to do this kind of thing, so I'm not surprised this looks similar in nature.
I've been using ZeroC's Ice middleware (http://www.zeroc.com) for enterprise business computing, and it performs outstandingly well compared to any of the commonly-used (read: popular and hyped) technologies in that world.
apart from protocols, just looking at the notations, i can't see where this is an obvious advance over JSON. since JSON has hashes (dictionaries) and strings, there is nothing you can't serialize with it (since you can always incorporate type information into a hash structure, and use strings to transport arbitrary byte sequences). JSON has the definite advantage that as soon as it arrives over the wire and gets interpreted within a VM (JavaScript, Python, Perl, PHP, what have you) you have a guarantee it consists of generic nulls, trues, falses, numbers, strings, lists, and hashes: no more, no less. you can then proceed to reconstruct instances of custom classes based on type information optionally laid down in the hashes. the latter thing is the hard part, but it's so much easier to do when you base yourself on JSON than on XML.
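The approach this comment describes can be sketched in a few lines of Python using only the standard json and base64 modules. The Point class, the "__type__" key, and the helper names are assumptions made up for illustration; the commenter does not specify any particular convention.

```python
import base64
import json


class Point:
    """Hypothetical custom class carrying two numbers and a raw byte payload."""

    def __init__(self, x, y, payload=b""):
        self.x = x
        self.y = y
        self.payload = payload


def to_jsonable(obj):
    """Called by json.dumps for values it cannot encode on its own."""
    if isinstance(obj, bytes):
        # Arbitrary bytes travel as a base64 string inside a tagged hash.
        return {"__type__": "bytes", "base64": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, Point):
        # Type information is laid down in the hash next to the data.
        return {"__type__": "Point", "x": obj.x, "y": obj.y, "payload": obj.payload}
    raise TypeError("cannot serialize %r" % type(obj))


def from_jsonable(d):
    """Called by json.loads for every decoded hash, innermost first."""
    kind = d.get("__type__")
    if kind == "bytes":
        return base64.b64decode(d["base64"])
    if kind == "Point":
        return Point(d["x"], d["y"], d["payload"])
    return d  # untagged hashes stay plain dictionaries
```

Passing default=to_jsonable to json.dumps and object_hook=from_jsonable to json.loads round-trips a Point. Without the hook, the receiver still gets plain dictionaries, strings, and numbers, which is the guarantee the comment points to. The trade-off against a binary format is that field names and base64 padding are repeated in every message.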