The Artima Developer Community

Ruby Buzz Forum
The problems with Ruby's serialization (Marshal), and how extprot addresses them



Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
The problems with Ruby's serialization (Marshal), and how extprot addresses them Posted: Jan 21, 2009 5:50 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: The problems with Ruby's serialization (Marshal), and how extprot addresses them
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.


Chuck Vose's comment made me realize that the universal extprot message decoder can be simplified considerably if I simply deserialize the data and let Ruby do the pretty-printing for me (#inspect). I now have a 120 LoC universal decoder that can deserialize any message (without the original protocol definition), and can exchange data between OCaml and Ruby, the first extprot targets. But before I come to that, some clarifications on Ruby's Marshal vs. extprot are in order.

What's the point? Why use extprot instead of Marshal.dump?

Marshal has been a core Ruby module forever. It is written in C, it is fairly fast (at any rate, it is the fastest way to serialize and deserialize Ruby data), and it is convenient to use: you just give it an object (nearly any object) and you get a string; give it the string, and your object's back. Why would anybody want to use anything else? In fact, there are a few reasons not to use Marshal:

  1. the format used by Marshal has changed a few times in the past. (The minor version has changed 8 times since the first release.)

  2. it's Ruby-only. AFAIK nothing else can read data serialized with Marshal.

  3. serializing objects with Marshal exposes implementation details.
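Point (1) is easy to observe: every Marshal stream begins with two bytes holding the format's major and minor version, and Marshal.load refuses a stream whose version it does not understand. The sketch below assumes a modern Ruby; the "newer" stream is artificial (the minor version byte is bumped by hand to simulate data written by a newer interpreter), not real data from another Ruby release.

```ruby
# Every Marshal stream starts with two bytes: major and minor format version.
header  = Marshal.dump("hello").bytes.first(2)
current = [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]

# Simulate data written by a Ruby with a newer minor format version.
newer = Marshal.dump("hello").b            # mutable binary copy
newer.setbyte(1, Marshal::MINOR_VERSION + 1)

rejected = begin
  Marshal.load(newer)
  false
rescue TypeError
  true # a minor version newer than this Ruby's cannot be read
end
```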

While (2) means you cannot use it if you care about interoperability, (3) applies even when you're staying within Ruby. A redditor puts it in a few words:

It's really infuriating when you (for example) can't send serialized ruby objects over the network because of a 0.0.1 version difference between the 2.

The basic problem with Marshal is that it serializes an object by saving the name of its class and all its instance variables (you can spot them in the generated string):

>> A = Struct.new(:name, :id, :email, :phones)
=> A
>> s = Marshal.dump(A.new("John Doe", 1234, "jdoe@example.com", ["555-4321", 1]))
=> "\004\bS:\006A\t:\tname\"\rJohn Doe:\aidi\002\322\004:\nemail\"\025jdoe@example.com:\vphones[\a\"\r555-4321i\006"
>> s.size
=> 78
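The same leak shows up with an ordinary class. The sketch below uses a hypothetical Contact class on a modern Ruby, whose byte layout differs slightly from the Ruby 1.8 session above (strings now carry an encoding marker), but the class name and the @-prefixed instance variable names are still stored verbatim in the stream. Removing the constant stands in for renaming the class.

```ruby
# Hypothetical plain class (not a Struct): its state lives in instance variables.
class Contact
  def initialize(name, email)
    @name = name
    @email = email
  end
end

dumped = Marshal.dump(Contact.new("John Doe", "jdoe@example.com"))

# Both the class name and the instance variable names travel with the data.
exposes_class = dumped.include?("Contact")
exposes_ivars = dumped.include?("@name") && dumped.include?("@email")

# Simulate renaming the class: the stream now refers to a missing constant.
Object.send(:remove_const, :Contact)
load_error = begin
  Marshal.load(dumped)
  nil
rescue ArgumentError => e
  e # the message names the undefined class
end
```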

The first, obvious consequence is that you cannot rename the class, since Marshal.load needs the class named in the byte stream to exist. The second, no less apparent, is that you cannot rename the instance variables either. But it gets worse: you are exposing implementation details (which instance variables hold which data) in the serialized form, and you will hardly be able to modify them if you want to read old data. Worse still, you won't be able to change them at all if you want old clients to read new data. This can be addressed in an ad hoc manner with #marshal_dump and #marshal_load, but that requires extra code and means the data can no longer be decoded without the matching #marshal_load: in effect, #marshal_dump and #marshal_load define a protocol.
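A minimal sketch of the #marshal_dump/#marshal_load approach, using a hypothetical Person class: the instance variable names stay out of the stream, but the class name is still written, and the layout of the array returned by #marshal_dump becomes a contract that every reader has to honor.

```ruby
# Hypothetical class whose internal ivar names are kept out of the stream.
class Person
  attr_reader :full_name, :id

  def initialize(full_name, id)
    @full_name = full_name
    @id = id
  end

  # Marshal writes only this Array (plus the class name); the layout
  # [name, id] is now the de facto protocol.
  def marshal_dump
    [@full_name, @id]
  end

  # Called on a newly allocated Person; initialize is not run.
  def marshal_load(data)
    @full_name, @id = data
  end
end

stream = Marshal.dump(Person.new("John Doe", 1234))
loaded = Marshal.load(stream)
```

Renaming @full_name internally no longer affects the stream, but renaming Person or reordering the array still breaks every existing reader.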

Now that the magic word, protocol, has been uttered, it's time to see whether Marshal does anything for us as far as protocol extensibility is concerned. As said above, if anything, Marshal makes interoperability harder, since the encoding is not guaranteed to stay the same (in practice it is not expected to change often, but we can't know) and implementation details are leaked by default.

As it turns out, Marshal doesn't help with the sort of backward/forward compatible protocol extensions extprot allows either.
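For instance, adding a field to a Struct, the kind of backward-compatible extension the post says extprot allows, is enough to make old data unreadable. The sketch below uses a hypothetical Point struct and redefines the constant in-process to stand in for upgrading the code; Marshal.load then raises TypeError because the member count recorded in the stream no longer matches the struct.

```ruby
# Version 1 of a hypothetical record type, and some data written with it.
Point = Struct.new(:x, :y)
old_data = Marshal.dump(Point.new(1, 2))

# Version 2 adds a field, as an extensible protocol would allow.
Object.send(:remove_const, :Point)
Point = Struct.new(:x, :y, :z)

upgrade_error = begin
  Marshal.load(old_data) # old data, new definition
  nil
rescue TypeError => e
  e # Marshal rejects the struct because the member count differs
end
```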



Copyright © 1996-2019 Artima, Inc. All Rights Reserved.