The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
Faster deserialization in Ruby, beating Marshal

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
Faster deserialization in Ruby, beating Marshal Posted: Jan 29, 2009 7:51 AM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Faster deserialization in Ruby, beating Marshal
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Eigen Class
Latest Posts From Eigenclass

Advertisement

I have optimized a bit the universal pure-Ruby extprot deserializer, and written a new one as a C extension. The former now approaches the speed of Marshal.load in my simple tests (quite remarkable because this is Ruby vs. C) and is now down to 70 lines of code. The extension is over 3 times faster than Marshal.load and takes a bit over 200 lines.

Serialization size (bytes) Deserialization time
Marshal.load(from string) (Ruby core method in C) 3527449 1.29s
Marshal.load(from IO) 3527449 1.65s
pure Ruby extprot decoder, from IO 1859128 1.85s
Ruby extprot decoder (extension), from IO 1859128 0.48s

Optimizing dispatching in Ruby

The core of the decoder is a loop that reads the prefix of the next value and then calls the appropriate subroutine to decode it. In the original Ruby decoder, this looked like

def read_value
  prefix = read_prefix
  tag = ll_tag(prefix)
  case ll_type(prefix)
  when :tuple; t = read_htuple(prefix); r = Tuple.new(t); r.tag = t.tag; r
  when :htuple; read_htuple(prefix)
  when :assoc; read_assoc(prefix)
  when :vint; n = read_vint; (n >> 1) ^ -(n & 1)
  when :bits8; @io.readchar
  when :bits32; @io.read(4).unpack("V")[0]
  when :bits64_long; @io.read(8).unpack("q")[0]
  when :bits64_float; @io.read(8).unpack("E")
  when :enum; Enum.new(tag)
  when :bytes; len = read_vint; @io.read(len)
  when :invalid; raise BadWireType
  end
end

While the wire type can be obtained in constant time (just have to index an array), the case expression requires linear time to find the appropriate branch, since it will compare to the possible values (:tuple, :htuple, etc.) in order (moreover, this involves an extra method call, #===, per comparison). Essentially, what I'd like to do is to create an array of Procs and call the appropriate one. This is unfortunately very slow, so instead I make use of Ruby's native dispatching (i.e., method dispatching) this way:

class Vint < Base; def self.read(_, io); n = read_vint(io); (n >> 1) ^ -(n & 1) end end
class Bits8 < Base; def self.read(_, io); io.readchar end end
...

LL_TYPES = [
  Vint, Tuple, Bits8, Bytes, Bits32, HTuple, Bits64_long, Assoc,
  Bits64_float, Invalid, Enum, Invalid, Invalid, Invalid, Invalid, Invalid,
]

def self.read_value(io)
  prefix = read_prefix(io)
  LL_TYPES[prefix & 0xf].read(prefix, io)
end

Read more...

Read: Faster deserialization in Ruby, beating Marshal

Topic: Monkeybars on the march Previous Topic   Next Topic Topic: RubyGems code_swarm

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use