The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
Making mutable state faster: optimizing the caml_modify writer barrier

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
Making mutable state faster: optimizing the caml_modify writer barrier Posted: May 29, 2009 10:26 AM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Making mutable state faster: optimizing the caml_modify writer barrier
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Eigen Class
Latest Posts From Eigenclass

Advertisement

As I explained yesterday, the caml_modify write barrier needed by OCaml's incremental + generational GC is the main reason why immutable data can often be faster than mutable state. I've played a little with it and achieved a >50% speed boost in synthetic benchmarks, which could map to up to two-digit percentage gains in actual programs (in particular, it can make Array.map, init, make and the likes over 20% faster). The mutable state part from the synthetic business entity update benchmark becomes 32% faster.

How the write barrier works

caml_modify needs to record the references from values in the major heap to the minor heap. The code is straightforward:

#define Modify(fp, val) do{                                                 \
  value _old_ = *(fp);                                                      \
  *(fp) = (val);                                                            \
  if (Is_in_heap (fp)){                                                     \
    if (caml_gc_phase == Phase_mark) caml_darken (_old_, NULL);             \
    if (Is_block (val) && Is_young (val)                                    \
        && ! (Is_block (_old_) && Is_young (_old_))){                       \
      if (caml_ref_table.ptr >= caml_ref_table.limit){                      \
        CAMLassert (caml_ref_table.ptr == caml_ref_table.limit);            \
        caml_realloc_ref_table (&caml_ref_table);                           \
      }                                                                     \
      *caml_ref_table.ptr++ = (fp);                                         \
    }                                                                       \
  }                                                                         \
}while(0)

fp is the block slot where the (suspected) pointer is to be stored. caml_modify first checks if the destination is in the major heap with the Is_in_heap macro I'll come back to in a minute. Is_block(v) is true when v is not an immediate value. There's an interesting piece of logic that tries to avoid recording the same reference twice: if the previous reference _old_ held in fp was to a block in the minor heap, we know it's been already added to the ref table and there's no need to do it again. In the worst case, when updating N times the same slot with values in the minor and the major heap alternatingly, the reference will be recorded N/2 times --- it seems very unlikely this would happen in practice, though, as the ref table is cleared on each minor GC run.

The main problem with the above routine is that Is_in_heap is fairly slow. On 64-bit architectures it involves a lookup in a sparse hash table to see if the corresponding page is being managed by OCaml (on 32 bit, a two-level array is used). The reason why this check cannot just be implemented as a negated "is it in the minor heap" test (two pointer comparisons at most) is that it is possible to have blocks which belong neither to the minor heap nor to the major one --- iow., memory areas not managed by OCaml at all, and which are not to be traversed by the GC.

When we're not in the Phase_mark GC phase, the order of the checks can be altered so the expensive Is_in_heap test is only performed when the the value to be stored resides in the minor heap and the old one didn't before. This way, when the same slot is being updated several times, Is_in_heap will only be used once (with the exception explained above).

The resulting code, albeit quite ugly, is considerably faster. I measure the gain with two trivial programs:

Read more...

Read: Making mutable state faster: optimizing the caml_modify writer barrier

Topic: Safely dividing a UTF-8 String in Ruby Previous Topic   Next Topic Topic: When immutable data is faster: write barrier costs

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use