Ruby Buzz Forum - Making mutable state faster: optimizing the caml

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Ruby Buzz Forum
Making mutable state faster: optimizing the caml_modify writer barrier

0 replies on 1 page.

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 0 replies on 1 page

Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.

Making mutable state faster: optimizing the caml_modify writer barrier

Posted: May 29, 2009 10:26 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Making mutable state faster: optimizing the caml_modify writer barrier Feed Title: Eigenclass Feed URL: http://feeds.feedburner.com/eigenclass Feed Description: Ruby stuff --- trying to stay away from triviality.	Latest Ruby Buzz Posts Latest Ruby Buzz Posts by Eigen Class Latest Posts From Eigenclass

As I explained yesterday, the caml_modify write barrier needed by OCaml's incremental + generational GC is the main reason why immutable data can often be faster than mutable state. I've played a little with it and achieved a >50% speed boost in synthetic benchmarks, which could map to up to two-digit percentage gains in actual programs (in particular, it can make Array.map, init, make and the likes over 20% faster). The mutable state part from the synthetic business entity update benchmark becomes 32% faster.

How the write barrier works

caml_modify needs to record the references from values in the major heap to the minor heap. The code is straightforward:

#define Modify(fp, val) do{                                                 \
  value _old_ = *(fp);                                                      \
  *(fp) = (val);                                                            \
  if (Is_in_heap (fp)){                                                     \
    if (caml_gc_phase == Phase_mark) caml_darken (_old_, NULL);             \
    if (Is_block (val) && Is_young (val)                                    \
        && ! (Is_block (_old_) && Is_young (_old_))){                       \
      if (caml_ref_table.ptr >= caml_ref_table.limit){                      \
        CAMLassert (caml_ref_table.ptr == caml_ref_table.limit);            \
        caml_realloc_ref_table (&caml_ref_table);                           \
      }                                                                     \
      *caml_ref_table.ptr++ = (fp);                                         \
    }                                                                       \
  }                                                                         \
}while(0)

fp is the block slot where the (suspected) pointer is to be stored. caml_modify first checks if the destination is in the major heap with the Is_in_heap macro I'll come back to in a minute. Is_block(v) is true when v is not an immediate value. There's an interesting piece of logic that tries to avoid recording the same reference twice: if the previous reference _old_ held in fp was to a block in the minor heap, we know it's been already added to the ref table and there's no need to do it again. In the worst case, when updating N times the same slot with values in the minor and the major heap alternatingly, the reference will be recorded N/2 times --- it seems very unlikely this would happen in practice, though, as the ref table is cleared on each minor GC run.

The main problem with the above routine is that Is_in_heap is fairly slow. On 64-bit architectures it involves a lookup in a sparse hash table to see if the corresponding page is being managed by OCaml (on 32 bit, a two-level array is used). The reason why this check cannot just be implemented as a negated "is it in the minor heap" test (two pointer comparisons at most) is that it is possible to have blocks which belong neither to the minor heap nor to the major one --- iow., memory areas not managed by OCaml at all, and which are not to be traversed by the GC.

When we're not in the Phase_mark GC phase, the order of the checks can be altered so the expensive Is_in_heap test is only performed when the the value to be stored resides in the minor heap and the old one didn't before. This way, when the same slot is being updated several times, Is_in_heap will only be used once (with the exception explained above).

The resulting code, albeit quite ugly, is considerably faster. I measure the gain with two trivial programs:


	Web Artima.com