The Artima Developer Community
Sponsored Link

Articles Forum
Azul's Pauseless Garbage Collector

14 replies on 1 page. Most recent reply: Dec 23, 2010 1:29 PM by Gil Tene

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 14 replies on 1 page
Bill Venners

Posts: 2248
Nickname: bv
Registered: Jan, 2002

Azul's Pauseless Garbage Collector Posted: Dec 17, 2010 1:00 AM
Reply to this message Reply
Advertisement
Find out how Azul minimized garbage collection pauses in this interview with Azul CTO Gil Tene:

http://www.artima.com/lejava/articles/azul_pauseless_gc.html

What's your opinion of their algorithm? Do you have problems with garbage collection pauses, and if so, how have you dealt with those problems?


bug not

Posts: 16
Nickname: bugmenot2
Registered: May, 2005

Re: Azul's Pauseless Garbage Collector Posted: Dec 19, 2010 5:55 PM
Reply to this message Reply
How does this compare with other "fast" garbage collector's such as the Jamaica RTSJ JVM? (not quite a straight comparison as that's RTSJ rather than straight Java but still...)

Ted Lemon

Posts: 4
Nickname: abhayakara
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 20, 2010 5:53 PM
Reply to this message Reply
You talk about fixing up references using a read barrier to trap the reference. I'm having trouble connecting the dots--presumably what happens is that you get a trap, and your machine state contains the address being referenced. But it doesn't contain the address of the object that contains the reference, does it? So how do you reconstruct that so that you can apply the fixup to the referencing object? Are you taking advantage of your knowledge of the internals of the JVM to do this, or am I missing something?

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Azul's Pauseless Garbage Collector Posted: Dec 21, 2010 7:53 AM
Reply to this message Reply
If I remember correctly, under 80x386, when a protection fault happens, the address that caused the invalid access is stored into a register.

This is a nice trick, also done by the Boehm GC when incremental collection is enabled.

Ted Lemon

Posts: 4
Nickname: abhayakara
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 21, 2010 9:04 AM
Reply to this message Reply
Achilleas, I'm probably just missing something obvious here, but the address that causes the fault is not the problem. It's the object from which that address was loaded that is the problem.

We already know that the object at address X has moved, and we faulted on address X, so we are able to determine a value Y (the object's new location) to substitute for X. But in order to correct the reference that caused the fault, we need to store Y in the object that now contains a reference to X. My question is how we get the address of that object, not how we get X.

Johannes Rudolph

Posts: 2
Nickname: gambistics
Registered: Dec, 2007

Re: Azul's Pauseless Garbage Collector Posted: Dec 21, 2010 10:51 AM
Reply to this message Reply
> But in order to correct the reference that caused the
> he fault, we need to store Y in the object that now
> contains a reference to X. My question is how we get the
> address of that object, not how we get X.

Just guessing: If you know the state of all the registers and the instruction which triggered a trap, you should be able to figure out the location of the reference in question in most cases.

Does anyone know what the particular changes to the Linux kernel are? How much of that technology is open source within the managed runtime initiative? (http://www.managedruntime.org/)

Johannes Rudolph

Posts: 2
Nickname: gambistics
Registered: Dec, 2007

Re: Azul's Pauseless Garbage Collector Posted: Dec 21, 2010 10:56 AM
Reply to this message Reply
And, if anyone wondered what the part of the engine is that the system from Azul is improving in the Linux kernel, here's a more detailed description:
http://www.managedruntime.org/sites/www.managedruntime.org/files/downloads/AzulVmemMetricsMRI.pdf

In short, they introduced a way to collect and batch commit changes to the virtual memory mappings, so that the VM needs less safe-points at which the commits may occur.

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Azul's Pauseless Garbage Collector Posted: Dec 22, 2010 8:19 AM
Reply to this message Reply
> Achilleas, I'm probably just missing something obvious
> here, but the address that causes the fault is not the
> problem. It's the object from which that address was
> loaded that is the problem.
>
> We already know that the object at address X has moved,
> and we faulted on address X, so we are able to determine a
> value Y (the object's new location) to substitute for X.
> But in order to correct the reference that caused the
> he fault, we need to store Y in the object that now
> contains a reference to X. My question is how we get the
> address of that object, not how we get X.

So your question is how do we get the new address of X? it is probably stored in the old X. That's how compacting collectors work anyway: pointers are updated with the new address of an object stored in the current object:


object = object->new_address;


The pages that have objects that were moved need to stay in the 'moved' state until all references to the objects in that page have been successfully modified to point to the new objects. This causes greater memory consumption.

The above are my assumptions anyway. Another way to solve this issue would be to use a unique key for each object, and then search a map for a reference to the object, based on the key. But that would be highly inefficient.

Ted Lemon

Posts: 4
Nickname: abhayakara
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 22, 2010 10:07 AM
Reply to this message Reply
Nono. Consider the case of a simple stop-and-copy collector. At any given time, all of the objects are in a single arena. When you hit the high water mark, you stop consing and allocate a new, bigger arena. You then iterate across all the root pointers in your system, copying every object reached by the root pointers into the new arena. Whenever you copy an object, you leave a fixup object in its place; this object has the address of the moved object. When you are scanning the roots, whenever you come to a pointer to a fixup object, you fix it and stop scanning, knowing that every object reached through the fixed-up object has already been reached. Once you have explored all the roots, all reachable objects will have been copied into the new arena, and you can just discard the old arena, knowing that any objects not reached are garbage.

So for example suppose you have an object A, which references object B. And you have another object, C, which references B as well. And suppose both A and C are reached by different roots. When you go to collect, you will reach A, and A will point to B in the old arena. So you'll move A into the new arena, and then move B into the new arena, and then adjust A's pointer to B. And you will update the version of B in the old arena to be a fixup object. Later on in the collection pass when you reach C, C's pointer to B will be pointing to the fixup object in the old arena. So you will copy C into the new arena, and update C's pointer to B using the information in the fixup object.

Now in this read-barrier model, suppose you move A and B, and fix up A's reference to B. And now suppose that your program accesses C after you've moved A and B. C's pointer to B will be pointing into the old arena, which has a read barrier, so you'll get a trap at that point, and you'll have the old address of B in the trap exception data. But what you won't have in the exception data is the address of C. And you need the address of C in order to update C's pointer to B.

So my original question was not how you got the address of B, but how you got the address of C. The only way I can think of to do it is to use your knowledge of the JVM--perhaps you know that when B is referenced, the object that referenced B must be in some register. If that's how they're doing it, then this isn't a general mechanism--it will only work with a specific JVM.

Hm, so are you saying that you take the fault when you reference C, which is in the old arena, not when you reference B? In that case, the fault address would be the address of C, and you'd be able to move C and update C's pointers at that time. If that's the case, then I've been missing what's novel about this approach, because using a read barrier in this way has been done before.

Gil Tene

Posts: 4
Nickname: giltene
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 22, 2010 3:38 PM
Reply to this message Reply
>...
> So my original question was not how you got the address of
> B, but how you got the address of C. The only way I can
> think of to do it is to use your knowledge of the
> JVM--perhaps you know that when B is referenced, the
> object that referenced B must be in some register. If
> that's how they're doing it, then this isn't a general
> mechanism--it will only work with a specific JVM.

Ted, sorry for the late response. Hopefully this will shed some light:

You are absolutely correct that at a trap point, you mainly know the trapping address (the address of the object being relocated - object B in you example). However, since the JVM "knows all" - including the state of the java machine at the read barrier test point, and including the fact that the address being checked (call it "val") was just loaded from some source address (call it "addr" - the address of the field in object C that references object B in your example), it can go back to that state and use to it to determine and perform the self healing operation - fixing the ref at "addr" to point to B's correct location (or just fixing it's mark phase indication if that's the thing we didn't like about it).

How that's done exactly varies - the JVM may run into such read barriers in C2 (-server) compiled code, or in C1 (-client, or -tiered on the Azul VM) compile code, or in interpreted code, or in the runtime itself (there are a whole bunch of places in the C++ HotSpot code that reference objects, and they all have to have good read barriers too). The level of optimization done for each differs (in the C2 case it is highly optimized to fit well in the pipeline), but in all cases, the source memory address of the value being barrier'ed is identified and healed.

Ted Lemon

Posts: 4
Nickname: abhayakara
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 22, 2010 3:43 PM
Reply to this message Reply
Ah, thanks, that's exactly the information I was looking for. :)

Gil Tene

Posts: 4
Nickname: giltene
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 22, 2010 3:51 PM
Reply to this message Reply
Achilleas, to address/clarify a couple of your points:

>...
> The pages that have objects that were moved need to stay
> in the 'moved' state until all references to the objects
> in that page have been successfully modified to point to
> the new objects. This causes greater memory consumption.
>

The "secret" behind quick-release is that the above statement is true for the *virtual* addresses of the pages (that have objects that were moved) - as the addresses themselves cannot be safely recycled until all references to them have been hunted down and fixed (which will happen by the end of our next Mark-Remap phase). However, the actual physical memory resources backing those virtual page addresses is not needed once the object contents have been moved, and can be recycled immediately at new virtual addresses without waiting for the remaps to complete.

We can therefor compact the entire heap using a single empty page, by doing hand over hand compaction - feeding each page compaction with physical pages release from the previous page's compaction. In reality we use a bit more than that (one empty page per compacting GC thread), but we can always compact the heap, and never need to worry about not having enough empty memory to perform the compaction...

> The above are my assumptions anyway. Another way to solve
> this issue would be to use a unique key for each object,
> and then search a map for a reference to the object, based
> on the key. But that would be highly inefficient.

Your assumption is correct for most relocating collectors, but it is also what keeps them from being able to recycle the memory quickly. In the Azul VM, we keep this forwarding information metadata outside of the pages being compacted from, and store them in pretty compact and efficient ways Keeping it outside of the physical pages and original object space allows quick release to be performed (otherwise, as you noted, we would not be able to release the page until all references to them were identified and remapped).

Gil Tene

Posts: 4
Nickname: giltene
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 22, 2010 4:26 PM
Reply to this message Reply
> How does this compare with other "fast" garbage
> collector's such as the Jamaica RTSJ JVM? (not quite a
> straight comparison as that's RTSJ rather than straight
> Java but still...)

They live in different application spaces. Jamaica, like several other RTSJ implementations, focuses on hard real time realtime systems (typically in the embedded world). See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.5942&rep=rep1&type=pdf. I would not characterize it as a "fast" GC - instead I would characterize is as an extremely consistent GC.
Jamaica identifies the universally recognized problem of heap fragmentation (there is no way to stop fragmentation from happening in contiguous Java heaps). In order to combat fragmentation without using compaction, such systems often use new heap and object layouts. In Jamaica's case, the heap is broken into fixed sized blocks [32 bytes in Jamaica's case] and Java objects larger than 32 byte end up chaining discontiguous blocks together (in either linked lists for objects, or trees for arrays). This achieves the hard real time requirement by guaranteeing no compaction will ever be needed, but it comes at a very steep memory access costs (in both memory op counts and cache misses, and in the loss of optimization opportunities and the ability to directly use common CPU addressing modes). It is exactly the right tradeoff for hard real time. Real time is not about being fast, it's about being slow but predictable.

The Azul VM serves a completely different space - we focus on enabling modern, commodity hardware capacities to be directly leveraged in enterprise Java apps without fear of the dreaded GC pause (which grows linearly with memory on all other Java SE VMs). In our world, metrics like sustainable applications throughput (in TPS or whatever), numbers concurrent sessions and users, and consistent response times (in SLA terms, not in hard real time terms) are what matters. Since a modern x86 server with 24 vCores and 96GB can be had for less than $7K these days, we find that applying that cheap capacity towards the needed metrics is pretty easy once you've taken GC pauses off the table.

John Wellbelove

Posts: 72
Nickname: garibaldi
Registered: Mar, 2008

Re: Azul's Pauseless Garbage Collector Posted: Dec 23, 2010 6:25 AM
Reply to this message Reply
> Real time is not about being fast, it's about being slow but predictable.

I'd remove the 'slow' part. It's no good being predictable in a real-time system if the prediction is that the system is so slow that you will always miss the timing deadline.

Gil Tene

Posts: 4
Nickname: giltene
Registered: Dec, 2010

Re: Azul's Pauseless Garbage Collector Posted: Dec 23, 2010 1:29 PM
Reply to this message Reply
> > Real time is not about being fast, it's about being slow
> but predictable.
>
> I'd remove the 'slow' part. It's no good being predictable
> in a real-time system if the prediction is that the system
> is so slow that you will always miss the timing deadline.

:-). "Just fast enough" perhaps?

Real time systems get no extra credit for running faster than needed, but any missing of deadlines is considered a failure. This almost always translates into paying with significant (but consistent) overhead that reduces throughput and speed, but guarantees minimal speeds and never missing deadlines.

In this specific case (Jamaica and it's real time GC), breaking all objects into non-contiguous 32 byte chunks chain together with pointers will cause all access to arrays and object fields to commonly run 3-8x slower than on a Java SE JVM. A perfectly good trade off for a real-time system.

Flat View: This topic has 14 replies on 1 page
Topic: What's New in Scala 2.8: The Architecture of Scala Collections Previous Topic   Next Topic Topic: Next-Generation Testing with TestNG


Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us