The Artima Developer Community
Sponsored Link

Weblogs Forum
Myths of Memory Management

81 replies on 6 pages. Most recent reply: Sep 12, 2005 3:10 PM by Max Lybbert

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 81 replies on 6 pages [ 1 2 3 4 5 6 | » ]
Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Myths of Memory Management (View in Weblogs)
Posted: Sep 4, 2005 11:22 AM
Reply to this message Reply
Summary
There is a widespread belief that garbage collection leads to automatic memory management, and that without a GC memory management is neccessarily complex. These are both false.
Advertisement
In every piece of non-trivial software, irregardless of the language, resource management is a concern. When a resource is no longer needed you need to be assured that that resource is returned to the system. This is as true for GC languages as it is in non-GC langauges. Some programmers seem to think that because the language has a GC they simply can turn a blind eye to resource management. I think this accounts for why so many programs written in Java I come accross, consume staggering amounts of resources, and have miserable performance, despite all the research which says how efficient Java is compared to C++.

In a GC language, the garbage collector will only reclaim resources which are no longer live. Assuring a resource is reclaimed is no easy task in a non-trivial piece of software. The best way to assure a resource is released is for the programmer to set pointers to null when they no longer refer to a resource (or let them go out of scope), and to minimize the sharing of objects (which is just good practice).

So in a GC language, you have to pay attention to resource management, just as you would in a non-GC language, but one major difference exists. If you make a resource management mistake in your design using a GC language such as Java, everything works fine, it is simply a bit less efficient than it should be. In C++ something magical happens: undefined behaviour. Of course undefined behaviour is the greatest of the programming evils, and it is primarily responsible for the mass exodus of programmers from C++.

Given that a programmer in Java or C# should be setting their pointers to NULL, what is the difference between that and explcitly calling a delete? The answer is more or less nothing. The real question is what happens when the programmer makes the inevitable mistake. Undefined behaviour is unacceptable for most of us sane people, but silently keeping things alive is a very hard to detect resource leak. I think a programmer would have the best of both worlds if their language avoided undefined behaviour, threw errors when things aren't deleted when they should, and prevented things from being deleted which shouldn't be.

As for the complexity of resource management, well the design issues of who owns what resource is inescapable whether or not your langauge has a GC or not. You can always choose to ignore it, but it will have repercussions. The only difference for your language is whether this results in poorly performing software, or undefined behaviour.

Postscript: Moore's Law

Something which drives me up the wall, is programmers who quote Moore's law and say resource management is unimportant. What they conveniently ignore is that the demands placed on computers and the functionality inherent in software has followed the same curve. My point: you can't use Moore's Law as an excuse to write bad software.


Michael Feathers

Posts: 448
Nickname: mfeathers
Registered: Jul, 2003

Re: Myths of Memory Management Posted: Sep 4, 2005 11:40 AM
Reply to this message Reply
> So in a GC language, you have to pay attention to resource
> management, just as you would in a non-GC language, but
> one major difference exists. If you make a resource
> management mistake in your design using a GC language such
> as Java, everything works fine, it is simply a bit less
> efficient than it should be.

You should really count the myth that GC is slower. Compared to the free list management that many manually collected systems use, it can be faster although less deterministic.

> Given that a programmer in Java or C# should be setting
> their pointers to NULL, what is the difference between
> that and explcitly calling a delete? The answer is more or
> less nothing. The real question is what happens when the
> programmer makes the inevitable mistake. Undefined
> behaviour is unacceptable for most of us sane people, but
> silently keeping things alive is a very hard to detect
> resource leak. I think a programmer would have the best of
> both worlds if their language avoided undefined behaviour,
> threw errors when things aren't deleted when they should,
> and prevented things from being deleted which shouldn't
> be.

The big difference is safety. Manual deletion invalidates all references.

We can avoid manual deletion with reference counting (which is a form of GC) but reference counting is expensive. I remember some studies showing that systems that use it with finely grained allocation spend a gargantuan amount of time incrementing and decrementing counts.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 12:07 PM
Reply to this message Reply
> You should really count the myth that GC is slower.
> Compared to the free list management that many manually
> y collected systems use, it can be faster although less
> deterministic.

Sure GC is not neccessarily slower, nor is it neccessarily faster. It depends on what GC mechanism you use, and what hard-coded technique you compare it to. Every case is unique. However theoretically any generalized automated GC can be optimized by hand for any given particular non-trivial software. In practice it is rarely practical to do so.

> The big difference is safety. Manual deletion invalidates
> all references.

That assumes something: that you would allow manual deletion of a resource which has live references. You can throw an exception when manual deletion is attempted on something still containing references. This forces programmers to explcitly remove references before preventing deletion. Arguably this can only be seen as a "Good Thing".

Vesa Karvonen

Posts: 116
Nickname: vkarvone
Registered: Jun, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 12:42 PM
Reply to this message Reply
> Some programmers seem to think that because the language has a GC they
> simply can turn a blind eye to resource management.

Poor education of programmers is not exclusive to GC'd languages. I've seen plenty of C and C++ programs that leak all kinds of resources. What you are implying above seems silly to me. The problem (sloppy programming) exists in all languages. It is just a couple of orders of magnitude worse in non-GC'd languages.

> I think this accounts for why so many programs written in Java I come
> accross, consume staggering amounts of resources, and have miserable
> performance

I think that your analysis is misguided here. Java is a particularly bad data point for making a judgement on GC. For one thing, Java objects have a ridiculously high overhead: a plain Object may take 16 bytes. Many other GC'd languages do not have such overheads and they perform upto an order of magnitude better. For example, MLton uses significantly less memory (sometimes 15 times less) than Java on most shootout programs:

http://shootout.alioth.debian.org/benchmark.php?test=all&lang=mlton&lang2=java&sort=fullcpu

While pointing to language comparisons, here is one comparing Java/Sun's, C++/g++, Ocaml, and SML/MLton&SML/NJ (and hopefully also Scheme/Stalin soon):

http://www.ffconsultancy.com/free/ray_tracer/languages.html
http://groups.google.fi/group/comp.lang.scheme/msg/cf3c40620b81443a

As you can see, Java performs much worse than the other languages in that comparison.

> I think a programmer would have the best of both worlds if their
> language avoided undefined behaviour, threw errors when things aren't
> deleted when they should, and prevented things from being deleted which
> shouldn't be.

I trust that you are aware of the fact that such deterministic behavior will have a very high cost. Unless I'm mistaken, it basically means that, in the worst case, every write to a pointer may potentially require a check, which can easily double (or more) the cost of pointer writes.

> The only difference for your language is whether this results in poorly
> performing software, or undefined behaviour.

If you want to make performance comparisons, I suggest that you compare to the state-of-the-art (Stalin, MLton, Ocaml) rather that Java.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 2:31 PM
Reply to this message Reply
> Poor education of programmers is not exclusive to GC'd
> languages. I've seen plenty of C and C++ programs that
> leak all kinds of resources.

In fact the majority of non-trivial C/C++ programs that I have seen do.

> What you are implying above
> seems silly to me.

I don't know what you think I am implying.

> The problem (sloppy programming) exists
> in all languages.

Yes of course.

> It is just a couple of orders of
> magnitude worse in non-GC'd languages.

I assume you mean that the end result is worse, not the problem itself. In fact the problem is more prevalent in GC'd languages, because the resource mismanagement is harder to detect.

> I think that your analysis is misguided here. Java is a
> particularly bad data point for making a judgement on GC.

The fact still remains that in Java (irregardless of the GC performance) programmers do a poor job of managing resources. It is the misguided reliance on the GC to do things it can't do which I am railing on, not the GC itself. Frankly I will not argue about GC performance any more. It could be instantaneous, it still doesn't change my fundamental criticism of it: GC's encourage bad design. Of course I believe that naive manual pointer manipulation is worse. What I am proposing is something entirely original ( AFAIK ).

> I trust that you are aware of the fact that such
> deterministic behavior will have a very high cost.

No more so than a GC. You can achieve the effect of error on illegal deletion by trivially modifying any GC algorithm.

> Unless I'm mistaken, it basically means that, in the
> worst case, every write to a pointer may potentially
> require a check, which can easily double (or more)
> the cost of pointer writes.

You don't need to check on write, you check on delete. The algorithm is very simple, if manually deleting an object, which is alive throw an error. To check whether something is alive, ask any GC algorithm "can you clean up X?".

Keith Ray

Posts: 658
Nickname: keithray
Registered: May, 2003

Re: Myths of Memory Management Posted: Sep 4, 2005 2:41 PM
Reply to this message Reply
"You can throw an exception when manual deletion is attempted on something still containing references"

Determining whether there are still live references - in the general case - requires either reference-counting (expensive) or the kind of algorithms that garbage-collectors use (such as "mark and sweep").

I see that Richard Jones has references to over 1800 papers on garbage collection, http://www.cs.kent.ac.uk/people/staff/rej/gc.html.

The worst Java code I ever saw was written by a bad C programmer. It was copying data all over the place, pieces of algorithms were spread all over the place, static methods were everywhere. It was even attempting to manually manage memory, not taking advantage of garbage collection.

Check out: http://homepage.mac.com/keithray/blog/2004/11/11/

and
http://www.whysmalltalk.com/tutorials/files/vwMemMgmnt-Handout-x2.pdf

Vesa Karvonen

Posts: 116
Nickname: vkarvone
Registered: Jun, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 2:46 PM
Reply to this message Reply
> There is a widespread belief that garbage collection leads to automatic
> memory management, and that without a GC memory management is
> neccessarily complex.

It is more than a belief. The problem with lack of GC is that certain kinds of programming styles and abstractions become either impossible or impractical. In other words, while it is true that GC doesn't solve all resource management issues, the lack of GC has a non-trivial effect on the kinds of programming styles and abstractions that can be comfortably expressed in the language.

Having to explicitly deal with temporaries, several kinds of pointers, etc... is going to have a detrimental effect on the ease of programming and the design of abstractions. Let's take a look at the snippet of code from your earlier blog post:


my_type^ x = new my_type();
my_type^ y;
y = x; // illegal instruction, ownership is either transfered or shared
y = transfer(x); // x is set to NULL in the process
x = share(y); // now both point to the object
x = NULL; // x releases hold on object
y = NULL; // runtime error, y has sole-ownership but forgot to delete it


Take a close look at the last line.

Have you considered the consequences of the behavior of NULL assignment on modularity? Under what circumstances is it safe to assign a NULL to a (strong) pointer or let it fall off scope? Is this a concern that cuts through abstraction barriers (module interfaces)?

It seems to me that in Heron, assigning a NULL to a pointer or letting a pointer fall off scope is a risky business. You are probably going to need something like this pretty much everywhere:


// code that originally got the resource bound to pointer "p" from some
// external source, and you just don't know whether "p" is the last
// reference to the resource or not.
try {
p = NULL;
} catch (MemoryLeakException e) {
delete x;
}


Sure, you may be able to define something like "smart pointers" in Heron to automate the above (meaning to automate the proper action to take when destroying a pointer (the use of exception handling isn't the issue above)), but the cost is going to be significant in terms of both performance (pointer manipulation will be slower) and increased programming burden and reduced abstraction ability (you always have to explicitly deal with smart references).

Brian Slesinsky

Posts: 43
Nickname: skybrian
Registered: Sep, 2003

Re: Myths of Memory Management Posted: Sep 4, 2005 3:08 PM
Reply to this message Reply
re: "The best way to assure a resource is released is for the programmer to set pointers to null when they no longer refer to a resource (or let them go out of scope) and to minimize the sharing of objects (which is just good practice)."

It sounds like you're frustrated that other folks aren't paying enough attention to performance, but countering myths with other myths won't help. What's wrong with sharing objects in Java? This isn't C++ and we don't need to carefully track the lifetime of every String. It's usually sufficient to be careful about the lifetime of big objects (like DOM trees).

I also disagree with your advice to set unused pointers to null. In my experience, most of the time there's no measurable performance improvement and it clutters up the code.

Memory leaks in Java are a performance issue and the solution is the same as any other performance issue: use a profiler to find out where the bottlenecks are. Don't do anything without measuring to see if it actually helps. Guessing about performance bottlenecks just makes the code more obscure for no reason.

> "silently keeping things alive is a very hard to detect resource leak"

This is true, especially when you don't have the right tools. IDE's really should have built-in memory profilers, just like debuggers.

Vesa Karvonen

Posts: 116
Nickname: vkarvone
Registered: Jun, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 3:12 PM
Reply to this message Reply
> > What you are implying above seems silly to me.
> I don't know what you think I am implying.

I think you were implying that GC encourages sloppy programming. Judging from your messages, it seems that I was correct (in my assumption about your implication).

> In fact the problem is more prevalent in GC'd languages, because the
> resource mismanagement is harder to detect.

In what way, specifically, is resource mismanagement harder to detect in GC'd languages? Give an example of a situation where resource mismanagement is easier to detect in a non-GC'd language.

> The fact still remains that in Java (irregardless of the GC performance)
> programmers do a poor job of managing resources.

Are you saying that Java programmers do a relatively worse job of resource management than, say, C or C++ programmers? How about citing some hard statistics on this matter?

> GC's encourage bad design

Point me to controlled studies that support your position (on programmer behavior) and I will be happy to reconsider the validity of your assertion. Otherwise you should admit that you are simply pulling your assertions out of thin air (anecdotal "evidence") with no data to support them.

> You don't need to check on write, you check on delete. The algorithm is
> very simple, if manually deleting an object, which is alive throw an
> error. To check whether something is alive, ask any GC algorithm "can
> you clean up X?".

Are you saying that the runtime of Heron will run a GC algorithm to detect resource (memory) management errors, but all deallocations must be made explicitly?

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 3:27 PM
Reply to this message Reply
> I think you were implying that GC encourages sloppy
> programming. Judging from your messages, it seems that I
> was correct (in my assumption about your implication).

Yes.

> In what way, specifically, is resource mismanagement
> harder to detect in GC'd languages?

If a design calls for object O to be deleted on condition C, but some other module M continues to reference O, then O will continue to be kept alive. This is a design error, and damn hard to detect.

> Give an example of a
> situation where resource mismanagement is easier to detect
> in a non-GC'd language.

In Heron any explicit deletion of a refernced resource indicates a design error and a runtime error is thrown. In many C++ environments, access of an explicitly deleted object, leads to a page fault. So the design error gets caught.

> > The fact still remains that in Java (irregardless of the
> GC performance)
> > programmers do a poor job of managing resources.
>
> Are you saying that Java programmers do a relatively worse
> job of resource management than, say, C or C++
> programmers?

It is my experience.

> How about citing some hard statistics on this
> matter?

I can't.

> Point me to controlled studies that support your position
> (on programmer behavior) and I will be happy to reconsider
> the validity of your assertion.

There are no controlled studies to the contrary neither that I am aware of.

> Otherwise you should admit
> that you are simply pulling your assertions out of thin
> air (anecdotal "evidence") with no data to support them.

I will happily admit that.

> > You don't need to check on write, you check on delete.
> The algorithm is
> > very simple, if manually deleting an object, which is
> alive throw an
> > error. To check whether something is alive, ask any GC
> algorithm "can
> > you clean up X?".
>
> Are you saying that the runtime of Heron will run a GC
> algorithm to detect resource (memory) management errors,
> but all deallocations must be made explicitly?

The Heron specification will be to throw an error in that case, how it is implemented is to be determined. However I see nothing wrong with the implementation approach you describe, and in fact I currently can't yet imagine any other way to achieve the desired effect reliably. I am considering the possibility also that this only occurs during debug builds.

What do you think would turn out bad about all that?

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 3:42 PM
Reply to this message Reply
> It is more than a belief. The problem with lack of GC is
> that certain kinds of programming styles and abstractions
> become either impossible or impractical.

You can give a specific example?

>

> // code that originally got the resource bound to
> to pointer "p" from some
> // external source, and you just don't know whether "p"
> "p" is the last
> // reference to the resource or not.
> try {
> p = NULL;
> } catch (MemoryLeakException e) {
> delete x;
> }
>


Good example. This is common code in Java-like languages, where people ignore the concern of ownership. A module should either be explicitly sharing a resource, or owning a resource. It makes the concern of resource management much more explicit and the code easier to predict and understand. Code like the above is what leads to a rat's nest of coupling of the concern of ownership. A module which creates an object, should also destroy it. Not doing so, makes it impossible to manage resources and accurately predict how your software will use resources, which is why I say GC'd code sucks at resource management.

> Sure, you may be able to define something like "smart
> pointers" in Heron to automate the above (meaning to
> automate the proper action to take when destroying a
> pointer (the use of exception handling isn't the issue
> above)), but the cost is going to be significant in terms
> of both performance (pointer manipulation will be slower)

Saying that that would be a significant performance hit, is an overstatement. Smart pointers should not be overused, and in the grand scheme of things are not that expensive.

> and increased programming burden and reduced abstraction
> ability (you always have to explicitly deal with
> smart references).

It's all on purpose. Programmers have to learn to explicitly deal with their resources, whether they want to or not.

Vesa Karvonen

Posts: 116
Nickname: vkarvone
Registered: Jun, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 3:52 PM
Reply to this message Reply
> Programmers have to learn to explicitly deal with their
> resources, whether they want to or not.

I can see now why you think that Heron will be an educational language. Sorry, I couldn't resist. :-)

[I'll get back to the other points later (my clock says it is time to get some sleep [02:28]).

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Myths of Memory Management Posted: Sep 4, 2005 4:40 PM
Reply to this message Reply
> > Programmers have to learn to explicitly deal with their
> > resources, whether they want to or not.
>
> I can see now why you think that Heron will be an
> educational language. Sorry, I couldn't resist. :-)

No problem, I guess it is apparent that I am a big fan of the bondage and discipline approach to programming language design.

> [I'll get back to the other points later (my clock says it
> is time to get some sleep [02:28]).

I look forward to continuing the discussion.

Vesa Karvonen

Posts: 116
Nickname: vkarvone
Registered: Jun, 2004

Re: Myths of Memory Management Posted: Sep 5, 2005 2:39 AM
Reply to this message Reply
[From another message]
> However theoretically any generalized automated GC can be optimized by
> hand for any given particular non-trivial software. In practice it is
> rarely practical to do so.

Far more common than hand optimization is the use of simple tuning of the parameters of the GC algorithm to improve performance. There are also adaptive GC algorithms (e.g. http://mlton.org/GarbageCollection) that should offer good performance even without tuning.

> If a design calls for object O to be deleted on condition C, but some
> other module M continues to reference O, then O will continue to be kept
> alive. This is a design error, and damn hard to detect.

[The error may also be that the condition C is wrong.]

Why do you think this is simpler to detect in a non-GC'd language?

In other words, why do you think that GC rules out explicit resource management? If the deletion of a particular resource is really that important, then you should design the program in such a way that it happens. This can be done in GC'd language.

> In Heron any explicit deletion of a refernced resource indicates a design
> error and a runtime error is thrown.

Why do you think such an error could not be detected just as easily in a GC'd language?

I'm not trying to be clever here. I just don't see any fundamental reason as to why such errors would necessarily be harder to detect in a GC'd language.

GC allows you to ignore memory management in the 99.9% of cases where it doesn't make a difference. Sure, it still takes some knowledge to implement imperative containers in GC'd languages in a space safe manner, but the problems are less severe than in non-GC'd languages.

Explicit memory deallocation is the problem. Due to its pervasiness and topological complexity (you rarely have, say, file handles forming a cycle), memory is arguably the most complex resource to manage. There are situations in which there is no clear single owner for a piece of memory. Without GC you have to invent such an owner and/or complicate the design of all users of the block of memory. Manual deallocation is the source of problems - not the solution. Each time a design calls for explicit manual deallocation, you should expect problems.

> In many C++ environments, access of an explicitly deleted object, leads
> to a page fault. So the design error gets caught.

That's wishful thinking and it doesn't correspond with neither my experience as a C++ programmer nor with my knowledge of the implementation issues. Due to performance reasons, most C++ implementations allocate memory from the OS in large chunks and manage the memory themselves. Deallocation in such an implementation does not (usually) invalidate the deleted block for access. One technical reason for this is that the MMUs of most CPUs do not allow protection at the level of arbitrary blocks of memory, but rather at some relatively large granularity (e.g. 4KB).

> > Heron will run a GC algorithm to detect resource (memory) management
> > errors, but all deallocations must be made explicitly?

> [...] I currently can't yet imagine any other way to achieve the desired
> effect [...]

It does surprise me that you originally started by complaining about the performance of GC.

> I am considering the possibility also that this only occurs during debug
> builds.

Many problems (particularly, but not limited to, performance problems) only manifest themselves in production use.

> What do you think would turn out bad about all that?

Abysmal performance (cost of GC + explicit management) and poor production reliability (safety off when most accidents happen). And, of course, the inconvenience of having to do all memory management explicitly (rather than just the few special cases).

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Myths of Memory Management Posted: Sep 5, 2005 4:12 AM
Reply to this message Reply
> > What do you think would turn out bad about all that?

> Abysmal performance (cost of GC + explicit management)

Explicit management in the explicit management scheme I outlined is not more expensive than a GC, it should be at most precisely the same. When the programmer explicitly calls delete, the system doesn't have to actually deallocate the block right away it could simply pass it to a GC.

> and
> poor production reliability (safety off when most
> accidents happen).

Point taken.

> And, of course, the inconvenience of
> having to do all memory management explicitly (rather than
> just the few special cases).

That is where we fundamentally disagree. I believe that forcing explicit design choices about ownership of resources is not inconvenient but a neccessity.

I'm sorry I don't have time to respond to everything, but I do read it and digest it.

Flat View: This topic has 81 replies on 6 pages [ 1  2  3  4  5  6 | » ]
Topic: How Much Profit is Enough? Previous Topic   Next Topic Topic: QA and TDD

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use