The Artima Developer Community
Artima Developer Spotlight Forum
Laurence Vanhelsuwé on Java Collections Pitfalls

7 replies on 1 page. Most recent reply: Jul 30, 2009 7:32 AM by Ian Robertson

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 10, 2009 9:38 PM

The Java Collections framework is one of the most frequently used libraries in Java applications. Treating the JDK Collections classes as just "a black box," however, is a mistake, according to Laurence Vanhelsuwé, founder of SoftwarePearls and lead developer of CollectionSpy, a new Java profiler focused on detecting collections-related programming errors.

In this interview with Artima, Vanhelsuwé talks about the kinds of problems developers often encounter with Java collections, and how CollectionSpy helps overcome those problems:

Laurence Vanhelsuwé: The idea for CollectionSpy came from the recurring observation that a lot of enterprise Java developers struggle with Java collection containers. That may not be a problem for top programmers, but your average Java programmer does have issues with Hashtables, LinkedLists, ConcurrentHashMaps, and other containers. I have observed these problems in real-life software teams, and that led me to create CollectionSpy, a new kind of profiler, to help address those problems.

A lot of people are not aware, for example, how careful you need to be when overriding equals() or hashCode() in a class. Do that the wrong way—and statistics show that the majority of developers do—and containers can misbehave. If you use an object with a hashCode() implementation that uses mutable object state, and you use such objects as Map keys, you end up with mutable keys—I've seen a lot of mutable classes being used as keys. Our tool will instantly detect if a container is corrupted because one of its keys or elements gets mutated.
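
To make the mutable-key problem concrete, here is a minimal sketch. The `MutableKey` and `MutableKeyDemo` classes are hypothetical and exist only for illustration; they are not taken from CollectionSpy. The key's `hashCode()` depends on a field that is changed after the key has been put into a `HashMap`, so the entry stays in the map but can no longer be found by lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class whose hashCode() depends on mutable state.
final class MutableKey {
    private String name; // deliberately mutable, to show the problem

    MutableKey(String name) {
        this.name = name;
    }

    void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

final class MutableKeyDemo {
    // Returns true if the entry can still be looked up after its key was mutated.
    static boolean findableAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("alpha");
        map.put(key, "value");

        // The map recorded the hash of "alpha"; the key now hashes as "beta".
        key.setName("beta");

        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + findableAfterMutation());
    }
}
```

The usual remedy is to make key classes immutable, or at least to base `equals()` and `hashCode()` only on fields that never change while the object is in a hashing container.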

In general, developers may think that a JDK container acts like a perfect, benign black box. That would lead one to think that container performance can't degrade. But, in fact, it can. For instance, when writing hashCode() functions, you need to be careful that your values are distributed properly. If that's not done, you can have a properly functioning Hashtable or HashSet, but the performance will be atrocious: You're going to leak precious performance, while the problem may be well below the radar as far as top bottlenecks are concerned.
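
A small sketch of what poor distribution looks like, under the assumption of a hypothetical `GridPoint` class (not from the interview). With the constant strategy, `hashCode()` still satisfies the contract with `equals()`, so containers behave correctly, but every instance lands in the same bucket. The helper simply counts how many distinct hash values a batch of distinct objects produces.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class; the flag exists only to compare two hashCode() strategies.
final class GridPoint {
    private final int x;
    private final int y;
    private final boolean constantHash;

    GridPoint(int x, int y, boolean constantHash) {
        this.x = x;
        this.y = y;
        this.constantHash = constantHash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GridPoint)) {
            return false;
        }
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // A constant is legal (equal objects get equal hashes) but makes every
        // instance collide; combining the fields spreads the values out.
        return constantHash ? 1 : 31 * x + y;
    }
}

final class HashSpreadDemo {
    // Counts the distinct hash values produced by n distinct points.
    static int distinctHashes(int n, boolean constantHash) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(new GridPoint(i, i + 1, constantHash).hashCode());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("constant strategy: " + distinctHashes(1000, true));
        System.out.println("combined strategy: " + distinctHashes(1000, false));
    }
}
```

The fewer distinct values a realistic data set produces, the more elements share a bucket and the closer lookups drift from constant time toward a scan of the colliding entries.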

You can, in fact, have all sorts of unexpected behaviors in collections. The worst case I've seen was simply an infinite loop in HashMap.get() that resulted from corrupting the internal structures of the Map, which was, in turn, caused by multi-threaded access to the Map.
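
The corruption itself depends on thread timing and is hard to reproduce on demand, so the sketch below shows the conventional way to avoid it instead: sharing a `ConcurrentHashMap` rather than a plain `HashMap`. The class name, method name, and the `"hits"` key are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedMapDemo {
    // Several threads increment one counter stored in a shared map.
    static int countWithConcurrentMap(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // merge() is performed atomically on ConcurrentHashMap.
                    counts.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithConcurrentMap(4, 10_000));
    }
}
```

With a plain `HashMap` in place of the `ConcurrentHashMap`, the same code would have no defined outcome: updates could be lost, or the map's internal structure could be damaged in the way described above.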

Another classic collections-related problem is that you're not finding in a collection the object you'd expect. That sometimes happens because the content of the collection changed in some unexpected ways. The question, then, is, How did the contents of that collection get to be that way? CollectionSpy addresses all those debugging nightmares.

Unlike conventional profilers, we restrict the kinds of objects we look at to collection framework containers. All the [JDK] containers are first-class citizens in CollectionSpy, and we track all sorts of information for each container. For example, we track the threads that access or use the container. If you have a non-thread-safe container, such as HashMap, being accessed by several threads, that's an indication that we need to take a closer look at how and when that container is accessed. CollectionSpy would flag that container in one of its analysis rules. We also track all code that accesses any container: You can view the stack trace for any access to any container, so you can always find out the root cause of unexpected or problematic accesses.
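
The interview does not say how CollectionSpy gathers this information. Purely to illustrate the idea of recording which threads touch a container, here is a hand-written wrapper; `ThreadTrackingMap` and its methods are hypothetical and are not CollectionSpy's mechanism or API. The demo accesses the map from two threads one after the other, so the wrapped `HashMap` itself is never raced.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper that remembers the names of threads that used the map.
final class ThreadTrackingMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private final Set<String> accessingThreads = ConcurrentHashMap.newKeySet();

    V put(K key, V value) {
        recordAccess();
        return delegate.put(key, value);
    }

    V get(K key) {
        recordAccess();
        return delegate.get(key);
    }

    private void recordAccess() {
        accessingThreads.add(Thread.currentThread().getName());
    }

    // A non-thread-safe map used by more than one thread deserves a closer look.
    boolean sharedAcrossThreads() {
        return accessingThreads.size() > 1;
    }

    Set<String> accessingThreads() {
        return Collections.unmodifiableSet(accessingThreads);
    }
}

final class ThreadTrackingDemo {
    static boolean demo() {
        ThreadTrackingMap<String, Integer> map = new ThreadTrackingMap<>();
        map.put("a", 1); // access from the calling thread

        Thread worker = new Thread(() -> map.get("a"), "tracking-demo-worker");
        worker.start();
        try {
            worker.join(); // sequential access: the worker finishes before we continue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.sharedAcrossThreads();
    }

    public static void main(String[] args) {
        System.out.println("shared across threads: " + demo());
    }
}
```

A profiler has the advantage of collecting this kind of data without the program having to wrap its containers by hand.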

CollectionSpy is a standalone tool: you just point it at your program, and it starts to produce the data right away. You don't have to do any manual instrumentation or annotation: Just drag-and-drop your existing program on the tool, launch the program, and CollectionSpy will start profiling. For server-based applications, it's a bit more involved, but that's also just a few steps, and we document that. CollectionSpy is the kind of power tool I've wished I had in my toolbox for years, so I'm sure it's going to save others many days of needless debugging frustration.

What kinds of Java collections-related problems do you frequently encounter in your code? Do you think profilers are an effective solution for detecting those problems?


Carson Gross

Posts: 153
Nickname: cgross
Registered: Oct, 2006

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 13, 2009 8:52 PM
I dunno if a profiler would help much, but as an aside, I must say that implementing equals and hashcode correctly is very tedious and error prone for any non-trivial class. IntelliJ makes it easier on me by doing some bat-shit insane code gen for me, but it sure isn't pretty.

Anyone have any better solutions from other languages?

Cheers,
Carson
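
For readers wondering what a hand-written pair looks like, here is a minimal sketch for a hypothetical immutable `Person` class. It relies on `java.util.Objects`, which was added in Java 7 and so was not yet available when this thread was written.

```java
import java.util.Objects;

// Hypothetical immutable value class used only to illustrate the pattern.
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Person)) {
            return false;
        }
        Person other = (Person) o;
        // Objects.equals is null-safe, so a null name does not throw.
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals() compares.
        return Objects.hash(name, age);
    }

    public static void main(String[] args) {
        Person a = new Person("Ada", 36);
        Person b = new Person("Ada", 36);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Because the class is final, the `instanceof` check does not run into the subclassing problems that make `equals()` hard in inheritance hierarchies.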

Laurence Vanhelsuwe

Posts: 2
Nickname: laurencev
Registered: Jun, 2009

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 15, 2009 12:43 AM
CollectionSpy is really a hybrid profiler/code analyser. To detect data-related issues in a container, a tool really needs to be looking at the running system, not just the static source (or byte) code, as static code analysis tools do. We're very proud of CollectionSpy's unique capability to detect corruption of hashing containers... this is something only a profiler-type tool can do.

Carson Gross

Posts: 153
Nickname: cgross
Registered: Oct, 2006

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 16, 2009 10:19 AM
Another option is for the language itself to make equals()/hashCode() easier to implement. This seems like a problem that could be addressed at that level.

Cheers,
Carson

Raoul Duke

Posts: 127
Nickname: raoulduke
Registered: Apr, 2006

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 16, 2009 4:26 PM
> Another option is for the language itself to make the
> equals()/hashCode() easier to implement. This seems like
> a problem that could be addressed at that level.

i'm not sure it is all that possible. i mean, the stuff eclipse generates isn't even totally right i think e.g. vs the whole "blindly equals" approach. i could well be wrong of course.

i think one big underlying problem is that equals is a relative term, or context-sensitive if you will. and in an environment with complicated typing relationships (even just "simple" inheritance), things empirically quickly get hard for programmers.

w00t.

Darko Latkovic

Posts: 9
Nickname: darko
Registered: Jul, 2009

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 21, 2009 6:07 PM
> i think one big underlying problem is that equals is a
> relative term, or context-sensitive if you will. and
> in an environment with complicated typing relationships
> (even just "simple" inheritance), things empirically
> quickly get hard for programmers.
>
> w00t.

The equals() issue has been discussed a lot in the following thread:

http://www.artima.com/forums/flat.jsp?forum=226&thread=259279&start=0&msRange=15

Anthony Gerrard

Posts: 1
Nickname: agerrard
Registered: Sep, 2007

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 23, 2009 12:11 AM
Just wondering if the profiler includes a couple of useful things that Compuware's DevPartner Studio highlighted in a recent project of mine:

* numbers of empty collections e.g. where new HashMap() had been used in place of Collections.emptyMap()
* collections with low fill ratios e.g. where new ArrayList() had been used for lists that generally contained no more than two or three items and where we knew the size of the list would be n so should have used new ArrayList(n)

Admittedly not a priority on a lot of projects, but on ours it ended up saving hundreds of megabytes of memory.
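
A minimal sketch of the two substitutions mentioned above. The `SmallCollections` class and its method names are hypothetical. How much memory this saves depends on the JDK version and on how many such collections an application holds, so the figures from that project should not be assumed to carry over.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SmallCollections {
    // Returns an immutable empty map instead of allocating a new HashMap per call.
    static Map<String, String> noAttributes() {
        return Collections.emptyMap();
    }

    // The final size is known up front, so the list is created with that capacity.
    static List<String> listOf(String... items) {
        List<String> list = new ArrayList<>(items.length);
        Collections.addAll(list, items);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(noAttributes().isEmpty());
        System.out.println(listOf("a", "b", "c"));
    }
}
```

One caveat: the map returned by `Collections.emptyMap()` is immutable, so it only fits call sites that never add entries to the result.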

Ian Robertson

Posts: 68
Nickname: ianr
Registered: Apr, 2007

Re: Laurence Vanhelsuwé on Java Collections Pitfalls Posted: Jul 30, 2009 7:32 AM
> I dunno if a profiler would help much, but as an aside, I
> must say that implementing equals and hashcode correctly
> is very tedious and error prone for any non-trivial class.
> IntelliJ makes it easier on me by doing some bat-shit
> insane code gen for me, but it sure isn't pretty.
>
Pojomatic (http://pojomatic.sourceforge.net/) is a good solution for this problem.

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use