The Artima Developer Community
Sponsored Link

Artima Developer Spotlight Forum
Optimizing ArrayList.removeAll()

41 replies on 3 pages. Most recent reply: Mar 9, 2007 9:39 PM by damien morton

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 41 replies on 3 pages [ « | 1 2 3 | » ]
amin ahmad

Posts: 15
Nickname: aahmad
Registered: Feb, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 1, 2007 8:48 PM
Reply to this message Reply
Advertisement
Tobias, the O(n) algorithms have been corrected. Here are the modified versions that null the slack space after removal:


public boolean removeAll(Collection<?> c) {
int x=0, i=0;
for (i=0; i<size; i++)
if (!c.contains(elementData[i]))
elementData[x++] = elementData[i];
if (x != i) {
Arrays.fill(elementData, x, i, null);
size = x;
return true;
} else {
return false;
}
}


and


public boolean removeAll(Collection<?> c) {
int oldHi=0, newHi=0, top=0;
for (int i=0; i<size; ++i) {
if (c.contains(elementData[i])) {
oldHi = newHi;
newHi = i;

// at the end of this loop newHi will be the non-inclusive
// upper limit of the range to delete.
//
while (++newHi < size && c.contains(elementData[newHi]));

final int length = i - oldHi;
System.arraycopy(elementData, oldHi, elementData, top, length);
i = newHi;
top += length;
}
}
if (newHi > 0) {
final int k = size - newHi;
System.arraycopy(elementData, newHi, elementData, top, k);
final int n = top + k;
Arrays.fill(elementData, n, size, null);
size = n;
return true;
} else {
return false;
}
}

Mark Thornton

Posts: 275
Nickname: mthornton
Registered: Oct, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 2, 2007 1:18 AM
Reply to this message Reply
> Technically all of these issues result from breaking the
> contract of contains():
>
> "More formally, returns true if and only if this set
> contains an element e such that (o==null ? e==null :
> o.equals(e))."

There are plenty of legitimate reasons for have differing notions of equality. It would be better if there was a method on Collection to obtain the 'concept' of equality in use by that Collection. This would return either a Comparator or an interface containing methods to test equality and provide a hashCode.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 2, 2007 6:19 AM
Reply to this message Reply
> > Technically all of these issues result from breaking
> the
> > contract of contains():
> >
> > "More formally, returns true if and only if this set
> > contains an element e such that (o==null ? e==null :
> > o.equals(e))."
>
> There are plenty of legitimate reasons for have differing
> notions of equality.

Definitely. Didn't mean to suggest otherwise. It's more that the Collections interface is itself flawed. Hardly any of the implementations follow the rules so we can't make many assumptions. The contract isn't good enough because we need these other implementations.

> It would be better if there was a
> method on Collection to obtain the 'concept' of equality
> in use by that Collection. This would return either a
> Comparator or an interface containing methods to test
> equality and provide a hashCode.

There's actually an RFE for Equalator (you can search for that word and find it in the bug db. I've voted for it an argued for it many times. Depending only on the Objects themselves to determine equality in the Collection is fundamentally flawed. An arbitrator is needed especially when inheritance comes into play.

amin ahmad

Posts: 15
Nickname: aahmad
Registered: Feb, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 2, 2007 6:22 AM
Reply to this message Reply
Sun accepted an RFE to fix this issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6529800.

Surender Kumar

Posts: 10
Nickname: darthvader
Registered: Jun, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 3, 2007 9:35 AM
Reply to this message Reply
Amin,
Good that Sun has accepted the issue which is O(ns) performance. But that doesn't take away James' argument that your implementations too remain O(ns). Please see the contains() in ArrayList which loops over the entire list.

Sun has to address which your 2 solutions also don't.

-Surender

amin ahmad

Posts: 15
Nickname: aahmad
Registered: Feb, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 3, 2007 10:05 PM
Reply to this message Reply
> Amin,
> Good that Sun has accepted the issue which is O(ns)
> performance. But that doesn't take away James' argument
> that your implementations too remain O(ns). Please see the
> contains() in ArrayList which loops over the entire list.

Go back and read this thread in more detail. The worst case performance of the linear removeAll implementation is O(n*K) where K is the complexity of the contains method on the passed collection, as a function of s (the number of items in the collection). Therefore, if the developer is being smart and passing a hashset, he'll achieve O(ns), whereas if he's not being smart and passes in an array list, he'll have O(ns) performance.

However, the worst case performance of Sun's removeAll implementation is O(n*n*K). For HashSet and ArrayList, you get O(n*n) and O(n*n*s) worst case performance.

As Mark Thorton pointed out to James, it's not possible to get a hard limit of O(n) performance by reallocated the elements to remove into a HashSet because you cannot guarantee correctness.

> Sun has to address which your 2 solutions also don't.

See my comments above: you clearly haven't understood the difference between Sun's approach and the O(n) approach.

> -Surender

amin ahmad

Posts: 15
Nickname: aahmad
Registered: Feb, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 3, 2007 10:08 PM
Reply to this message Reply
> > Amin,
> > Good that Sun has accepted the issue which is O(ns)
> > performance. But that doesn't take away James' argument
> > that your implementations too remain O(ns). Please see
> the
> > contains() in ArrayList which loops over the entire
> list.
>
> Go back and read this thread in more detail. The worst
> case performance of the linear removeAll implementation is
> O(n*K) where K is the complexity of the contains method on
> the passed collection, as a function of s (the number of
> items in the collection). Therefore, if the developer is
> being smart and passing a hashset, he'll achieve O(ns),

err... make that O(n) [typing too fast]

> whereas if he's not being smart and passes in an array
> list, he'll have O(ns) performance.
>
> However, the worst case performance of Sun's removeAll
> implementation is O(n*n*K). For HashSet and ArrayList, you
> get O(n*n) and O(n*n*s) worst case performance.
>
> As Mark Thorton pointed out to James, it's not possible to
> get a hard limit of O(n) performance by reallocated the
> elements to remove into a HashSet because you cannot
> guarantee correctness.
>
> > Sun has to address which your 2 solutions also don't.
>
> See my comments above: you clearly haven't understood the
> difference between Sun's approach and the O(n) approach.
>
> > -Surender

Surender Kumar

Posts: 10
Nickname: darthvader
Registered: Jun, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 4, 2007 10:08 AM
Reply to this message Reply
Let me start with passing a list instead of a hashset.

Sun's impl:
if(contains()) remove()

which is O(2n)=O(n) rather than O(n*n) as your analysis says.
So your time estimates would for sure show the time differnce but not to the magnitude of n.

When you talk about deveopler's smartness of passing a Hashset the remove() in hashset would be O(1) [as would be contain()] if their was no chaining involved. So even Sun's impl passes that again.
Sun impl's inner loop suffers from 2n [contain() and remove()] and not n*n.

Hope you understand it.

-Surender

amin ahmad

Posts: 15
Nickname: aahmad
Registered: Feb, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 4, 2007 1:57 PM
Reply to this message Reply
> Let me start with passing a list instead of a hashset.
>
> Sun's impl:
> if(contains()) remove()
>
> which is O(2n)=O(n) rather than O(n*n) as your analysis
> says.

No, the performance of the statement above is O(n+s). I never said the performance of the statement above is O(n*n), I was talking about removeAll as a whole.

> So your time estimates would for sure show the time
> differnce but not to the magnitude of n.
>
> When you talk about deveopler's smartness of passing a
> Hashset the remove() in hashset would be O(1) [as would
> be contain()] if their was no chaining involved. So even
> Sun's impl passes that again.
> Sun impl's inner loop suffers from 2n [contain() and
> remove()] and not n*n.

you lose me here.

>
> Hope you understand it.
>
> -Surender

Surender Kumar

Posts: 10
Nickname: darthvader
Registered: Jun, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 4, 2007 11:05 PM
Reply to this message Reply
> However, the worst case performance of Sun's removeAll
> implementation is O(n*n*K). For HashSet and ArrayList, you
> get O(n*n) and O(n*n*s) worst case performance.
>

OK if you agree that if(contains()) remove is O(n+s) and Sun's impl is iterating 'n' times how can it have the worst case O(n*n*K) or O(n*n*s) or even O(n^3) for an arrayList[s=n makes it the worst case]

All I'm saying is Sun's impl has remove() which is making the time twice instead of your interpretation of n times.

Sun's worst case is O(n^2) as is your solutions'.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 5, 2007 6:35 AM
Reply to this message Reply
> As Mark Thorton pointed out to James, it's not possible to
> get a hard limit of O(n) performance by reallocated the
> elements to remove into a HashSet because you cannot
> guarantee correctness.

Yes but you could do it if the class of the passed in collection was ArrayList or LinkedList and marker interfaces could make it general.

Surender Kumar

Posts: 10
Nickname: darthvader
Registered: Jun, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 6, 2007 2:39 AM
Reply to this message Reply
Removing to HashSet suffers from one drawback. Lists can have duplicate elements while HashSet cannot. So if input collection is a list with duplicate elements then you run into problems unless you tweak insertion/lookup into the set.

-Surender

Abraham Tehrani

Posts: 5
Nickname: atehrani
Registered: Mar, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 6, 2007 6:45 PM
Reply to this message Reply
One thing I would like to point out here is, although the current implemention is potentially slow, it is potentially safer. Because it internally uses an iterator, it can throw ConcurrentModificationException if the Collection is modified while you are iterating over it.

To quote the JavaDocs

"Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather that risking arbitrary, non-deterministic behavior at an undetermined time in the future."

I think the new implementation should take this into account.

amin ahmad

Posts: 15
Nickname: aahmad
Registered: Feb, 2007

Re: Optimizing ArrayList.removeAll() Posted: Mar 6, 2007 8:32 PM
Reply to this message Reply
> One thing I would like to point out here is, although the
> current implemention is potentially slow, it is
> potentially safer. Because it internally uses an iterator,
> it can throw ConcurrentModificationException if the
> Collection is modified while you are iterating over it.

That's a good point. In fact one thing I should point out is that a correct implementation of the removeAll method needs to increment the modCount variable to make sure that the fail-fast iterators work properly. This was an implementation detail I left out of the code samples I posted.

That said, ConcurrentModificationExceptions are probably overkill for individual method calls since, in this case, concurrent modification can only occur in a multi-threaded environment where you have not synchronized the collection properly (i.e. in single threaded code it is impossible). On the other hand, with iterators, it is quite possible for single threaded code to concurrently modify the collection while one or more iterators are "live".

>
> To quote the JavaDocs
>
> "Iterators that do this are known as fail-fast iterators,
> as they fail quickly and cleanly, rather that risking
> arbitrary, non-deterministic behavior at an undetermined
> time in the future."
>
> I think the new implementation should take this into
> account.

Again, when the arraylist is accessed by a single thread, operations that complete within a single method do not need to take this into account. In a multi-threaded environment, you would retrieve a synchronized version of the collection and use that instead.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Optimizing ArrayList.removeAll() Posted: Mar 7, 2007 10:42 AM
Reply to this message Reply
> Removing to HashSet suffers from one drawback. Lists can
> have duplicate elements while HashSet cannot. So if input
> collection is a list with duplicate elements then you run
> into problems unless you tweak insertion/lookup into the
> set.

This is not a problem. If A is equal to B and A is equal to C then C is equal to B per the transitivity and symmetry requirements of equals. In other words, I don't need duplicates to determine if something is in the list.

Flat View: This topic has 41 replies on 3 pages [ « | 1  2  3 | » ]
Topic: Working Effectively With Characterization Tests Previous Topic   Next Topic Topic: The Expressive Power of Languages

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use