Article Discussion
What's New in Scala 2.8: Collections API
Summary: In the first of a series of articles on the latest Scala release, Scala 2.8, Martin Odersky discusses the new collections API.
17 posts on 2 pages.      
« Previous 1 2 Next »
The ability to add new comments in this discussion is temporarily disabled.
Most recent reply: September 1, 2010 6:33 AM by
Bill
Posts: 409 / Nickname: bv / Registered: January 17, 2002 4:28 PM
What's New in Scala 2.8: Collections API
August 23, 2010 9:00 PM      
In the first of a series of articles on the latest Scala release, Scala 2.8, Martin Odersky discusses the new collections API.

http://www.artima.com/scalazine/articles/scala_collections.html

What do you think of Scala's new collections API?
Achilleas
Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
Re: What's New in Scala 2.8: Collections API
August 25, 2010 6:12 AM      
From a quick first look I had, it seems the new collections API offers a lot of collection types, I'd dare say almost all widely known collection types.

Without being a Scala user and only knowing the very basics of the language, I'd like to ask:

1) why the emphasis on immutable data structures? what are the benefits? aren't those slower than the mutable types?

2) when a collection's type is a primitive, does autoboxing happen? for example, an Array[int] is actually an Array[Integer]?
James
Posts: 128 / Nickname: watson / Registered: September 7, 2005 3:37 AM
Re: What's New in Scala 2.8: Collections API
August 26, 2010 6:38 AM      
> From a quick first look I had, it seems the new
> collections API offers a lot of collection types, I'd dare
> say almost all widely known collection types.
>
> Without being a Scala user and only knowing the very
> basics of the language, I'd like to ask:
>
> 1) why the emphasis on immutable data structures? what are
> the benefits? aren't those slower than the mutable types?

I assume you mean slower for writes. Theoretically there is an opportunity for the compiler to give a performance boost on reads for immutable collections.

As someone who is happy to see the emphasis on immutability, the benefits are safer and simpler code. I need not worry about whether successive reads will produce different results or whether the collection will change while I am iterating over it. This is especially the case when dealing with concurrency but not exclusively.

> 2) when a collection's type is a primitive, does
> autoboxing happen? for example, an Array[int] is actually
> an Array[Integer]?

Unlike Java, there is are no primitives in Scala but as I understand it, compiled Scala uses Java primitives. I've not taken the time to understand how this is done but it's pretty sexy.
Mark
Posts: 48 / Nickname: mthornton / Registered: October 16, 2005 11:22 PM
Re: What's New in Scala 2.8: Collections API
August 26, 2010 11:35 AM      
> 1) why the emphasis on immutable data structures? what are
> the benefits? aren't those slower than the mutable types?
You don't need to make defensive copies when passing them to methods (outside your control). This can save quite a lot of time in some circumstances.
Achilleas
Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
Re: What's New in Scala 2.8: Collections API
August 27, 2010 4:01 AM      
> Unlike Java, there is are no primitives in Scala but as I
> understand it, compiled Scala uses Java primitives. I've
> not taken the time to understand how this is done but it's
> pretty sexy.

Perhaps it sees that an object is an 'Integer' and JITs the relevant code to use 'int'.
Daniel
Posts: 3 / Nickname: 56653 / Registered: July 4, 2008 9:51 PM
Re: What's New in Scala 2.8: Collections API
August 27, 2010 4:41 AM      
I found the new collections easy to use. There are a lot of new, interesting functions, and the resulting types were almost always "as expected".

The only function I couldn't find was the cartesian product of two collections, which would be nice to have.
Bill
Posts: 28 / Nickname: billpyne / Registered: January 29, 2007 4:12 AM
Re: What's New in Scala 2.8: Collections API
August 27, 2010 5:32 AM      
> 1) why the emphasis on immutable data structures?

Parallel collections should be easier to implement with immutable structures.

http://www.scala-lang.org/node/7285
Achilleas
Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
Re: What's New in Scala 2.8: Collections API
August 27, 2010 7:31 AM      
> > 1) why the emphasis on immutable data structures?
>
> Parallel collections should be easier to implement with
> immutable structures.
>
> http://www.scala-lang.org/node/7285

Is the parallelization automatic?
Bill
Posts: 28 / Nickname: billpyne / Registered: January 29, 2007 4:12 AM
Re: What's New in Scala 2.8: Collections API
August 27, 2010 7:59 AM      
> Is the parallelization automatic?

From the bits and pieces I read on the internet, the answer seems like a Yes. The following link has a paragraph near the bottom in which Martin Odersky describes use of "partition" across sequential collections and goes on to say, "Furthermore, the partition operation is quite fast, and will get even faster on parallel collections on multi-cores."

http://www.scala-lang.org/node/7364
sean
Posts: 1 / Nickname: seanm / Registered: July 7, 2009 6:33 PM
Re: What's New in Scala 2.8: Collections API
August 27, 2010 6:43 PM      
Immutable collections are usually not justified based on performance; they tend to have worse performance than the equivalent mutable collections. Use them for their functional elegance if that floats your boat, but don't use them if you think they are going to be faster.

Performance increases through parallelization and multi cores has yet to be seen. Immutability does confer some theoretical parallelization benefit, but in practice many other factors are at play it would be very difficult to turn that into an actual performance benefit.
Martin
Posts: 15 / Nickname: modersky / Registered: September 14, 2003 9:46 PM
Re: What's New in Scala 2.8: Collections API
August 28, 2010 3:36 AM      
> From a quick first look I had, it seems the new
> collections API offers a lot of collection types, I'd dare
> say almost all widely known collection types.
>
> Without being a Scala user and only knowing the very
> basics of the language, I'd like to ask:
>
> 1) why the emphasis on immutable data structures? what are
> the benefits? aren't those slower than the mutable types?
>
The library is symmetric wrt mutability. Every major type has mutable as well as immutable implementations. Immutable ones are available directly whereas mutable ones need an explicit qualification or import.

The reason for preferring immutable is that immutable is safer. So in the spirit of avoiding premature optimizations
we made immutable the default.

Immutable collections do have persistent implementations which keep the cost overhead fairly low, and which even give an advantage over mutable for small collections (an immutable set or map of up to 4 elements takes only a single object; compare to a mutable map backed by a hashtable).

> 2) when a collection's type is a primitive, does
> autoboxing happen? for example, an Array[int] is actually
> an Array[Integer]?

For general collections, yes. But for arrays, no. Scala arrays are represented exactly the same as Java arrays. There's a section on arrays in the collections document.

The next versions of Scala will specialize further collection types iso that primitive types need not be boxed. That's an ongoing process which was started with the availability of @specialized annotation in Scala 2.8.
Martin
Posts: 15 / Nickname: modersky / Registered: September 14, 2003 9:46 PM
Re: What's New in Scala 2.8: Collections API
August 28, 2010 3:40 AM      
> Is the parallelization automatic?

Yes. The following paper describes the upcoming parallel collections. More work is needed to validate performance, but the benchmark results contained in the paper are pretty promising so far.

http://infoscience.epfl.ch/record/150220
Martin
Posts: 15 / Nickname: modersky / Registered: September 14, 2003 9:46 PM
Re: What's New in Scala 2.8: Collections API
August 28, 2010 3:45 AM      
> The only function I couldn't find was the cartesian
> product of two collections, which would be nice to have.

Cartesian product can be expressed by flatMap and map, but the easiest way is with a for expression:


for (x <- 1 to 10; y <- 1 to 10) yield (x, y)


The Scala compiler will translate this to


(1 to 10).flatMap(x => (1 to 10).map(y => (x, y)))
Morgan
Posts: 37 / Nickname: miata71 / Registered: March 29, 2006 6:09 AM
Re: What's New in Scala 2.8: Collections API
August 31, 2010 8:25 AM      
> Immutable collections are usually not justified based on
> performance; they tend to have worse performance than the
> equivalent mutable collections.

Sean - can you cite a source for this (or is this just personal experience)?

What you say seems plausible, but more data would be interesting. In my experience, I've never bothered timing collection performance because a few extra milliseconds spent on an insert pales in comparison to the seconds spent on analysis algorithms, and we have achieved major benefits from parallelization.
Mark
Posts: 48 / Nickname: mthornton / Registered: October 16, 2005 11:22 PM
Re: What's New in Scala 2.8: Collections API
August 31, 2010 10:54 AM      
I have timed immmutable/persistent collections against regular ones and found the immutable collections significantly slower for basic operations. However this isn't representative of how you would normally use the immutable collections. In particular it doesn't account for time saved by not having to copy a collection. You can thus only assess overall performance in the context of a specific real application.

In my work I often make a complex set of changes to a set of data and then assess whether the result is acceptable (feasible). If it isn't feasible, then the changes have to be reversed. If the data set as a whole is immutable, then that undo is achieved by simply keeping a reference to its previous state. With mutable data structures you either make a complete copy or actually perform the undo operations, in either case at significant cost in CPU and/or code complexity.
17 posts on 2 pages.
« Previous 1 2 Next »