Article Discussion
What's New in Scala 2.8: Collections API
Summary: In the first of a series of articles on the latest Scala release, Scala 2.8, Martin Odersky discusses the new collections API.
16 posts.
The ability to add new comments in this discussion is temporarily disabled.
Most recent reply: September 1, 2010 6:33 AM by michael
    Bill
     
    Posts: 409 / Nickname: bv / Registered: January 17, 2002 4:28 PM
    What's New in Scala 2.8: Collections API
    August 23, 2010 9:00 PM      
    In the first of a series of articles on the latest Scala release, Scala 2.8, Martin Odersky discusses the new collections API.

    http://www.artima.com/scalazine/articles/scala_collections.html

    What do you think of Scala's new collections API?
    • Daniel
       
      Posts: 3 / Nickname: 56653 / Registered: July 4, 2008 9:51 PM
      Re: What's New in Scala 2.8: Collections API
      August 27, 2010 4:41 AM      
      I found the new collections easy to use. There are a lot of new, interesting functions, and the resulting types were almost always "as expected".

      The only function I couldn't find was the cartesian product of two collections, which would be nice to have.
      • Martin
         
        Posts: 15 / Nickname: modersky / Registered: September 14, 2003 9:46 PM
        Re: What's New in Scala 2.8: Collections API
        August 28, 2010 3:45 AM      
        > The only function I couldn't find was the cartesian
        > product of two collections, which would be nice to have.

        Cartesian product can be expressed by flatMap and map, but the easiest way is with a for expression:


        for (x <- 1 to 10; y <- 1 to 10) yield (x, y)


        The Scala compiler will translate this to


        (1 to 10).flatMap(x => (1 to 10).map(y => (x, y)))
    • Achilleas
       
      Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
      Re: What's New in Scala 2.8: Collections API
      August 25, 2010 6:12 AM      
      From a quick first look I had, it seems the new collections API offers a lot of collection types, I'd dare say almost all widely known collection types.

      Without being a Scala user and only knowing the very basics of the language, I'd like to ask:

      1) why the emphasis on immutable data structures? what are the benefits? aren't those slower than the mutable types?

      2) when a collection's type is a primitive, does autoboxing happen? for example, an Array[int] is actually an Array[Integer]?
      • Mark
         
        Posts: 48 / Nickname: mthornton / Registered: October 16, 2005 11:22 PM
        Re: What's New in Scala 2.8: Collections API
        August 26, 2010 11:35 AM      
        > 1) why the emphasis on immutable data structures? what are
        > the benefits? aren't those slower than the mutable types?
        You don't need to make defensive copies when passing them to methods (outside your control). This can save quite a lot of time in some circumstances.
      • James
         
        Posts: 128 / Nickname: watson / Registered: September 7, 2005 3:37 AM
        Re: What's New in Scala 2.8: Collections API
        August 26, 2010 6:38 AM      
        > From a quick first look I had, it seems the new
        > collections API offers a lot of collection types, I'd dare
        > say almost all widely known collection types.
        >
        > Without being a Scala user and only knowing the very
        > basics of the language, I'd like to ask:
        >
        > 1) why the emphasis on immutable data structures? what are
        > the benefits? aren't those slower than the mutable types?

        I assume you mean slower for writes. Theoretically there is an opportunity for the compiler to give a performance boost on reads for immutable collections.

        As someone who is happy to see the emphasis on immutability, the benefits are safer and simpler code. I need not worry about whether successive reads will produce different results or whether the collection will change while I am iterating over it. This is especially the case when dealing with concurrency but not exclusively.

        > 2) when a collection's type is a primitive, does
        > autoboxing happen? for example, an Array[int] is actually
        > an Array[Integer]?

        Unlike Java, there is are no primitives in Scala but as I understand it, compiled Scala uses Java primitives. I've not taken the time to understand how this is done but it's pretty sexy.
        • Achilleas
           
          Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
          Re: What's New in Scala 2.8: Collections API
          August 27, 2010 4:01 AM      
          > Unlike Java, there is are no primitives in Scala but as I
          > understand it, compiled Scala uses Java primitives. I've
          > not taken the time to understand how this is done but it's
          > pretty sexy.

          Perhaps it sees that an object is an 'Integer' and JITs the relevant code to use 'int'.
      • Martin
         
        Posts: 15 / Nickname: modersky / Registered: September 14, 2003 9:46 PM
        Re: What's New in Scala 2.8: Collections API
        August 28, 2010 3:36 AM      
        > From a quick first look I had, it seems the new
        > collections API offers a lot of collection types, I'd dare
        > say almost all widely known collection types.
        >
        > Without being a Scala user and only knowing the very
        > basics of the language, I'd like to ask:
        >
        > 1) why the emphasis on immutable data structures? what are
        > the benefits? aren't those slower than the mutable types?
        >
        The library is symmetric wrt mutability. Every major type has mutable as well as immutable implementations. Immutable ones are available directly whereas mutable ones need an explicit qualification or import.

        The reason for preferring immutable is that immutable is safer. So in the spirit of avoiding premature optimizations
        we made immutable the default.

        Immutable collections do have persistent implementations which keep the cost overhead fairly low, and which even give an advantage over mutable for small collections (an immutable set or map of up to 4 elements takes only a single object; compare to a mutable map backed by a hashtable).

        > 2) when a collection's type is a primitive, does
        > autoboxing happen? for example, an Array[int] is actually
        > an Array[Integer]?

        For general collections, yes. But for arrays, no. Scala arrays are represented exactly the same as Java arrays. There's a section on arrays in the collections document.

        The next versions of Scala will specialize further collection types iso that primitive types need not be boxed. That's an ongoing process which was started with the availability of @specialized annotation in Scala 2.8.
      • Bill
         
        Posts: 28 / Nickname: billpyne / Registered: January 29, 2007 4:12 AM
        Re: What's New in Scala 2.8: Collections API
        August 27, 2010 5:32 AM      
        > 1) why the emphasis on immutable data structures?

        Parallel collections should be easier to implement with immutable structures.

        http://www.scala-lang.org/node/7285
        • Achilleas
           
          Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
          Re: What's New in Scala 2.8: Collections API
          August 27, 2010 7:31 AM      
          > > 1) why the emphasis on immutable data structures?
          >
          > Parallel collections should be easier to implement with
          > immutable structures.
          >
          > http://www.scala-lang.org/node/7285

          Is the parallelization automatic?
          • Martin
             
            Posts: 15 / Nickname: modersky / Registered: September 14, 2003 9:46 PM
            Re: What's New in Scala 2.8: Collections API
            August 28, 2010 3:40 AM      
            > Is the parallelization automatic?

            Yes. The following paper describes the upcoming parallel collections. More work is needed to validate performance, but the benchmark results contained in the paper are pretty promising so far.

            http://infoscience.epfl.ch/record/150220
          • Bill
             
            Posts: 28 / Nickname: billpyne / Registered: January 29, 2007 4:12 AM
            Re: What's New in Scala 2.8: Collections API
            August 27, 2010 7:59 AM      
            > Is the parallelization automatic?

            From the bits and pieces I read on the internet, the answer seems like a Yes. The following link has a paragraph near the bottom in which Martin Odersky describes use of "partition" across sequential collections and goes on to say, "Furthermore, the partition operation is quite fast, and will get even faster on parallel collections on multi-cores."

            http://www.scala-lang.org/node/7364
            • sean
               
              Posts: 1 / Nickname: seanm / Registered: July 7, 2009 6:33 PM
              Re: What's New in Scala 2.8: Collections API
              August 27, 2010 6:43 PM      
              Immutable collections are usually not justified based on performance; they tend to have worse performance than the equivalent mutable collections. Use them for their functional elegance if that floats your boat, but don't use them if you think they are going to be faster.

              Performance increases through parallelization and multi cores has yet to be seen. Immutability does confer some theoretical parallelization benefit, but in practice many other factors are at play it would be very difficult to turn that into an actual performance benefit.
              • Morgan
                 
                Posts: 37 / Nickname: miata71 / Registered: March 29, 2006 6:09 AM
                Re: What's New in Scala 2.8: Collections API
                August 31, 2010 8:25 AM      
                > Immutable collections are usually not justified based on
                > performance; they tend to have worse performance than the
                > equivalent mutable collections.

                Sean - can you cite a source for this (or is this just personal experience)?

                What you say seems plausible, but more data would be interesting. In my experience, I've never bothered timing collection performance because a few extra milliseconds spent on an insert pales in comparison to the seconds spent on analysis algorithms, and we have achieved major benefits from parallelization.
                • Mark
                   
                  Posts: 48 / Nickname: mthornton / Registered: October 16, 2005 11:22 PM
                  Re: What's New in Scala 2.8: Collections API
                  August 31, 2010 10:54 AM      
                  I have timed immmutable/persistent collections against regular ones and found the immutable collections significantly slower for basic operations. However this isn't representative of how you would normally use the immutable collections. In particular it doesn't account for time saved by not having to copy a collection. You can thus only assess overall performance in the context of a specific real application.

                  In my work I often make a complex set of changes to a set of data and then assess whether the result is acceptable (feasible). If it isn't feasible, then the changes have to be reversed. If the data set as a whole is immutable, then that undo is achieved by simply keeping a reference to its previous state. With mutable data structures you either make a complete copy or actually perform the undo operations, in either case at significant cost in CPU and/or code complexity.
                  • Achilleas
                     
                    Posts: 98 / Nickname: achilleas / Registered: February 3, 2005 2:57 AM
                    Re: What's New in Scala 2.8: Collections API
                    September 1, 2010 4:04 AM      
                    > I have timed immmutable/persistent collections against
                    > regular ones and found the immutable collections
                    > significantly slower for basic operations. However this
                    > isn't representative of how you would normally use the
                    > immutable collections. In particular it doesn't account
                    > for time saved by not having to copy a collection. You can
                    > thus only assess overall performance in the context of a
                    > specific real application.
                    >
                    > In my work I often make a complex set of changes to a set
                    > of data and then assess whether the result is acceptable
                    > (feasible). If it isn't feasible, then the changes have to
                    > be reversed. If the data set as a whole is immutable, then
                    > that undo is achieved by simply keeping a reference to its
                    > previous state. With mutable data structures you either
                    > make a complete copy or actually perform the undo
                    > operations, in either case at significant cost in CPU
                    > and/or code complexity.

                    As you say, it all depends on the application. If the data change a lot, then it makes sense to modify a copy of them. If only a small subset of the data changes, it's better for performance reasons to just keep the changes and change the original data (assuming we are talking about data sizes that can visibly affect performance).
    • michael
       
      Posts: 2 / Nickname: lawley / Registered: December 29, 2007 9:56 PM
      Re: What's New in Scala 2.8: Collections API
      September 1, 2010 6:33 AM      
      It's really nice to see the tables documenting performance characteristics on the linked page:
      http://lampwww.epfl.ch/~odersky/whatsnew/collections-api/collections_40.html

      But...my application is very sensitive to memory consumption; many kinds of Maps and Map-like structures, some dense, some sparse. The Java Collection classes are extremely memory hungry. It would be really nice to see the memory-performance of the various implementations documented too.

      michael