The Artima Developer Community

Weblogs Forum
Is Scala Only for Computer Scientists?

17 replies on 2 pages. Most recent reply: May 18, 2017 8:02 PM by Frank Douglas

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Is Scala Only for Computer Scientists? (View in Weblogs)
Posted: Jan 16, 2012 10:31 AM
Summary
I'm not talking about the early adopters writing obscure code here -- that can probably be solved with a suitable style guide. I just debugged my way through an example that should have been trivial but I only figured out because:
  1. I have experience struggling through these kinds of things and
  2. I know enough about the subject that I can understand why they did it that way.

But my concern is that this should be an example that a beginner could understand, and they can't. There's too much depth exposed.

Here's the example, which is written as a script:

import scala.io.Source._

case class Registrant(line: String) {
  val data = line.split(",")
  val first = data(0)
  val last = data(1)
  val email = data(2)
  val payment = data(3).toDouble
}

val data = """Bob,Dobbs,bob@dobbs.com,25.00
Rocket J.,Squirrel,rocky@frostbite.com,0.00
Bullwinkle,Moose,bull@frostbite.com,0.25
Vim,Wibner,vim32@goomail.com,25.00"""

val lines = fromString(data).getLines
//val lines = fromString(data).getLines.toIndexedSeq
val registrants = lines.map(Registrant)
registrants.foreach(println)
registrants.foreach(println)

The class Registrant takes a String as its constructor argument, and splits it up to produce the various data items stored within that object. Thus you can open a CSV (comma-separated value file, as is produced by most spreadsheets) and parse it into a collection of Registrant objects. You would ordinarily do this by reading in a file using fromFile instead of fromString, which is how I started before seeing weird behavior.

The "strange" behavior is this: as written, the program will only list the registrants once, instead of twice as requested. Indeed, you can't do anything else to your supposed collection of Registrant objects once they've been printed the first time.

Give up? The answer is that registrants is not a collection of any kind. Because getLines returns an iterator (which is the logical thing to do), any functional operation you perform on that iterator also produces an iterator, and you can only use an iterator to pass through your data once. This also makes sense ... after you understand the depth of what's going on, and realize that having "iterators all the way down" is good computer science.
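The single-pass behavior can be reproduced without any file or Source at all. The following is a minimal sketch; the names SinglePassDemo and twoPasses are made up for illustration, and only Iterator, map, and foreach come from the standard library.

```scala
object SinglePassDemo {
  // Traverses the iterator twice and reports how many elements each pass saw.
  def twoPasses(it: Iterator[String]): (Int, Int) = {
    var first = 0
    it.foreach(_ => first += 1)
    var second = 0
    it.foreach(_ => second += 1) // the iterator is already exhausted here
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val names = List("Bob", "Rocky", "Bullwinkle")
    // map on an Iterator yields another Iterator, not a collection
    val upper: Iterator[String] = names.iterator.map(_.toUpperCase)
    println(twoPasses(upper)) // prints (3,0)
  }
}
```

The second pass sees nothing because the first foreach has already advanced the cursor to the end.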

But no posts I looked at that discussed reading files mentioned this, because I suspect the posters (A) didn't know and (B) didn't expect it to work that way, so assumed (logically) that things would work without doing anything else.

Here's the trick I discovered, although there certainly could be other, better ways to solve it. You have to know that you're getting back an iterator, and explicitly convert it to a regular sequence by calling toIndexedSeq as seen in the commented-out line.
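As a sketch of that fix in isolation: the object MaterializeDemo, the helper load, and the cut-down Registrant below are illustrative stand-ins, not the original script. The call is written as getLines() with parentheses on the assumption that a recent Scala version is in use.

```scala
import scala.io.Source

object MaterializeDemo {
  // Cut-down stand-in for the Registrant class in the post.
  case class Registrant(line: String) {
    private val fields = line.split(",")
    val first: String = fields(0)
    val payment: Double = fields(3).toDouble
  }

  val sample: String =
    "Bob,Dobbs,bob@dobbs.com,25.00\nVim,Wibner,vim32@goomail.com,25.00"

  // toIndexedSeq drains the iterator once and stores the results,
  // so the returned value can be traversed as often as needed.
  def load(csv: String): IndexedSeq[Registrant] =
    Source.fromString(csv).getLines().map(Registrant(_)).toIndexedSeq

  def main(args: Array[String]): Unit = {
    val registrants = load(sample)
    registrants.foreach(println)
    registrants.foreach(println) // prints the same two registrants again
  }
}
```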

This means that, to do something simple and useful that a beginner might find motivating -- like manipulating spreadsheet data from a file -- you'll probably have to explain to your beginner the difference between an iterator and a collection and why an iterator only passes through once, and that you must convert to something called an IndexedSeq. You can choose to wave your hands over the issue but I find that if you throw things at people without explaining them it tends to be confusing.

You can certainly argue that this is nice and consistent from a computer science standpoint and that the whole language maintains this consistency from top to bottom which makes it quite powerful. And that's great, but it means that before you can start doing useful things you need the kind of breadth and depth of knowledge that only a computer scientist has, and that makes Scala a harder sell as a first programming language (even though many aspects of Scala are significantly easier to grasp than Java or C++).

For a much more in-depth analysis of Scala complexity by someone with greater knowledge of Scala, see this well-written article. Please note that, just like the author of that article, I'm not saying that Scala is "bad" or "wrong" or things along those lines. I like Scala and think it's a powerful language that will allow us to do things that other languages won't. But in a previous article I suggested that Scala might be a good beginner's language, and "sharp edges" like this that are exposed in what would otherwise be beginning concepts make me wonder if that's true, or if it should actually be relegated to a second or even third language, after the learner has gone through the curve with one or two other languages. So the question is not whether I can figure out this puzzle, or whether it's obvious to you -- since you are probably an experienced programmer -- but rather how much more difficult it might be to teach Scala to an inexperienced programmer.


Franklin Chen

Posts: 7
Nickname: fmchen
Registered: Aug, 2003

Re: Is Scala Only for Computer Scientists? Posted: Jan 16, 2012 10:56 AM
I don't think it's fair to blame Scala for the confusion between an iterator and a collection; many other languages also have iterators. For the record, as soon as I saw the code posted, I immediately understood what was going on, because I have confused myself the same way in Python when it comes to iterators and collections. Last year I accidentally "optimized" some code in one of my Python programs by globally replacing list comprehensions with generator expressions; the resulting code broke because something really needed to be a list in order to be iterated over twice from different places in the program. I don't blame Python for my error.

In Scala, there is even less of a reason to get the two confused, because it is statically typed: for clarity, one can forgo some type inference and explicitly write out the type to alert the code reader to what is going on:

val result : Iterator[String] = fromString(data).getLines
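To illustrate how written-out types can also guard the call sites, here is a small sketch. ExplicitTypesDemo and its three helpers are hypothetical names; the point is only that a parameter declared as Seq will not accept an Iterator.

```scala
import scala.io.Source

object ExplicitTypesDemo {
  // The declared return types document which values are single-pass.
  def lineIterator(text: String): Iterator[String] =
    Source.fromString(text).getLines()

  def lineList(text: String): List[String] =
    lineIterator(text).toList

  // Requiring a Seq means callers cannot hand over a one-shot Iterator.
  def sizeTwice(lines: Seq[String]): (Int, Int) = (lines.size, lines.size)

  def main(args: Array[String]): Unit = {
    println(sizeTwice(lineList("a\nb\nc")))
    // sizeTwice(lineIterator("a\nb\nc")) // rejected by the compiler: type mismatch
  }
}
```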

Eric Pederson

Posts: 12
Nickname: 71552
Registered: Apr, 2010

Re: Is Scala Only for Computer Scientists? Posted: Jan 16, 2012 6:07 PM
Looking at the API docs for Source.getLines:

def getLines (): Iterator[String]
Returns an iterator who returns lines (NOT including newline character(s)).

I'm not sure where the confusion comes from. If I'm using an API, in whatever language, I read the API docs to see what its methods return.

Besides .toIndexedSeq you can also use .toList, etc. on an Iterator. IndexedSeq is only needed if you need O(1) random access to elements.

Michael Chermside

Posts: 17
Nickname: mcherm
Registered: Jan, 2004

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 5:35 AM
Bruce: I have seen quite a few examples of "Scala is confusing to non-experts", and it may well be the case, but I don't feel that this example is a strong argument in its favor. I suppose I just don't find this behavior to be all that surprising. Would I make a mistake like this? Absolutely -- I have done similar things many times in the past! If the second use were far removed from the first use then perhaps I would get confused and take some time figuring out what was wrong. But as you presented it, the problem becomes immediately obvious... you can iterate the list once but not twice. It will be less obvious WHY that is... especially if I don't know much about iterators, but "stick it in a list" is the obvious solution, and just how to do that isn't too difficult to figure out.

Stuart Halloway

Posts: 1
Nickname: 56205
Registered: Jun, 2008

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 5:57 AM
I don't think this is about iterators vs. collections. It is about mutability. Iteration constructs don't have to be mutable (and should not be, by default). E.g. in Clojure:

https://gist.github.com/1626694

I don't know the Scala lib well but I am sure it has something similar.

Dhananjay Nene

Posts: 132
Nickname: dnene
Registered: Jan, 2008

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 6:41 AM
Here's virtually identical code in Python with exactly the same effects. That hardly means Python is only for computer scientists.

https://gist.github.com/1626867

Franklin Chen

Posts: 7
Nickname: fmchen
Registered: Aug, 2003

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 6:46 AM
Stuart, I agree with you that stateless iteration is better, but that is a whole topic in itself. The Haskell world is very active in that general area.

Scala has gone the Java way with its iterators (that in turn were inspired by those in C++ STL). Hence, the very interface is fundamentally designed for a mutable external iterator that is really a "cursor": http://www.scala-lang.org/api/current/scala/collection/Iterator.html

I am still a novice at Scala, but I presume there are more functional libraries for Scala that bypass the Java legacy interface.

Franklin Chen

Posts: 7
Nickname: fmchen
Registered: Aug, 2003

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 6:47 AM
Exactly what I was referring to in my first post.

Martin Odersky

Posts: 84
Nickname: modersky
Registered: Sep, 2003

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 10:06 AM
Bruce,

As others have noted, this example behaves exactly the same in Python, which is generally perceived to be a good beginner's language.

Small note on code style: Instead of toIndexedSeq, it's shorter to use just toSeq. You do not need random access of your sequence, so specifying toIndexedSeq is overkill.

John Atwood

Posts: 2
Nickname: johndoekyr
Registered: Jul, 2009

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 11:19 AM
Interesting: the equivalent program in F#, and probably also in C#, will iterate through the registrants twice, as expected.

open System

type Registrant = { Data : string; First : string; Last : string; Email : string; Payment : decimal }

let loadRegistrant (data : string) =
    let lines = data.Split([|','|])
    { Data = data
      First = lines.[0]
      Last = lines.[1]
      Email = lines.[2]
      Payment = Decimal.Parse(lines.[3]) }

let data = @"Bob,Dobbs,bob@dobbs.com,25.00
Rocket J.,Squirrel,rocky@frostbite.com,0.00
Bullwinkle,Moose,bull@frostbite.com,0.25
Vim,Wibner,vim32@goomail.com,25.00"

let lines = data.Split([|'\n'|])

let registrants =
    lines
    |> Seq.map loadRegistrant

registrants |> Seq.iter (printfn "%O")
registrants |> Seq.iter (printfn "%O")


The main difference between the F# and Scala versions is that Seq.map returns an IEnumerable, which is roughly equivalent to the Iterable trait in Scala. Both Iterable and IEnumerable produce a cursor-like object that iterates once through the collection of objects.

It is interesting that F# implements higher-order functions like map and fold only on IEnumerable (Scala's Iterable). Scala implements these functions on both Iterable and Iterator, which is what causes the confusion in this particular program.

Using higher-order functions on Iterable (F# IEnumerable) prevents you from attempting to use iterators that are already consumed. However, the second time you iterate over the sequence, an entirely new collection of objects is generated. If this is not your intention, you will need to cache the first sequence using something like Seq.cache or Seq.toList.
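A rough Scala analogue of that recomputation, offered as a sketch rather than an exact equivalent of F#'s Seq: mapping over a view of a collection stays lazy, so each traversal reruns the mapping function, while converting to a List caches the results. RecomputeDemo and callsOverTwoTraversals are hypothetical names.

```scala
object RecomputeDemo {
  // Returns how many times the mapping function ran across two traversals.
  def callsOverTwoTraversals(cache: Boolean): Int = {
    var calls = 0
    val lines = Vector("a", "b", "c")
    // A view defers the mapping; nothing runs until the view is traversed.
    val lazyMapped = lines.view.map { s => calls += 1; s.toUpperCase }
    if (cache) {
      val cached = lazyMapped.toList // runs the mapping once per element
      cached.foreach(_ => ())
      cached.foreach(_ => ())
    } else {
      lazyMapped.foreach(_ => ()) // runs the mapping
      lazyMapped.foreach(_ => ()) // runs it all over again
    }
    calls
  }

  def main(args: Array[String]): Unit = {
    println(callsOverTwoTraversals(cache = false))
    println(callsOverTwoTraversals(cache = true))
  }
}
```

Without caching, the function runs once per element per traversal; with caching, it runs once per element in total.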

Cay Horstmann

Posts: 13
Nickname: cay
Registered: Apr, 2003

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 5:54 PM
There aren't all that many Scala methods that return an iterator; the ones that do do so for a good reason. For example, getLines doesn't know if the source (usually a file) is ten million lines long. Or look at combinations, permutations, grouped, or sliding in Seq.
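A short sketch of two of those Seq methods, showing that they hand back an Iterator that has to be materialized before it can be reused. The object and member names are illustrative only.

```scala
object IteratorReturningDemo {
  val xs: Seq[Int] = Seq(1, 2, 3, 4, 5)

  // Both methods return an Iterator; the groups are produced as it is consumed.
  def chunks: Iterator[Seq[Int]] = xs.grouped(2)
  def windows: Iterator[Seq[Int]] = xs.sliding(3)

  def main(args: Array[String]): Unit = {
    println(chunks.toList)  // chunks of at most two elements
    println(windows.toList) // overlapping windows of three elements
  }
}
```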

Alternatively, they could have returned lazy collections. That might have been a little nicer.

Anyway, this is a mistake that everyone makes once, and I don't think it is generally hard to understand, particularly if your source was a file, not a string. Soon afterwards your fingers learn to type getLines.toArray (or, if you are a computer scientist, toSeq :-)).

BTW, why use a Source in the first place? What's wrong with

data.split('\n')

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Is Scala Only for Computer Scientists? Posted: Jan 17, 2012 7:34 PM
> BTW, why use a Source in the first place? What's wrong
> with
>
> data.split('\n')

Originally I was reading from a file. It was easier to explain the problem without using the external file.

Jed Wesley-Smith

Posts: 1
Nickname: jedws
Registered: Jan, 2012

Re: Is Scala Only for Computer Scientists? Posted: Jan 22, 2012 4:25 AM
I am a bit late to this, but it is worth noting that the usual way in Scala to get a lazy immutable collection (a Stream) is to call toStream on the Iterator. Iterators and Streams are kind of the mutable/immutable equivalents and the conversion between them is trivial.
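A minimal sketch of the toStream conversion, with made-up names (LazyMemoDemo, run). One caveat that postdates this thread: Scala 2.13 deprecates Stream in favor of LazyList, so on newer versions toStream compiles with a deprecation warning.

```scala
object LazyMemoDemo {
  // Returns both traversals plus how many elements were pulled from the source.
  def run(): (List[String], List[String], Int) = {
    var pulled = 0
    val source = Iterator("a", "b", "c").map { s => pulled += 1; s }
    val stream = source.toStream // lazy, but remembers what it has evaluated
    val first = stream.toList
    val second = stream.toList   // served from the memoized cells
    (first, second, pulled)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Each source element is pulled only once even though the stream is traversed twice.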

Jiri Goddard

Posts: 54
Nickname: goddard
Registered: May, 2007

Re: Is Scala Only for Computer Scientists? Posted: Feb 22, 2012 2:35 AM
There're not many dumb-proof programming languages and not everybody can or should write programs. But if you want to and you get stuck, you can ask and possibly learn.

Miguel Monteiro

Posts: 1
Nickname: mmonteiro
Registered: Oct, 2012

Re: Is Scala Only for Computer Scientists? Posted: Oct 25, 2012 11:12 AM
Report from a Scala newbie
Talk about a mistake that everyone makes once. I've just made it - and then found this page after a few searches.
I was trying the following snippet, never dreaming for a moment that getLines does NOT return a list.

object ExpListLines {
  def main(args: Array[String]) = {
    val lines = io.Source.fromFile("some txt file").getLines
    // println(lines.length)
    for(l <- lines)
      println("[" + l + "]")
  }
}
I was rather surprised to learn that the lines didn't show up in the output - until I remembered Eckel's post I read the day before and commented out the call to length.

It may be true that this mistake is made only once, but I also feel that everyone makes it.
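The effect of that call to length can be sketched in isolation. The helper names below are hypothetical; the behavior relies only on Iterator.length and toVector from the standard library.

```scala
object LengthConsumesDemo {
  // Returns (count reported by length, elements still available afterwards).
  def lengthThenLoop(it: Iterator[String]): (Int, Int) = {
    val n = it.length // counting walks the iterator to its end
    var seen = 0
    for (_ <- it) seen += 1
    (n, seen)
  }

  // Materializing first lets both the count and the loop see every line.
  def materializedThenLoop(it: Iterator[String]): (Int, Int) = {
    val lines = it.toVector
    var seen = 0
    for (_ <- lines) seen += 1
    (lines.length, seen)
  }

  def main(args: Array[String]): Unit = {
    println(lengthThenLoop(Iterator("a", "b", "c")))
    println(materializedThenLoop(Iterator("a", "b", "c")))
  }
}
```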


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use