Summary
Martin Odersky talks with Bill Venners and Frank Sommers about the design motivations behind Scala's type system.
Scala is an emerging general-purpose, type-safe language for the Java Platform that combines object-oriented and functional programming. It is the brainchild of Martin Odersky, a professor at Ecole Polytechnique Fédérale de Lausanne (EPFL). In this multi-part interview series, Artima's Bill Venners and Frank Sommers discuss Scala with Martin Odersky. In Part I, The Origins of Scala, Odersky gives a bit of the history that led to the creation of Scala. In Part II, The Goals of Scala, he discusses the compromises, goals, innovations, and benefits of Scala's design. In this installment, he dives into the design motivations for Scala's type system.
Frank Sommers: In your talk at last year's JavaOne, you claimed that Scala is a "scalable language," that you can program in the small and program in the large with Scala. How does using a language like this help me as a programmer?
Martin Odersky: The way it helps you is by not having to mix many specialized languages. You can use the same language for very small as well as very large programs, for general purpose things as well as special purpose application domains. And that means that you need not worry about how you push data from one language environment into the next.
Currently if you want to push data across boundaries, you are often thrown back to low level representations. For instance, if you want to ship an SQL query from Java to a database and you use JDBC, your query ends up as a string. And that means that a small typo in your program will manifest itself as an ill-formed query at runtime, possibly at your customer site. There's no compiler or type system to tell you that you shouldn't have done that. This is very fragile and risky. So there's a lot to be gained if you have a single language.
The other issue is tooling. If you're using a single language, you can have a single environment with tooling. Whereas if you have many different languages, you have to mix and match environments, and your builds become much more complicated and difficult.
Frank Sommers: You also mentioned in your talk the notion of extensibility, that Scala can be extended easily. Can you explain how? And again, how does that help the programmer?
Martin Odersky: The first dimension of scalability is from small to large, but I think there's another notion of extensibility from general to your specific needs. You want to be able to grow the language into domains that you particularly care about.
One example is numeric types. There are a lot of special numeric types out there—for instance, big integers for cryptographers, big decimals for business people, complex numbers for scientists—the list goes on. And each of these communities really cares deeply about their type, but a language that combined them all would be very unwieldy.
The answer, of course, is to say, well, let's do these types in libraries. But then if you really care about this application domain, you want the code accessing these libraries to look just as clean and sleek as code accessing built-in types. For that you need extensibility mechanisms in the language that let you write libraries such that users of those libraries don't even feel that it is a library. For users of a library, let's say a big decimal library, the BigDecimal type should be just as convenient to use as a built-in Int.
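As a small illustration of that point (not from the interview): Scala's standard BigDecimal wrapper defines arithmetic operators as ordinary methods, and Int literals are converted implicitly where needed, so library code reads just like code on built-in types:

```scala
// scala.math.BigDecimal is a library type, yet arithmetic on it looks
// just like arithmetic on a built-in Int, because +, * and friends are
// ordinary methods and Int literals are lifted implicitly.
object BigDecimalDemo {
  def main(args: Array[String]): Unit = {
    val price = BigDecimal("19.99")
    val total = price * 3 + BigDecimal("0.50") // reads like built-in arithmetic
    println(total)                             // prints 60.47
  }
}
```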
Frank Sommers: You mentioned earlier the importance of types in the context of having one language instead of many. I think most people appreciate the utility of types when programming in the large. When you have a very large-scale program, types help you organize the program and make changes to it reliably. But what do types buy us in terms of programming in the small, when you program just a script, for example? Are types important on that level as well?
Martin Odersky: They are probably less important when programming in the small. Types can be in a spectrum from incredibly useful to extremely annoying. Typically the annoying parts are type definitions that are redundant, which require you to do a lot of (finger) typing. The useful parts are, of course, when types save you from errors, when types give you useful program documentation, when types act as a safety net for safe refactoring.
Scala has type inference to try to minimize the annoying bits as much as possible. That means if you write a script, you don't see the types, because you can just leave them off and the system will infer them for you. At the same time, the types are there, so if you make a type error in the script, the compiler will catch it and give you an error message. And I believe that no matter whether it is a script or a large system, it's always more convenient to fix this thing immediately with the compiler than later on. You still need unit tests to test your program logic, but compared to a dynamically typed language, you don't need a lot of the more trivial unit tests that are just about the types. In the experience of many people, you need far fewer unit tests than you would in a dynamic language. Your mileage may vary, but that's been our experience in several cases.
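A quick sketch of what that looks like in practice: no type annotations anywhere, yet every expression is statically checked:

```scala
// No annotations, but everything below has an inferred static type; a
// misuse such as calling a String method on a List[Int] would be
// rejected at compile time, not discovered at a customer site.
object InferenceDemo {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)       // inferred as List[Int]
    val doubled = xs.map(_ * 2)  // inferred as List[Int]
    println(doubled.sum)         // prints 12
  }
}
```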
The other objection that's been leveled against static type systems is that they constrain you too much in what you want to express. People say, "I want to express myself freely. I don't want a static type system getting in the way." In my experience in Scala this has not been true, I think for two reasons. The first is that the type system in Scala is actually very flexible, so it typically lets you compose things in very flexible patterns, which a language like Java, which has a less expressive type system, would often make more difficult. The second is that with pattern matching, you can recover type information in a very flexible way without even noticing it.
The idea of pattern matching is that in Scala I can take an object about which I know nothing, and then with a construct like a switch statement, match it against a number of patterns. And if it is one of these patterns, I can also immediately pull out the fields into local variables. Pattern matching is a construct that's built deep into Scala. A lot of Scala programs use it. It is a normal way to do things in Scala. One interesting thing is that by doing a pattern match you also recover the types automatically. What you put in was an object, which you didn't know anything about. If a pattern matches, you actually know you have something that corresponds to the type of the pattern. And the system is able to use that.
Because of pattern matching, you can quite easily have a system where your types are very general, even maximally general—like the type of every variable is Object—but you can still get everything out that you want through the use of pattern matching. So in that sense, you can program in Scala perfectly well as if it were a dynamically typed language. You would just use Object everywhere and pattern match everywhere. Now people usually don't do that, because you want to take more advantage of static types. But it is a very fluid fallback, a fallback that you don't even notice. By comparison, the analog in Java, where you would have to use a lot of type tests (instanceof) and type casts, is really heavyweight and clunky. And I completely understand why people object to having to do that all over the place.
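A minimal sketch of that style (the method name is hypothetical): a value typed as Any is matched against patterns, and inside each case the compiler knows the more precise type:

```scala
// describe takes a value it knows nothing about (Any) and recovers
// type information through pattern matching: inside each case, the
// bound variable has the static type of the pattern it matched.
object MatchDemo {
  def describe(x: Any): String = x match {
    case n: Int    => "an Int: " + (n + 1)             // n is an Int here
    case s: String => "a String of length " + s.length // s is a String here
    case (a, b)    => "a pair: " + a + " and " + b
    case _         => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(41))      // prints: an Int: 42
    println(describe("hello")) // prints: a String of length 5
  }
}
```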
Bill Venners: One of the things I have observed about Scala is that there are a lot more things I can express or say about my program in Scala's type system compared to Java's. People fleeing Java to a dynamic language often explain that they were frustrated with the type system and found they have a better experience if they throw out static types. Whereas it seems like Scala's answer is to try and make the type system better, to improve it so it is more useful and more pleasant to use. What kind of things can I say in Scala's type system that I can't in Java's?
Martin Odersky: One objection leveled against Java's type system is that it doesn't have what's often called duck typing. Duck typing is explained as: if it walks like a duck and quacks like a duck, it is a duck. Translated: if it has the features that I want, then I can just treat it as if it is the real thing. For instance, I want to get a resource that is closable. I want to say, "It needs to have a close method." I don't care whether it's a File or a Channel or anything else.
In Java, for this to work you need a common interface that contains the method, and everybody needs to implement that interface. First, that leads to a lot of interfaces and a lot of boilerplate code to implement all that. And second, it is often impossible to do if you think of this interface after the fact. If you write the classes first and the classes exist already, you can't add a new interface later on without breaking source code unless you control all the clients. So you have all these restrictions that the types force upon you.
One of the aspects where Scala is more expressive than Java is that it lets you express these things. In Scala it is possible to have a type that says: anything with a close method that takes no parameters and returns Unit (which is similar to void in Java). You can also combine it with other constraints. You can say: anything inheriting from a particular class that in addition has these particular methods with these signatures. Or you can say: anything inheriting from this class that has an inner class of a particular type. Essentially, you can characterize types structurally by saying what needs to be in the types so that you can work with them.
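A sketch of such a structural type (the helper and class names are invented for illustration): the type { def close(): Unit } means "anything with a parameterless close method returning Unit", and no shared interface is required:

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  // "Anything with a close(): Unit method" -- a structural type.
  type Closable = { def close(): Unit }

  // Works for any value conforming to the structure, interface or not.
  def withResource[T <: Closable, R](resource: T)(body: T => R): R =
    try body(resource) finally resource.close()

  // Two unrelated classes that merely happen to have close():
  class FakeFile    { def close(): Unit = () }
  class FakeChannel { def close(): Unit = () }

  def main(args: Array[String]): Unit = {
    println(withResource(new FakeFile)(_ => "used a file"))
    println(withResource(new FakeChannel)(_ => "used a channel"))
  }
}
```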
Bill Venners: Existential types were added to Scala relatively recently. The justification I heard for existential types was that they allow you to map all Java types, in particular Java's wildcard types, to Scala types. Are existential types larger than that? Are they a superset of Java's wildcard types? And is there any other reason for them that people should know about?
Martin Odersky: It is hard to say because people don't really have a good conception of what wildcards are. The original wildcard design by Atsushi Igarashi and Mirko Viroli was inspired by existential types. In fact the original paper had an encoding in existential types. But then when the actual final design came out in Java, this connection got lost a little bit. So we don't really know the status of these wildcard types right now.
Existential types have been around for a number of years, about 20 years now. They express something very simple. They say you have a type, maybe a list, with an element type that you don't know. You know it's a list of some specific element type, but you don't know the element type. In Scala that would be expressed with an existential type. The syntax would be List[T] forSome { type T }. That's a bit bulky. The bulky syntax is in fact sort of intentional, because it turns out that existential types are often a bit hard to deal with. Scala has better alternatives. It doesn't need existential types so much, because we can have types that contain other types as members.
Scala needs existential types for essentially three things. The first is that we need to make some sense of Java's wildcards, and existential types are the sense we make of them. The second is that we need to make some sense of Java's raw types, because they are also still in the libraries, the ungenerified types. If you get a Java raw type, such as java.util.List, it is a list where you don't know the element type. That can also be represented in Scala by an existential type. Finally, we need existential types as a way to explain what goes on in the VM at the high level of Scala. Scala uses the erasure model of generics, just like Java, so we don't see the type parameters anymore when programs are run. We have to do erasure because we need to interoperate with Java. But then what happens when we do reflection, or want to express what goes on in the VM? We need to be able to represent what the JVM does using the types we have in Scala, and existential types let us do that. They let you talk about types where you don't know certain aspects of those types.
Bill Venners: Can you give a specific example?
Martin Odersky: Take Scala lists as an example. I want to be able to describe the return type of the method head, which returns the first element (the "head") of the list. On the VM level, it is a List[T] forSome { type T }. We don't know what T is, but head returns a T. The theory of existential types tells us that is a T for some type T, which is equivalent to the root type, Object. So we get this back from the head method. Thus in Scala, when we know something, we can eliminate these existential qualifications. When we don't know something, we leave them in, and the theory of existential types helps us there.
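That can be sketched directly in Scala 2 syntax (the names here are illustrative):

```scala
import scala.language.existentials

object ExistentialDemo {
  // A list with some element type that we don't know.
  type SomeList = List[T] forSome { type T }

  // head returns "a T for some T", so all we statically know is Any.
  def firstElement(xs: SomeList): Any = xs.head

  def main(args: Array[String]): Unit = {
    println(firstElement(List(1, 2, 3)))  // prints 1
    println(firstElement(List("a", "b"))) // prints a
  }
}
```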
Bill Venners: Would you have added existential types if you didn't need to worry about the Java compatibility concerns of wildcards, raw types, and erasure? If Java had reified types and no raw types or wildcards, would Scala have existential types?
Martin Odersky: If Java had reified types and no raw types or wildcards, I don't think we would have that much use for existential types and I doubt they would be in Scala.
Bill Venners: In Scala variance is defined at the point the class is defined whereas in Java it's done at the usage sites with wildcards. Can you talk about that difference?
Martin Odersky: Because we can model wildcards in Scala with existential types, you can, if you want, actually do the same thing as in Java. But we encourage you not to do that, and to use definition-site variance instead. Why? First, what is definition-site variance? When you define a class with a type parameter, for instance List[T], that raises a question. If you have a list of apples, is that also a list of fruit? You would say, yes, of course. If Apple is a subtype of Fruit, List[Apple] should be a subtype of List[Fruit]. That subtyping relationship is called covariance. But in some cases, that relationship doesn't hold. Say I have a variable in which I can put only an Apple, a reference of type Apple. That's not a reference of type Fruit, because I can't just assign any Fruit to this variable. It has to be an Apple. So you can see there are some situations where we should have the subtype relationship, and others where we shouldn't.
The solution in Scala is that we annotate the type parameter. If List is covariant in T, we would write List[+T]. That would mean Lists are covariant in T. There are certain conditions that are attached to that. For instance, we can do that only if nobody changes the list, because otherwise we would get into the same problems that we had with the references. The key point is that you annotate T with a plus sign at the declaration site—only once, for all Lists that anybody ever uses. Then the compiler will go and figure out whether all the definitions within List are actually compatible with that, that there's nothing being done with lists that would be conflicting there. If there is something that's incompatible with covariance, the Scala compiler will issue an error. Scala has a range of techniques to deal with those errors, which a competent Scala programmer will pick up fairly quickly. A competent Scala programmer can apply those techniques and end up with a class that compiles and is covariant for the users. Users don't have to think about it anymore. They know that if they have a list they can just use it covariantly everywhere. So that means there was just one person who wrote the list class, who had to think a little bit harder, and it was not so bad because the compiler helped this person with error messages.
By contrast, the Java approach of having wildcards means that in the library you do nothing. You just write List&lt;T&gt; and that's it. And then if a user wants a covariant list, they write not List&lt;Fruit&gt;, but List&lt;? extends Fruit&gt;. So that's a wildcard. The problem is that that's user code, and these are users who are often not as expert as the library designers. Furthermore, a single mismatch between these annotations gives you type errors. So it's no wonder you can get a huge number of completely intractable error messages related to wildcards, and I think that this, more than anything else, has given the current Java generics a bad rap. Because really this wildcard approach is quite complicated for normal humans to grasp and deal with.
Variance is something that is essential when you combine generics and subtyping, but it's also complex. There's no way to make this completely trivial. The thing we do better than Java is that we let you do it once in the libraries, so that the users don't have to see or deal with it.
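The mechanics can be sketched with a made-up, immutable Box class; the + annotation is written once by the library author, and users get the subtyping for free:

```scala
object VarianceDemo {
  class Fruit
  class Apple extends Fruit

  // The + is written once, here at the declaration site. The compiler
  // verifies that T appears only in positions compatible with covariance
  // (this Box is immutable, so the check succeeds).
  class Box[+T](val contents: T)

  def main(args: Array[String]): Unit = {
    val apples: Box[Apple] = new Box(new Apple)
    val fruit: Box[Fruit]  = apples // allowed: Box[Apple] <: Box[Fruit]
    println(fruit.contents.isInstanceOf[Apple]) // prints true
  }
}
```

Had Box's contents been a mutable var, the compiler would reject the +T annotation, which is exactly the Apple-reference problem described above.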
Bill Venners: In Scala, a type can be a member of another type, just as methods and fields can be members of a type. And in Scala those type members can be abstract, like methods can be abstract in Java. Is there not some overlap between abstract type members and generic type parameters? Why does Scala have both? And what do abstract types give you beyond what the generics give you?
Martin Odersky: Abstract types give you some things beyond what generics give you, but let me first state a somewhat general principle. There have always been two notions of abstraction: parameterization and abstract members. In Java you also have both, but it depends on what you are abstracting over. In Java you have abstract methods, but you can't pass a method as a parameter. You don't have abstract fields, but you can pass a value as a parameter. And similarly you don't have abstract type members, but you can specify a type as a parameter. So in Java you also have all three of these, but there's a distinction about what abstraction principle you can use for what kinds of things. And you could argue that this distinction is fairly arbitrary.
What we did in Scala was try to be more complete and orthogonal. We decided to have the same construction principles for all three sorts of members. So you can have abstract fields as well as value parameters. You can pass methods (or "functions") as parameters, or you can abstract over them. You can specify types as parameters, or you can abstract over them. And what we get conceptually is that we can model one in terms of the other. At least in principle, we can express every sort of parameterization as a form of object-oriented abstraction. So in a sense you could say Scala is a more orthogonal and complete language.
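The three kinds of abstract members can be shown side by side (the class names are invented for illustration):

```scala
object MembersDemo {
  // A trait with an abstract type, an abstract value, and an abstract method.
  trait Cell {
    type Elem                 // abstract type member
    val initial: Elem         // abstract value member
    def show(e: Elem): String // abstract method member
  }

  // A concrete subclass fills in all three.
  class IntCell extends Cell {
    type Elem = Int
    val initial = 0
    def show(e: Int): String = "Int " + e
  }

  def main(args: Array[String]): Unit = {
    val c = new IntCell
    println(c.show(c.initial)) // prints: Int 0
  }
}
```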
Now the question remains, what does that buy you? What abstract types buy you, in particular, is a nice treatment of the covariance problems we talked about before. One standard problem, which has been around for a long time, is the problem of animals and foods. The puzzle is to have a class Animal with a method, eat, which eats some food. The problem is that if we subclass Animal and have a class such as Cow, then cows would eat only Grass and not arbitrary food. A Cow couldn't eat a Fish, for instance. What you want is to be able to say that a Cow has an eat method that eats only Grass and not other things. Actually, you can't do that in Java, because it turns out you can construct unsound situations, like the problem of assigning a Fruit to an Apple variable that I talked about earlier.
The question is, what do you do? The answer is that you add an abstract type into the Animal class. You say, my new Animal class has a type SuitableFood, which I don't know. So it's an abstract type. You don't give an implementation of the type. Then you have an eat method that eats only SuitableFood. And then in the Cow class I would say, OK, I have a Cow, which extends class Animal, and for Cow, type SuitableFood equals Grass. So abstract types provide this notion of a type in a superclass that I don't know, which I then fill in later in subclasses with something I do know.
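That description can be written out directly (a sketch; the small Food hierarchy is assumed):

```scala
object AnimalDemo {
  class Food
  class Grass extends Food
  class Fish  extends Food

  abstract class Animal {
    type SuitableFood <: Food // abstract: each subclass decides
    def eat(food: SuitableFood): String
  }

  class Cow extends Animal {
    type SuitableFood = Grass // a Cow eats only Grass
    def eat(food: Grass): String = "munching grass"
  }

  def main(args: Array[String]): Unit = {
    val cow = new Cow
    println(cow.eat(new Grass)) // prints: munching grass
    // cow.eat(new Fish)        // would not compile: type mismatch
  }
}
```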
Now you could say, well, I could do the same thing with parameterization. And indeed you can. You could parameterize class Animal with the kind of food it eats. But in practice, when you do that with many different things, it leads to an explosion of parameters, and usually, what's more, in bounds of parameters. At the 1998 ECOOP, Kim Bruce, Phil Wadler, and I had a paper where we showed that as you increase the number of things you don't know, the typical program will grow quadratically. So there are very good reasons not to do parameters, but to have these abstract members, because they don't give you this quadratic blow-up.
Bill Venners: When people look at random Scala code, there are two things that I think can make it look a bit cryptic. One is a DSL they are not familiar with, like the parser combinators or the XML library. The other is the kinds of expressions in the type system, especially combinations of things. How can Scala programmers get a handle on that kind of syntax?
Martin Odersky: Certainly there's a lot of new stuff there that has to be learned and absorbed. So this will take some time. I believe one of the things we have to work on is better tool support. Right now when you get a type error, we try to give you a nice error message. Sometimes it spans multiple lines to be able to explain more. We try to do a good job, but I think we could do much better if we could be more interactive.
Imagine if you had a dynamically typed language and you only had three or four lines max for an error message when something went wrong at runtime. There would be no debugger. There would be no stack trace. There would be just three or four lines, such as "null pointer dereference," and maybe the line number where it happened. I don't think dynamic languages would be very popular under those circumstances. Of course, that's not what happens. You're thrown into a debugger where you can quickly find out where the root of the problem is.
For types, we don't have that yet. All we have are these error messages. If you have a very rich and expressive type system that requires more knowledge to make sense of those error messages, you want more help. So one thing we want to investigate in the future is whether we can actually give you a more interactive environment such that if the types go wrong, you could find out why. For example, how the compiler figured out that this expression has this type, and why it doesn't think that this type conforms to some expected type. You could explore these things interactively. I think then the causes of type errors would be much easier to see than they are now.
On the other hand, some syntax is just new and takes some getting used to. That's probably something we can't avoid. We only hope that a couple of years from now these will be types that people take completely naturally and never question. There have been other things in mainstream languages that took some getting used to. I remember very well when exceptions came out, people found them strange. It took a lot of time to get used to them. And now of course everybody thinks they are completely natural. They are not novel anymore. And certainly Scala has a couple of things, mostly in the type side, which take some getting used to.
Martin Odersky is coauthor of Programming in Scala: http://www.artima.com/shop/programming_in_scala

The Scala programming language website is at: http://www.scala-lang.org

The original paper on wildcards is "On Variance-Based Subtyping for Parametric Types," by Atsushi Igarashi and Mirko Viroli. In Proc. of ECOOP '02, Springer LNCS, pages 441-469, 2002: http://groups.csail.mit.edu/pag/reading-group/variance-ECOOP02.pdf (PDF)

The quadratic growth of programs that occurs as you increase the number of types you don't know is described in "A Statically Safe Alternative to Virtual Types," by Kim Bruce, Philip Wadler, and Martin Odersky. In Proc. of ECOOP '98, Springer LNCS, pages 523-549, 1998: http://lampwww.epfl.ch/~odersky/papers/alt.ps.gz (Postscript)