Summary
With the advent of languages such as Python, the debate over typing has
heated up again. Contrary to some claims (notably from Bruce Eckel), I
believe Python has strong typing, and this article explains why.
What is a "type", anyway?
Before talking about what kind of type system a language supports, we
should establish agreement about what a type is in the first place. My
definition is that a type is metadata about a chunk of memory
that classifies the kind of data stored there. This classification
usually implicitly specifies what kinds of operations may be performed
on the data.
Common types include primitive types (strings and numbers), container
types (lists/arrays and dictionaries/hashes), and user-defined types
(classes). In Python, everything is an object, and every object has a
type. In other words, functions, modules, and stack frames are also
objects, each with a type of its own.
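To see this in an interpreter, here is a small sketch for a current Python 3 (the article itself predates it). The helper names describe and sample are made up for illustration, and sys._getframe() is a CPython-specific way to get hold of a stack frame.

```python
import sys


def describe(obj):
    """Return the name of the type that Python reports for obj."""
    return type(obj).__name__


def sample():
    # sys._getframe() is CPython-specific; it returns the current stack frame.
    return sys._getframe()


print(describe(1))          # int
print(describe("1"))        # str
print(describe(sample))     # function
print(describe(sys))        # module
print(describe(sample()))   # frame
print(describe(int))        # type: a type is itself an object
```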
So what's "strong typing", then?
From my POV, strong typing prevents mixing operations between mismatched
types. In order to mix types, you must use an explicit conversion.
Here's a simple Python example:
>>> 1 + "1"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 1 + 1
2
>>> "1" + "1"
'11'
>>> 1 + int("1")
2
>>> "1" + str(1)
'11'
Conversely, weak typing means that you can mix types without an explicit
conversion. Consider this example from Perl:
DB<1> print "1"+1
2
DB<2> print "1".1
11
Note that conversion is not the same thing as coercion, IMO. Coercion
occurs when you have a statically-typed language and you use the
syntactic features of the language to force the usage of one type as if
it were a different type (consider the common use of void*
in C). Coercion is usually a symptom of weak typing. Conversion, OTOH,
creates a brand-new object of the appropriate type.
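A minimal Python 3 sketch of the conversion side of that distinction; the variable names are only illustrative. The original string is left untouched, and the result of int() is a separate object of a different type.

```python
text = "1"
number = int(text)      # explicit conversion builds a new int object

print(type(text).__name__, type(number).__name__)   # str int
print(text is number)   # False: two distinct objects
print(1 + number)       # 2

try:
    1 + text            # no implicit conversion takes place
except TypeError as exc:
    print("TypeError:", exc)
```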
Why do some people think Python has weak typing?
Historically, "strong typing" has been associated with static typing.
Languages noted for strong typing include Pascal and Ada; languages noted
for weak typing (most notoriously BASIC) had primarily dynamic typing.
But the language that ought to be most notorious for weak
typing has static typing: C/C++ (yes, I'm lumping them together).
It's very clear that Python has only dynamic typing; any target may hold
a binding to any kind of object. More than that, Pythonic programming
style is to use inheritance primarily for implementation; Python's
name-based polymorphism means that you rarely need to
inherit for interface. In fact, the primary exception to inheriting for
implementation is Python exceptions, which use issubclass()
for the purpose of determining which exceptions get caught by an
except clause.
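As a rough illustration of that exception rule in Python 3 (ParseError and classify are hypothetical names invented for this sketch), an except clause matches the named class and any of its subclasses:

```python
class ParseError(ValueError):
    """Hypothetical exception used only for this illustration."""


def classify(exc):
    """Raise exc and report which except clause caught it."""
    try:
        raise exc
    except ValueError:
        # Matches ValueError and every subclass, ParseError included.
        return "caught as ValueError"
    except Exception:
        return "caught as Exception"


print(classify(ParseError("bad input")))    # caught as ValueError
print(classify(KeyError("missing")))        # caught as Exception
print(issubclass(ParseError, ValueError))   # True
```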
I might even go so far as to say that Python's name-based polymorphism
is hyperpolymorphic. And therein lies the tiny kernel of truth about
Python's weak typing. People who have gotten used to Java and C++
requiring syntactic support to declare typing often feel uncomfortable
with the Pythonic style of relying on run-time exceptions to get thrown
when an inappropriate object is passed around:
class Silly:
    def __init__(self, data):
        self.data = data
    def __add__(self, other):
        return str(self.data) + str(other.data)

def double(a):
    return a + a

print double(1)
print double('x')
print double([1])
print double(Silly({'a':1}))
print double({'a':1})
produces
2
xx
[1, 1]
{'a': 1}{'a': 1}
Traceback (most recent call last):
  File "test.py", line 14, in ?
    print double({'a':1})
  File "test.py", line 8, in double
    return a + a
TypeError: unsupported operand types for +: 'dict' and 'dict'
Bruce Eckel equates "weak typing" with "latent typing", but that's at
odds with historical usage, not to mention that it confuses the two axes
of strong/weak and static/dynamic.
Sidebar: Name-based polymorphism
For those of you unfamiliar with Python, here's a quick intro to
name-based polymorphism. Python objects have an internal dictionary
that maps the name of each attribute and method, as a string, to its value. When you
access an attribute or method in Python code, Python simply looks up the
string in the dict. Therefore, if what you want is a class that works
like a file, you don't need to inherit from file, you just
create a class that has the file methods that are needed.
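Here's a small Python 3 sketch of that idea. ListWriter is a made-up class; the only assumption is that the caller (here the built-in print()) needs nothing beyond a write() method.

```python
class ListWriter:
    """Hypothetical file-like class: it only provides write()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)


sink = ListWriter()
# print() looks up write() by name on whatever object it is given;
# ListWriter does not inherit from any file class.
print("hello", file=sink)
print("".join(sink.chunks), end="")
```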
Python also defines a bunch of special methods that get called by the
appropriate syntax. For example, a+b is equivalent to
a.__add__(b). There are a few places in Python's internals
where it directly manipulates built-in objects, but name-based
polymorphism works as you expect about 98% of the time.
> Contrary to some claims (notably from Bruce Eckel)
Seems like the use of terminology in "Type Checking and Techie Control" was a bit lax. It reads as though Bruce Eckel was commenting on other people's misapplication of "weak typing" to Python (rather than supporting that usage).
Frank Mitchell brought some clarity to this back in February (in the Strong versus Weak Typing discussion). For completeness we should mention that there are Typeless languages like BCPL.
> Historically, "strong typing" has been associated with static typing
Smalltalk-80 had strong dynamic typing more than 20 years ago, so this debate sometimes feels like re-warmed leftovers (just replace mention of Ada 95 or C++ with Java, and replace mention of Smalltalk with Python).
Things have moved on:
- Soft typing is an approach to type checking for dynamically typed languages. Like a static type checker, a soft type checker infers syntactic types for identifiers and expressions. But rather than reject programs containing untypable fragments, a soft type checker inserts explicit run-time checks to ensure safe execution.
- Type Inference (as Frank Mitchell mentioned): for languages like Clean, Haskell, and ML, the compiler is able to determine the type of a function automatically. It does so when type checking a program.
Rather than debate whether the old approach to dynamic typing provided by Python is better than the old approach to static typing provided by Java, ask when Python will have support for Soft Typing, and ask when Java will have Type Inference.
> Therefore, if what you want is a class that works like a file, you don't need to inherit from file, you just create a class that has the file methods that are needed.
It's worth noting that Python isn't alone in shifting away from inheritance. C++ is doing the same thing in STL: as long as a class implements the right methods, the template doesn't care what it's working with.
> Rather than debate whether the old approach to dynamic typing provided by Python is better than the old approach to static typing provided by Java, ask when Python will have support for Soft Typing, and ask when Java will have Type Inference.
Type Inference in ML, as I understand it, assumes that each function and operator has well-defined argument and result types. For that reason, Standard ML needs explicit types to distinguish integer math from floating-point math, and OCaml actually uses different operators for floating point: "+.", "-.", "*.", etc. Object-oriented polymorphism complicates this sort of analysis, since the meaning of the method depends on the type of the receiver, which could be one of the types that need inferring.
APIs using Soft Typing preclude the "name-based typing" or "duck typing" that gives Python some of its power. For example, mock objects are much easier to implement if they don't have to subclass an existing class in order to compile or run, or implement more methods than a test uses. Also, being able to "fake" an existing standard type can simplify production code that loads data lazily from a database or network. In both cases, one can't pass a Socket-like or Customer-like class (for example) unless it extends or implements Socket or Customer. (Note that Java 1.3+ can dynamically generate implementations for interfaces, but dynamic subclasses require deep ClassLoader and bytecode magic.) For those applications, you're back to the limitations of Static Typing; the programmer needs some way to fool the compiler and/or runtime into thinking that an arbitrary object is really a Socket or Customer ... which defeats the point.
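To make the mock-object point concrete, here is a rough Python 3 sketch. send_greeting and FakeSocket are hypothetical names; the only assumption is that the code under test calls nothing but sendall() on the object it receives.

```python
def send_greeting(sock):
    """Production-style code: it only needs sock.sendall()."""
    sock.sendall(b"HELLO\r\n")


class FakeSocket:
    """Hypothetical test double; it does not subclass socket.socket."""

    def __init__(self):
        self.sent = []

    def sendall(self, data):
        # Record the bytes instead of touching the network.
        self.sent.append(data)


fake = FakeSocket()
send_greeting(fake)
print(fake.sent)   # [b'HELLO\r\n']
```

The fake implements exactly one method, which is all the test needs.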
Type Inference in ML
My understanding is just about good enough to follow this explanation: "Understanding why Ocaml doesn't support operator overloading" http://caml.inria.fr/archives/200211/msg00347.html
Clean & Haskell take a different approach and support operator overloading. Which means that sometimes you have to provide type information.
Once we add objects and polymorphic subtyping (as you say) more compromises are required - although supposedly O'Haskell rarely requires explicit type information.
(It's worth mentioning that many of the users of these languages love to explicitly declare type information.)
Static typing in Nice
Perhaps the Nice language is a better example of a programming language with a more expressive static typing than Java. Nice compiles to Java bytecodes, and may yet support some form of type inference.
Soft Typing precludes "name-based typing"
Is Soft Typing tied to structural type equivalence, which prevents it from being useful with a language that uses name equivalence (Python), or is it just really difficult to implement?
In any case, let's see what else has been going on ;-)
Maybe we should be asking when Python will have type recovery tools like SmallTyper? (Seems as though some work is going on in that direction)
I have gotten a few replies from people who are annoyed at the idea that I could consider Python "weak" in any way. And I appreciate that Aahz has, at least, sprinkled his comments with "IMOs".
Having worked in depth with C, C++, and Java, I am quite aware of the distinction between static and dynamic typing. Pre-ANSI C ignored a lot of type checking (one reason the benefits of C++ were so dramatic, and also why ANSI C type checking was greatly improved), and C is statically typed -- there are no runtime mechanisms that check type. But more sophisticated languages are not one or the other. C++ is primarily statically typed, but it does have a small amount of dynamic typing. Java and C# do a lot of static type checking, but they also have significant information available at runtime, such as array lengths and RTTI that is used to dynamically check casts. These languages have additional size and runtime overhead in order to produce the extra dynamic behavior, but we've generally found it to be a worthwhile tradeoff.
In general, static type checking means "things the compiler can check at compile time." Python does do a little static type checking whenever it can, but as Aahz pointed out it is primarily dynamically checked.
Strong vs. Weak/Latent typing, on the other hand, talks about the constraints imposed upon the types when they are used. As noted, people are sometimes offended by the idea that Python could be considered "weak", but this is actually a very powerful concept. It definitely doesn't mean that Python is careless about types, but rather that Python isn't fanatic about types. Instead, it only cares as much as it has to. Put another way, weak typing means that Python will not be strident about the way that you use objects -- if you try to send a message to an object, and the object might be able to accept that message, then a strongly-typed language will say "hey, you said you would only ever call 'speak' for a pet object, and here you are calling it for a robot! No way!" Instead, Python will say "If you think that will work, I'll trust you and try it."
You could say that the real distinction between a strongly typed and weakly typed language comes in the function argument list:
def communicate(anything):
There is no type constraint here, which says that Python doesn't require you to pass any particular type of object into the function. C++, Java and C# all require you to say precisely the type of object you are going to pass in, and the only flexibility you have is whether you pass a subtype of the declared class. Python, on the other hand, "weakens" the type constraint so that you don't have to jump through a bunch of hoops in order to express yourself. So "weak" is really "powerful."
This doesn't mean that Python doesn't care at all. If you pass an object into communicate() that doesn't have a speak() method, that's not OK. The type system doesn't allow you to send the wrong message to an object, but as Aahz points out, the determination of the type violation is dynamic rather than static -- you find out at runtime that the type has been used incorrectly, rather than statically, at compile time. This, I think, is why people are often confused: "weak" typing can sound as if it means you can get away with anything that you want, which isn't true, and so it might be less confusing to use the term "latent".
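A minimal Python 3 sketch of that behavior, using the Pet and Robot names from the discussion (the method bodies are invented for illustration):

```python
class Pet:
    def speak(self):
        return "woof"


class Robot:
    def speak(self):
        return "beep"


def communicate(anything):
    # No declared parameter type: any object with speak() is accepted.
    return anything.speak()


print(communicate(Pet()))     # woof
print(communicate(Robot()))   # beep

try:
    communicate(42)           # an int has no speak() method
except AttributeError as exc:
    print("rejected at run time:", exc)
```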
Note that C++ went to a great deal of trouble to provide a kind of weak typing mechanism with templates, and Java and C# are following suit. This is "weak typing in a strongly typed language" so the checking still happens at compile time (at least in C++; indications are that some amount of RTTI will be used with Java and C#), but the effect for the programmer is the same: I can call communicate() and pass it any type that supports the speak() method. To accomplish this in C++/Java/C# with their template mechanism requires a lot more burnt offerings at the altar of strong static type checking, but those folks get the warm fuzzies of compile-time error messages (if you've ever seen a C++ compiler barf over a misused template, you'll understand I'm being sarcastic here -- we'll see if Java or C# can produce something more comprehensible). I have learned that the safety provided by strong static type checking is only an illusion (you still need to write tests before you can trust your code), so the benefits of Python's weak/latent typing far outweigh the perceived disadvantages.
I have had a weblog about this on my todo list for a while, and Aahz's posting has helped me refine the ideas (which I obviously need to present in a clearer fashion than I have accomplished so far). Keep watching http://mindview.net/WebLog and it will eventually appear.
Well, there is no explicit type constraint there. The implicit type constraint, however, is:
function communicate(forall x:{speak:A}) : B
Meaning, communicate is a function that will take as argument any object whose type responds/conforms to the message/interface "speak" (where speak has the message-constraints/interface defined by A). Finally, "communicate" returns an object with constraints/interface defined by B.
Constraint terminology is usually used if the compiler autonomously discovers the constraints, interface terminology if the programmer states them explicitly (C++ templates being a notable exception to this rule :-).
Giving types to the rest of your example follows
class Pet:
    def speak(self): pass

Pet:{speak:()->Nothing}
etc. This is the foundation of "soft-typing" (partial interface inference), i.e. constraints are carried around the type inference engine until it reaches a point where it doesn't have enough information, it then adds a dynamic assertion about the type and continues on...
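As a loose, hand-written approximation only (this is not a soft type checker), the constraint {speak: ...} can be written down in current Python with typing.Protocol, and the "dynamic assertion" becomes an explicit isinstance() check. The name Speaker is made up for this sketch, and runtime_checkable only verifies that a speak attribute exists, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Speaker(Protocol):
    """Hypothetical name for the constraint {speak: () -> Nothing}."""

    def speak(self) -> None: ...


class Pet:
    def speak(self) -> None:
        pass


def communicate(x: Speaker) -> None:
    # Stand-in for the run-time check a soft type checker would insert
    # where it cannot prove the constraint statically.
    if not isinstance(x, Speaker):
        raise TypeError("object does not satisfy {speak: ...}")
    x.speak()


communicate(Pet())                  # accepted: Pet has speak()
print(isinstance(Pet(), Speaker))   # True
print(isinstance(42, Speaker))      # False
```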
Possibly also of interest: This is a response to Bruce's original article. It discusses how modern static typing in languages like Haskell and Nice is becoming more useful and less oppressive all the time. It covers some of the same sorts of points that Bjorn just raised, with examples.
Modern Static Typing: Less Code, Better Code (or, "How Java/C++/C# Ruin Static Typing for the Rest of Us")
The concept of adaptation deals with the issue of type in a very powerful and dynamic way. Using adaptation one simply asks for a version of an object that's suitable for a particular purpose. For example, if one expects to use an object as a dictionary one could say aDict = adapt(anObject, dict) and be assured that the returned object will work as one. If anObject itself is or can work as a dictionary it will be returned directly by adapt(). Otherwise adapt() will look for a suitable adapter which can provide dictionary functionality on behalf of anObject.
The second parameter to adapt() is considered a protocol, which in practice usually means a basic type or an interface. PyProtocols provides flexible and powerful interface objects for Python which can be used for much more advanced adaptation cases than the simple dictionary example.
The main advantage of adaptation over standard type checking/handling techniques (e.g. testing by type() or wrapping with try/except) is the ability to provide adapters for pretty much any conceivable purpose. Liberal use of adapt() throughout one's code brings about a very high level of flexibility that's simply not possible using conventional coding techniques.
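A toy sketch of the idea in Python 3 follows. It is not the PyProtocols API: the registry, register_adapter(), and the Point class are all invented here, and this adapt() only knows about protocols that are plain classes.

```python
_adapters = {}   # hypothetical registry: (source type, protocol) -> factory


def register_adapter(source_type, protocol, factory):
    _adapters[(source_type, protocol)] = factory


def adapt(obj, protocol):
    """Return obj if it already satisfies protocol, else an adapted version."""
    if isinstance(obj, protocol):
        return obj
    # Walk the class hierarchy so adapters registered for a base class apply.
    for klass in type(obj).__mro__:
        factory = _adapters.get((klass, protocol))
        if factory is not None:
            return factory(obj)
    raise TypeError(
        f"cannot adapt {type(obj).__name__} to {protocol.__name__}"
    )


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


register_adapter(Point, dict, lambda p: {"x": p.x, "y": p.y})

print(adapt({"a": 1}, dict))      # {'a': 1}
print(adapt(Point(1, 2), dict))   # {'x': 1, 'y': 2}
```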
Bruce Eckel:
> This, I think, is why people are often confused: "weak" typing can sound as if it means you can get away with anything that you want, which isn't true, and so it might be less confusing to use the term "latent".
But "weak typing" has been historically used to describe the kind of permissive coercion and casting that can be done in C/C++ and Perl. If we adopt your meaning for "weak typing", what's an appropriate phrase for that? Since "latent typing" covers what you want to talk about, I think "weak typing" should be restricted to what it sounds like.
> why people are often confused: "weak" typing can sound as if it means you can get away with anything that you want, which isn't true, and so it might be less confusing to use the term "latent".
What I'm finding confusing is the continued use of the vague pejorative terms "strong typing", "weak typing", and now "latent typing"!
Let's be explicit:
1. Are there type constraints on variables?
2. Are there type constraints on runtime values?
3. Are the type constraints checked without running the software?

                             1     2     3
static typing                Yes   No    Yes
dynamic typing               No    Yes   No
typeless                     No    No    N/A
static and dynamic typing    Yes   Yes   Some
(and other combinations)
"the real distinction between a strongly typed and weakly typed language comes in the function argument list:
def communicate(anything):
There is no type constraint here, which says that Python doesn't require you to pass any particular type of object into the function. C++, Java and C# all require you to say precisely the type of object you are going to pass in"
Why is that different than the distinction between static typing (type constraint on parameter variable) and dynamic typing / typeless?
Why use the terms strong typing and weak typing when you are distinguishing between static typing and dynamic typing? That is confusing.
typed - languages where variables can be given types
untyped - languages that do not restrict the range of variables

explicitly typed - types are part of the language syntax
implicitly typed - types are not part of the syntax

trapped errors - let the program stop immediately
untrapped errors - unnoticed, cause arbitrary behaviour
safe program - does not cause untrapped errors

static checking - compile time checks prevent unsafe programs from ever running
dynamic checking - run time checks keep a program safe

strong checking - no untrapped errors can occur
weak checking - some untrapped errors can occur