Python Buzz Forum - Why Small Packages Matter

With Eggs and simpler installation and dependency handling, there are more opportunities to distribute smaller packages and to split large packages into pieces.

I was reading this post on Zope:

I think this [splitting up of packages] is a big deal and too hope that we can "explode" Zope 3 into many eggs soon. Jim wants to do it for Zope 3.4, yay! Having smaller, more easily distributable Zope packages will also reduce the buy-in into Zope as a platform. For example, you won't have to ship with the ZMI anymore if you not only think it sucks but also find it disturbing to develop with.

Of course, I like this direction. But not because of small distributions; there are a few places where size matters (e.g., mobile phones), but for most of us download time and disk space aren't a big deal. And after all, just because one app doesn't require the ZMI doesn't mean you don't have it installed -- if it is already there you aren't saving any disk space.

The advantages I see in breaking things into pieces are discipline, extensibility, and a hierarchy of concepts in your library or framework.

Discipline:

Given a big package, developers will sometimes say that the package is loosely coupled on the inside, and if you want to use foo.x but not foo.y that's fine, because they don't depend on each other.

How do you know they don't depend on each other? How do you know they won't depend on each other in the future? How do you know someone won't read about DRY and factor out pieces that are shared between the two modules? How do you know someone won't "fix" a bug in foo.z that both modules use, breaking something you depended on in some interface?

If you use two different distributions for these two modules, you actually have lots of ways of detecting these problems and truly keeping the modules decoupled. It's not automatic -- there are always opportunities to break things. But the discipline of distribution boundaries (and other separations like separate release schedules) will tend to keep you honest about coupling.
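As one illustration of such a check (a minimal sketch, not part of any existing packaging tool): once foo.x ships as its own distribution with a declared list of dependencies, any import outside that list is coupling nobody signed up for. The module names foo_x, foo_y, and foo_z and both helper functions are hypothetical, made up for this example.

```python
import ast


def imported_top_level_modules(source):
    """Return the set of top-level module names that `source` imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import, which stays inside the package.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def undeclared_imports(source, declared):
    """Imports found in `source` that the distribution never declared."""
    return imported_top_level_modules(source) - set(declared)


# Hypothetical source of a module in the foo_x distribution.
FOO_X_SOURCE = """
import os
from foo_z import helper
from foo_y.models import Thing
"""

# foo_x declares only foo_z (plus the standard library's os) as dependencies,
# so the import of foo_y shows up as hidden coupling.
print(sorted(undeclared_imports(FOO_X_SOURCE, declared={"os", "foo_z"})))
# prints ['foo_y']
```

A check like this only means something when the dependency list is real, which is exactly what a separate distribution forces you to write down.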

Extensibility:

One argument for keeping things packaged together is that it allows for optional integration, so that people who want to use all the features get a more convenient tool.

This means that, for instance, an object may have a method that binds it to another module in the package. But it's optional, because you don't have to call the method. The programmer asked to trust that this "optional" feature is truly optional may consider all the questions raised under Discipline, but imagine that these issues are addressed. So what's the problem?

In this case, the optional integration has a privileged position. The original author's libraries get special hooks, but the developer using those libraries doesn't get the same access. You could monkey patch your own extension, but you'll only have created a horribly coupled mess. You could avoid the extension entirely, of course, but if the original author thought it was sufficiently useful to create the extension, it is likely that another user of the library will feel the same.
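One way to take away the privileged position, sketched here under my own assumptions rather than taken from any particular framework, is to make the author's integration go through the same public hook that everyone else has to use. The Document class, its register_exporter method, and both exporters are hypothetical names invented for this example.

```python
class Document:
    """Core object. It knows nothing about any particular exporter."""

    # Shared registry of exporter callables, keyed by name.
    _exporters = {}

    def __init__(self, text):
        self.text = text

    @classmethod
    def register_exporter(cls, name, func):
        """Public hook: anyone can add an exporter, author or not."""
        cls._exporters[name] = func

    def export(self, name):
        return self._exporters[name](self)


# What the original author would ship in a separate, optional distribution.
def to_html(doc):
    return "<p>%s</p>" % doc.text


Document.register_exporter("html", to_html)


# What a third party can ship, with exactly the same access.
def to_shout(doc):
    return doc.text.upper() + "!"


Document.register_exporter("shout", to_shout)

doc = Document("hello")
print(doc.export("html"))   # prints <p>hello</p>
print(doc.export("shout"))  # prints HELLO!
```

If the author's HTML exporter can live outside the core package using only this hook, then a stranger's exporter can too, and no monkey patching is needed.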

Hierarchy of Concepts:

Ideally a system will be laid out with a nice hierarchy of concepts:
Library-A      Library-B
  |    \        /
  |     \      /
  |      \    /
Library-C \  /
           \/
        Library-D
           |
           |
        Library-E
To understand Library A you have to understand all of Libraries C, D, and E. To understand Library D you only need to understand Library E.

Given a hierarchy like this, there's actually an advantage to not using the entire framework/system. You don't need to understand nearly as much, and learning a library is probably the biggest overhead to using a library.
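To make the cost concrete, here is a small sketch that walks the hierarchy and lists everything you end up having to understand for each library. The edges are my reading of the diagram above (A uses C and D, B uses D, D uses E, and C is assumed to have no dependencies of its own); the function name is hypothetical.

```python
# Direct dependencies, as read from the diagram (an assumption, see above).
DEPENDS_ON = {
    "A": {"C", "D"},
    "B": {"D"},
    "C": set(),
    "D": {"E"},
    "E": set(),
}


def must_understand(library, graph=DEPENDS_ON):
    """Return every library reachable from `library`, directly or indirectly."""
    seen = set()
    stack = [library]
    while stack:
        current = stack.pop()
        for dep in graph[current]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


for name in sorted(DEPENDS_ON):
    print(name, sorted(must_understand(name)))
# prints:
# A ['C', 'D', 'E']
# B ['D', 'E']
# C []
# D ['E']
# E []
```

Picking Library D instead of Library A cuts the list from three libraries to one, which is the saving the hierarchy buys you.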

It can be argued that if you want to use Library A, you only need to read about Library A. If the documentation is very good, this is somewhat true. It is true if you use it perfectly and write no buggy code and the libraries themselves have no bugs and you don't need to do anything that goes outside the bounds of what Library A provides. This isn't my experience programming, and isn't typical when using F/OSS.

There is also a hierarchy of stability. If Library E is a moving target then you are just plain hosed. If someone keeps making API changes in Library D you are also hosed. If stability does not increase as you move down your stack then the stack is a big ball of mud, even if at one isolated moment it might seem like an elegant and stable system.

For all these reasons, when someone claims their framework is all spiffy and decoupled, but they just don't care to package it as separate pieces... I become quite suspicious. Packaging doesn't fix everything. And it can introduce real problems if you split your packages the wrong way. But doing it right is a real sign of a framework that wants to become a library, and that's a sign of Something I'd Like To Use.

