If Python's what I want to do, I should write that book about it. Herein I muse about how to create a book relying largely on community input and help.
Another epiphany from hanging out with the Pythonistas in Brazil: After a dozen years, Python is still the only language I get really jazzed about. I should do something about it, so that I can do my consulting and speaking and training in Python rather than languages I'm not so thrilled about. To wit: write that book.
Several years ago I started Thinking in Python, which has been in limbo for quite a while. But when I was actively working on it, I was getting some great input from the community. What I'd like to do now is:
Pull out what's good from that book
Update it all for Python 3
Figure out a way to create the book primarily from community input
Post the current electronic version as a set of web pages
Find interested folks to help in the management process, so that I can focus on the core of the book without getting distracted onto administrative and other issues.
The best way to produce a quality book in a short time is for me to concentrate on editing, writing and rewriting. I know from experience that noise from the other things can easily sidetrack me.
The working title is Python 3 Patterns & Idioms (for various reasons I may not be able to use my "Thinking in" title; don't ask). It would not be a book that develops from simple to complex the way that Thinking in C++ and Thinking in Java do, but rather a collection of different programming techniques, which lends itself better to a collaborative effort since you don't have to worry so much about whether basic things have been covered (it won't be an intro book) or the order of coverage, etc. Each chapter becomes a standalone short article.
Important tenets of the book:
The electronic version remains freely available in perpetuity
Everyone gets credit for contributions
Like my other books, all examples will be automatically extracted and tested to ensure code correctness (and so the code is packaged with the book). I think the tests should be part of each example, and I think Nose is the least intrusive/noisy of the Python testing systems.
The print version will be derived from the electronic version, but won't be the same (the early edits will go back into the electronic version, but at some point, when I'm doing layout and polishing, the print book will "disconnect" because maintaining both becomes prohibitive).
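As a minimal sketch of what "tests as part of each example" could look like, the file below pairs a small idiom with test functions that a Nose-style runner would collect by name. The file name, the `Borg` example, and the `run_embedded_tests` helper are all illustrative assumptions, not content from the book or part of Nose itself.

```python
# borg_idiom.py (hypothetical example file; the name is illustrative)


class Borg:
    """All instances share one state dictionary."""

    _shared_state = {}

    def __init__(self):
        # Every instance points its attribute dict at the same shared dict.
        self.__dict__ = self._shared_state


# The tests live in the same file as the example. Nose-style runners
# collect module-level functions whose names start with "test".
def test_instances_share_state():
    a, b = Borg(), Borg()
    a.value = 42
    assert b.value == 42


def test_instances_are_distinct_objects():
    assert Borg() is not Borg()


def run_embedded_tests(namespace):
    """Call every test_* callable in a namespace; return the names that ran."""
    ran = []
    for name, obj in sorted(namespace.items()):
        if name.startswith("test_") and callable(obj):
            obj()  # an AssertionError here means the example is broken
            ran.append(name)
    return ran


if __name__ == "__main__":
    print(run_embedded_tests(globals()))
```

A Nose-style collector pointed at the file would pick up the two `test_` functions on its own; the helper only makes the file checkable without any test framework installed.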
What's the best way to allow people to work on this? Version control or something like a controlled wiki or Google pages?
What's the best gatekeeping approach? I'm thinking that you just get commit permission from a gatekeeper, and if it's later discovered you are behaving badly you get kicked out.
For the print book I'll need everyone who touches it to sign a release form. That's probably just part of the gatekeeping process.
I'm thinking that Restructured Text is probably the best candidate for input. Is there something newer and better?
It might be interesting to allow readers to use easy_install or something similar to install the code package from the book, so they don't have to mess with dependencies.
What about easy feedback from readers who haven't passed through the gateway? The "Backtalk" system (which I note has been emulated elsewhere) that I created for Thinking in Python was a very early version and became unmanageable, mostly because of spam. Perhaps a similar system that links to an issue tracker like Trac? However, it's possible that the format of this book is such that each paragraph doesn't need to be tagged as in Backtalk. If each pattern or idiom is a small chapter, and each chapter has a single "feedback" button that links to the issue tracker then that will probably be adequate granularity.
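The one-button-per-chapter idea could be sketched roughly as follows: each chapter gets a link that opens a pre-filled new ticket. The tracker address and the `summary` and `component` query parameter names are assumptions chosen for illustration; the real tracker's URL scheme would need to be checked.

```python
from urllib.parse import urlencode

# Hypothetical tracker address; replace with the real issue tracker.
TRACKER_NEW_TICKET_URL = "https://example.org/tracker/newticket"


def feedback_link(chapter_title):
    """Build the URL behind a chapter's single "feedback" button.

    The query parameter names are assumed, not taken from any
    particular tracker's documentation.
    """
    query = urlencode({
        "summary": f"Feedback: {chapter_title}",
        "component": chapter_title,
    })
    return f"{TRACKER_NEW_TICKET_URL}?{query}"


if __name__ == "__main__":
    print(feedback_link("Borg Idiom"))
```

Tagging the ticket with the chapter title gives per-chapter granularity without tagging every paragraph.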
> What's the best way to allow people to work on this? Version control or something like a controlled wiki or Google pages?
Version control. Use a modern distributed version control system (DVCS) like Git, that makes it easy for contributors to start working on their own, create and merge branches, and send patches back to you.
> What's the best gatekeeping approach? I'm thinking that you just get commit permission from a gatekeeper, and if it's later discovered you are behaving badly you get kicked out.
This is also solved by using a modern DVCS: There is no explicit commit permission. Everybody can clone and commit in their own repos. If their stuff is good, you pull it into your own repo. If their stuff is bad, you don't. In time, you'll figure which contributors you trust (based on the quality of their contributions), and how much you trust them (i.e. how thoroughly you'll want to review their work before pulling it into your repo).
The important point here is that this is all a social convention, and not a technical hurdle that must be crossed before you really know whether you trust someone or not. (Read more about the social/technical distinction here: http://lwn.net/Articles/246381/)
This approach also scales much better than having to hand out commit access to all contributors. If the project grows very large indeed, and there is just too much work for you to keep track of, you can appoint "lieutenants" that help you collect and edit contributions (similar to how the Linux kernel folks work).
> For the print book I'll need everyone who touches it to sign a release form. That's probably just part of the gatekeeping process.
You'll have to clear this with your lawyers, but you might be able to use the "Signed-off-by" convention in Git (the -s option to git commit and git format-patch): Basically you state clearly that all patches/commits must carry a "Signed-off-by" line, and that by adding such a line, you agree to <insert legal blurb here>.
Both Linux and the Git project itself use the "Signed-off-by" line to confirm that the author agrees to the project's Developer's Certificate of Origin.
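To make the convention concrete, here is a minimal sketch of a check that a commit message carries a sign-off line in the form that `git commit -s` appends. The `has_signoff` helper and the deliberately loose pattern are illustrative assumptions, not part of Git.

```python
import re

# A sign-off trailer has the shape:
#   Signed-off-by: Full Name <address@example.org>
# The pattern is intentionally loose; it only checks the overall shape.
SIGNOFF = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(commit_message):
    """Return True if any line of the message is a sign-off trailer."""
    return SIGNOFF.search(commit_message) is not None


if __name__ == "__main__":
    message = (
        "Fix typo in the iterator chapter\n"
        "\n"
        "Signed-off-by: A. Contributor <contributor@example.org>\n"
    )
    print(has_signoff(message))
    print(has_signoff("Fix typo in the iterator chapter\n"))
```

A check like this could run before a contribution is pulled, so unsigned patches are caught early; whether the sign-off is legally sufficient is still a question for the lawyers.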
> I'm thinking that Restructured Text is probably the best candidate for input. Is there something newer and better?
Agreed. Restructured Text is the best format I've seen so far for this type of work.
> It might be interesting to allow readers to use easy_install or something similar to install the code package from the book, so they don't have to mess with dependencies.
Sounds like a good idea.
> What about easy feedback from readers who haven't passed through the gateway? [...]
As I've hopefully demonstrated above, this won't be an issue if there is no gateway.
Although it might still be a good idea to have some kind of feedback web form, so people can make simple suggestions without having to clone repos, make commits, send patches, etc.
Have fun! :)
...Johan (who learnt C++ from "Thinking in C++", and is now making a living as a C++ programmer. Thanks!)
> This is also solved by using a modern DVCS: There is no explicit commit permission. Everybody can clone and commit in their own repos. If their stuff is good, you pull it into your own repo. If their stuff is bad, you don't. In time, you'll figure which contributors you trust (based on the quality of their contributions), and how much you trust them (i.e. how thoroughly you'll want to review their work before pulling it into your repo).
Interesting. This would also have the benefit of forcing me to get up to speed on a DVCS.
> > I'm thinking that Restructured Text is probably the best candidate for input. Is there something newer and better?
> Agreed. Restructured Text is the best format I've seen so far for this type of work.
I have a vague recollection of hearing about some successor to restructured text which is why I asked. But I use ReST for my weblogs and definitely like the plain-text format, which would also fit well with the DVCS.
> Although it might still be a good idea to have some kind of feedback web form, so people can make simple suggestions without having to clone repos, make commits, send patches, etc.
I think this is really important, but without something to manage the feedback it rapidly gets out of control and unusable. That's why I'm thinking of an issue tracking system.
When Lawrence Lessig updated Code, he used a wiki (http://www.socialtext.net/codev2/index.cgi) to allow for community input which was nice as it made for a very low barrier to entry. I don't know how successful it was at managing the complexity, because, other than provide a few minor edits, I didn't follow the project very closely.
While I love the idea of using a DVCS (in particular: Mercurial), you'd have to keep in mind that it will probably create a barrier that some people simply won't cross. It's much harder to create your own branch, fix a spelling mistake and submit a patch than it is to hit an edit button, though the barrier-to-entry issue may be overpowered by the need for a signed release. After all, if you've convinced them to sign a release and learn ReST, you can probably get them to install git/mercurial/bazaar/darcs.
+1 on using an issue tracker for keeping track of feedback though, that's a great idea.
Then again, perhaps this is a barrier you *do* want to create, the barrier of having the good will to do 30 seconds worth of work, and requiring some (if not much) technical knowledge.
And, btw, a dvcs would be great. I really like bzr, and you can mix completely distributed and a more svn-like mode of work, perhaps for editors or for people whose changes you'd want to incorporate straight away (instead of manually pulling afterwards).
> Then again, perhaps this is a barrier you *do* want to create, the barrier of having the good will to do 30 seconds worth of work, and requiring some (if not much) technical knowledge.
I think the two-tiered approach should cover the bases: the feedback form for people who just want to make a quick contribution, and the DVCS for those who want to work on it more seriously.
> And, btw, a dvcs would be great. I really like bzr, and you can mix completely distributed and a more svn-like mode of work, perhaps for editors or for people whose changes you'd want to incorporate straight away (instead of manually pulling afterwards).
So it now sounds like there are three choices for DVCS: Mercurial, Git and BZR?
> So it now sounds like there are three choices for DVCS: Mercurial, Git and BZR?
Yeah, that's about right. Many seem to end up choosing between Mercurial and Git. Git people will tell you that their branch support is superior to Mercurial's, while Mercurial people will tell you that it doesn't matter because you can branch-by-cloning instead. Git has better support for cleaning up your history before you submit (i.e. creating a nice and tidy patch series from a more messy development history). Git also seems to have a slight performance edge over Mercurial (both runtime-wise and storage-space-wise), while Mercurial claims to be more portable due to being written in Python instead of a mix of C and shell scripts. At the moment Git seems to have a more active user and developer community, with fairly regular releases.
Otherwise, the two are pretty much equivalent when it comes to basic philosophy, core features, GUI and web interface availability, etc.
After having initially started with Mercurial, I migrated to Git (finding its design more elegant and scalable), so I would suggest using Git. However, don't take my word for it, as I'm obviously biased... ;)
Well, I'm partial to bzr, but I haven't used git much (or Mercurial at all). Bzr is very flexible, and integrates nicely with svn.
As I mentioned in an earlier post, bzr may be used in a large variety of "user topologies" (http://doc.bazaar-vcs.org/bzr.dev/en/user-guide/index.html lists several, in chapters 3-6). This might be useful for the different types of contributors: those whose work you want to verify before perhaps incorporating some of it, those you want to commit directly to a central branch, and those whose changes should go through a moderator and then on to the central branch if approved. I believe this versatility is very much desired in this project (and I don't know how git and hg fare on this front).
Bzr is Python, and actively developed (mainly by Canonical).
Honestly, though, Bruce, I think in the end all three would do a good job, and, if you choose A, proponents of B and C will come up and tell you that "Mine would do this better" on a number of occasions. Perhaps you should just choose one you're comfortable with, write (or have someone write) a very short introductory guide (aimed specifically at the book) and tell everyone else to "get over it" (TM).
I would really be against using Git, as it needs Cygwin to run on Windows and is a pain to get on OSX. Either Mercurial or Bazaar is probably going to be more than enough for a book. I'm usually inclined to use bzr as it seems to have the easier user interface of the two, and performance is not an issue here (and the difference in perf from bzr to hg shrinks with each bzr release).
> I would really be against using Git, as it needs Cygwin to run on Windows and is a pain to get on OSX.
This is plain wrong. Although you can certainly run Git on top of Cygwin, the preferred method for using Git on Windows is called msysgit (http://code.google.com/p/msysgit/). It is considerably faster than the Cygwin version, and has no dependencies (i.e. can be installed directly on a regular Windows system).
> Either Mercurial or Bazaar is probably going to be more than enough for a book. I'm usually inclined to use bzr as it seems to have the easier user interface of the two, and performance is not an issue here (and the difference in perf from bzr to hg shrinks with each bzr release).
For the purposes of this book, any of the three contenders is probably going to do the job in a satisfactory manner.
> Great idea!
> For inspiration/guidance: have a look at how the Django folks have written their Django Book (http://www.djangobook.com).
Thanks. That's a good recommendation.
> For versioning control: I am a very happy BZR user. I highly recommend it.
I have to say, the documentation page is what sold me, especially because some people helping on this project (my friend who does editing, for example) are not computer experts. The friendly tone and thorough coverage, and the fact that they emphasized group documents and not just software, is quite compelling. The fact that it is actively supported Python, with paid support by Canonical, no less, is very attractive; I'm quite pleased at what they've done with Ubuntu.
Using a dvcs seems like a good idea to me. Personally, I would recommend going with either bzr or hg because their commands are much closer to svn and will be a bit more familiar to people. The version control needs of a book are not so significant, so you could actually just go with Google Code's svn repository and be done with it. People can still use bzr+svn if they want to (and you could use it yourself).
Also, you might consider using Paver (shameless plug) to manage the book process until you move to Word:
1. Kept in separate files that are easily readable and testable on their own
2. Can be pulled into the doc files whenever desired so that you can have the code right in front of you as you write
3. Can be easily removed from the doc files just before commit so that the repository is clean
Paver uses Sphinx which generates nice HTML and can also generate latex to create PDFs. I've also considered doing something like linking sections of docs to Disqus or similar so that people can comment.
Overall, it's a very simple process and the tools are all there. You can see what docs with included code look like here:
> > I would really be against using Git, as it needs Cygwin to run on Windows and is a pain to get on OSX.
> This is plain wrong. Although you can certainly run Git on top of Cygwin, the preferred method for using Git on Windows is called msysgit (http://code.google.com/p/msysgit/). It is considerably faster than the Cygwin version, and has no dependencies (i.e. can be installed directly on a regular Windows system).
> As for OSX, you can download the git-osx-installer (http://code.google.com/p/git-osx-installer/) which should be as easy to use as msysgit on Windows, or native git on Linux.
Sorry, I haven't been following Git much since a long time ago, when it was a real pain to get it to work on Windows. So I was uninformed, thanks.
" Git also runs on Windows. There are actually two options, the "official" one requiring the installation and use of Cygwin (a POSIX emulation layer). ... Regardless, many people find a Cygwin installation too large and invasive for typical Windows use.
A native Microsoft Windows port (using MinGW) is approaching completion, with versions of installers ready for testing (under the names "Git" and "msysgit", where "Git" is aimed for users). "
" Many projects support both POSIX and Windows. Such projects typically avoid using an SCM system that poorly supports Windows, even if most developers use POSIX-based systems. Examples of projects that have publicly ruled out any use of Git, due to Git's poor support of Windows, include Mozilla and Ruby. "
Besides this I'm sorry and you are right.
Why not use Sphinx for the book? It seems like a great format for all kinds of docs, and it is probably easy to write a script to run all the code of the book as doctests. Seems like the pythonic thing to do.
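A rough sketch of such a script, using only the standard library's `doctest` module rather than anything Sphinx-specific: interactive examples embedded in reStructuredText are parsed out and executed. The `CHAPTER` text and the `run_chapter_doctests` helper are made up for illustration.

```python
import doctest

# A made-up chapter in reStructuredText. Doctest blocks (lines starting
# with ">>>") are ordinary reST and need no special markup.
CHAPTER = '''
Swapping Values
===============

Tuple unpacking swaps two names without a temporary variable:

>>> a, b = 1, 2
>>> a, b = b, a
>>> (a, b)
(2, 1)
'''


def run_chapter_doctests(text, name="chapter"):
    """Run every interactive example found in a chunk of chapter text.

    Returns doctest's result object, which carries `failed` and
    `attempted` counts.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


if __name__ == "__main__":
    results = run_chapter_doctests(CHAPTER)
    print(results.failed, results.attempted)
```

For chapters stored as files, `doctest.testfile` does the same job in one call; the helper above just keeps the sketch self-contained.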
> Sorry, I haven't been following Git much since a long time ago, when it was a real pain to get it to work on Windows. So I was uninformed, thanks.
> But it seems that Wikipedia needs an update; look at this article: http://en.wikipedia.org/wiki/Git_(software)
> [...]
> Besides this I'm sorry and you are right.
No problem. Git is developing at a fast pace (even too fast for Wikipedia, it seems...), so I don't blame you for not keeping up... ;)
> Why not use Sphinx for the book? It seems like a great format for all kinds of docs, and it is probably easy to write a script to run all the code of the book as doctests. Seems like the pythonic thing to do.
Sphinx (http://sphinx.pocoo.org/) does indeed look like a very good alternative. AFAICS, it's based on reStructuredText, and creates both HTML and PDF (via LaTeX) output. However, it seems to be somewhat more geared towards documentation-within-code (i.e. creating API documentation from in-code docstrings) than code-within-documentation (i.e. a book containing code examples), so I'm not sure it's a perfect match...