I've just spent a few days hunting for a memory leak in M2Crypto, without success. Are there better options for interfacing to OpenSSL from Python? A small rant.
This is my first rant here. I just spent several entire days trying to figure out why a long-running multi-threaded program that makes lots of SSL connections using M2Crypto is leaking memory. Unsuccessful!
I found numerous problems in M2Crypto. If you use the default parameters for the HTTPSConnection class, it creates an SSL Context object which is artificially kept alive in a global map of all SSL Context objects; you have to close the Context object explicitly to remove it from that map, but closing the HTTPSConnection doesn't close the context it created.
After fixing that, it still leaked pretty severely. Despite valiant attempts with valgrind (a useful enough tool, but unfinished) I still wasn't able to track down the leak; the stack trace shown by valgrind was missing several crucial entries (apparently calls through function pointers confuse it?).
But even if I find the cause of the leak, I may not be able to plug it. M2Crypto is written using SWIG, which contains lots of obfuscated code and doesn't exactly give me a warm fuzzy feeling that it's always doing the reference counting correctly. And of course, OpenSSL itself is pretty much entirely opaque. (Where the heck is d2i_X509 defined? It's got its own man page, and is used in numerous places, but I can't find a single definition of it -- no macro, no function! They must be defining it implicitly via some meta-macro...)
M2Crypto has other problems too: a month ago, I was trying to create a multi-threaded server handling SSL and HTTP connections. While it was not leaking, it was crashing as soon as I had multiple clients connect to it simultaneously. M2Crypto's author tried to help me but could not reproduce the problem. Then, later, I could not either. So what was going on? My suspicion is that the bug isn't fixed but an unrelated change in my code happens to stop triggering it -- meaning it could come back any time.
M2Crypto hasn't been updated in a year. It contains (in SSL/httpslib.py) various versions of an HTTPSConnection class which try to adjust to different Python versions, but due to a lack of imagination it reverts to the pre-2.2 code when Python 2.4 or higher is used! There's a patch purporting to upgrade it to version 0.13.1 but the patch looks bogus to me (it uses an unbound method to override a bound one, which can't possibly work). The HTTPSConnection.close() method does nothing, allowing for even more potential leaks. The SSLServer class defines an error handler with an incorrect number of arguments.
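Here is why such an override "can't possibly work". A function is only bound to an instance when it is looked up on the class, so storing a function taken from a class in an instance attribute does not supply `self`. Below is a minimal sketch in Python 3, with made-up class names. In Python 3, `Class.method` is a plain function rather than Python 2's unbound method, but both fail with `TypeError` when used this way.

```python
import types


class Connection:
    def close(self):
        return "Connection.close"


class Patched:
    def close(self):
        return "Patched.close"


def broken_override():
    conn = Connection()
    # Instance attributes bypass the descriptor protocol, so this stores
    # the raw function and calling it passes no 'self'.
    conn.close = Patched.close
    try:
        conn.close()
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"


def working_override():
    conn = Connection()
    # types.MethodType binds the function to this particular instance.
    conn.close = types.MethodType(Patched.close, conn)
    return conn.close()
```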
I could go on, but I think I've ranted enough, except for one thing. I've yet to see an extension module using SWIG that doesn't make me think it was a mistake to use SWIG instead of manually written wrappers. The extra time paid upfront to create hand-crafted wrappers is gained back hundredfold by time saved debugging the SWIG-generated code later.
OK, now I've got it off my chest. Is there an alternative? Bill Janssen recommended PyOpenSSL. Anyone concur?
I've used ctypes in wrappers for the bitzi library and the musicbrainz library. For musicbrainz I had a perfectly working hand-written C wrapper, rewrote it using ctypes, and ended up with a lot less code. The code I have is Python, and there's no compiling on troublesome platforms like Windows.
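The ctypes approach needs no compiler: you load a shared library and declare argument and return types from Python. The bitzi and musicbrainz wrappers aren't shown here, so this minimal sketch wraps the C library's `strlen` instead. It assumes a POSIX system such as Linux or macOS.

```python
import ctypes
import ctypes.util

# find_library may return None in minimal environments; on POSIX,
# CDLL(None) gives a handle to the running program, whose symbols
# include the C library.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declaring the signature lets ctypes convert arguments and results
# correctly instead of guessing (the default return type is C int).
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-free byte string as computed by C strlen."""
    return _libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"musicbrainz"))  # prints 11
```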
- M2Crypto is really a toolkit and makes it easy to access many, many SSL features... but from Guido's post, it seems to have some nasty leaks that are pretty hard to deal with.
- PyOpenSSL is "just" a library giving access to SSL features only. It is much simpler than M2Crypto and offers a far smaller set of functionality.
I tested both... but not as far as Guido did for M2Crypto (it didn't really matter to me whether heavy long-term use created a leak, so there was no point in my finding out). So the context was different. But here is my point:
Is it better to use a very handy toolkit that has bugs you can't get rid of (in an acceptable time and with an acceptable amount of work), or to use a more "basic" lib and write the specific tools you need yourself?
It all depends on your context... if you really need all the functionality of M2Crypto, then you should keep tracking down that stupid leak (and stock up on coffee and aspirin!). But if you can do the job with just the basic SSL functionality, use the simpler library, even if you have to write some more stuff yourself (HTTPSConnection, for example... maybe more...).
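To give a rough idea of how little is needed when a library exposes the basic pieces, here is a minimal HTTPS connection class built on `http.client.HTTPConnection`. It uses Python 3's standard `ssl` module rather than PyOpenSSL. That is a substitution for illustration; the stdlib also ships its own `http.client.HTTPSConnection`. The class name is hypothetical, and proxy tunnelling is deliberately not handled.

```python
import http.client
import ssl


class MinimalHTTPSConnection(http.client.HTTPConnection):
    """HTTPConnection that wraps its TCP socket in TLS after connecting."""

    default_port = 443  # used when no explicit port is given

    def __init__(self, host, port=None, *, context=None, **kwargs):
        super().__init__(host, port, **kwargs)
        # A default context verifies the server certificate and hostname.
        if context is None:
            context = ssl.create_default_context()
        self._context = context

    def connect(self):
        super().connect()  # plain TCP connection, stored in self.sock
        # server_hostname enables SNI and hostname checking.
        self.sock = self._context.wrap_socket(self.sock, server_hostname=self.host)
```

Usage would look like `conn = MinimalHTTPSConnection("example.org")`, then `conn.request("GET", "/")` and `conn.getresponse()`. The inherited `close()` closes the wrapped socket.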
Personally I prefer the second way, because I know it's easier to find a bug in a small lib than in a whole toolkit (wrapped via SWIG). I'd rather write a little more code myself than spend many days debugging crappy code that isn't mine and 70% of which is useless for what I do...
So it's up to you, Mr Guido, to decide... all I can say is that in my experience PyOpenSSL does the job well and is widely used... and if you ever find a leak or something in it, it would be easier to debug than M2Crypto (but you've already spent a lot of time on M2Crypto, so that's not necessarily a good argument).
Hmm. I once fiddled around with writing a Python binding for libmdb, which lets you read Microsoft Access .mdb files.
I tried using SWIG and a couple of other generators there. The main problem was the very opaque API of the library, which uses some of glib's special types for managing lists and such. The interface generated by SWIG actually worked, but it only exposed some generic types you couldn't access from Python.
After some more annoying fiddling I ended up writing a Python C module by hand. I ended up with a single function that takes the filename and returns a (stupid) complex Python structure of dictionaries representing the whole database. Not memory efficient (but leak-free as far as I can tell), but easy to grasp and exactly tailored to my problem.
I'd slightly defend SWIG for quickly wrapping an API. If the library is well designed, the SWIG layer should mostly be quite thin, and the abstraction can then be done in Python. I think one problem is that OpenSSL has a pretty nasty interface.
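The sketch below illustrates that split between a thin wrapping layer and a Python abstraction on top. It uses ctypes, standing in for SWIG, as the thin layer over libc's `malloc`/`free`. A small Python class then owns the pointer and frees it exactly once, including on exception paths. `CBuffer` is a hypothetical name, and a POSIX C library is assumed.

```python
import ctypes
import ctypes.util

# Thin layer: declare just the C functions we need.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
_libc.malloc.argtypes = [ctypes.c_size_t]
_libc.malloc.restype = ctypes.c_void_p
_libc.free.argtypes = [ctypes.c_void_p]
_libc.free.restype = None


class CBuffer:
    """Python-level owner of a malloc'd block; frees it exactly once."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        ptr = _libc.malloc(size)
        if not ptr:  # NULL comes back as None
            raise MemoryError("malloc failed")
        self._ptr = ptr
        self.size = size

    @property
    def closed(self):
        return self._ptr is None

    def _check_open(self):
        if self._ptr is None:
            raise ValueError("buffer is closed")

    def write(self, data: bytes):
        self._check_open()
        if len(data) > self.size:
            raise ValueError("data larger than buffer")
        ctypes.memmove(self._ptr, data, len(data))

    def read(self, n):
        self._check_open()
        return ctypes.string_at(self._ptr, min(n, self.size))

    def close(self):
        # Idempotent: a second close() is a no-op, never a double free.
        if self._ptr is not None:
            _libc.free(self._ptr)
            self._ptr = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # runs on normal exit and when an exception escapes
        return False
```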
I had a quick look through the SWIG files and found a few memory leaks, listed below:
- _dsa.i:270: not freeing DSA_SIG_new on error path
- _dsa.i:333: not freeing sigbuf
- _rand.i:97: not freeing mem on error path
- _rand.i:100: not freeing mem on error path
- _rc4.i:64: not freeing out
I'm not sure which parameters/algorithms you are using, so I can't take a deeper look.
I'll have to raise my hand and claim ownership of the stupidity of the part of the patch that uses an unbound method to override a bound one.
It does fix the memory leak I was facing at the time. But so would providing None as the override for the bound method, since in the context I was looking at, that method was never called. The memory leak was caused by the combination of assigning a bound method, using __del__ in that class, and Python 2.1.3. At the time, I did not have such an explicit explanation of the leak.
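The cycle described here is easy to reproduce. Storing a bound method on its own instance makes the instance reference itself, so the object survives `del` until the cycle collector runs. The sketch below uses hypothetical class names and assumes CPython 3. On Python 3.4 and later the collector can free such an object even with `__del__` defined. On older versions, including the 2.x line, objects with `__del__` in a cycle were left in `gc.garbage` instead, which is the leak described. Storing a `weakref.WeakMethod` avoids the cycle.

```python
import gc
import weakref


class CyclicConn:
    def __init__(self):
        # self -> instance dict -> bound method -> self: a reference cycle.
        self.on_close = self.close

    def close(self):
        pass

    def __del__(self):
        pass


class WeakConn:
    def __init__(self):
        # WeakMethod does not keep the instance alive, so no strong cycle.
        # Call it as self.on_close()() while the instance is alive.
        self.on_close = weakref.WeakMethod(self.close)

    def close(self):
        pass

    def __del__(self):
        pass


def survives_del(cls):
    """Return (alive after del, alive after gc.collect()) for a new cls()."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from interfering
    try:
        obj = cls()
        ref = weakref.ref(obj)
        del obj
        alive_after_del = ref() is not None
        gc.collect()
        alive_after_collect = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect
```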
I'm relatively new to Python (I started last year), and I just started playing with SSL under Python this week. It may not seem very heroic, but I can summarize my opinion as: "I totally agree with Pierre-Yves above" :)
First, I'm never ever keen to use "big-packages-that-do-everything-and-even-more" (the problem is usually the "even-more": performance goes down and bugs go up). So when I started playing with SSL (before I read this post), I looked at M2Crypto, Twistedmatrix, pyOpenSSL, etc. M2Crypto, Twisted and so on are surely great tools, but I generally want to _understand_ what the code is really doing, and I don't need all their features, so I fell back on pyOpenSSL.
After several days I now have what I needed: an HTTPS server and proxy plus a secure proxy, in just a few dozen lines of code with only the OpenSSL import (plus the usual Python network libraries). It works great with multithreaded request handlers and, so far, has never crashed, even with several users at a time. Of course, I haven't tested it under high load yet, and I haven't checked for memory leaks, but so far I am really pleased with the code. pyOpenSSL was a bit tough at the start because there isn't enough documentation or examples (at least, that wasn't obvious to me at all), and Python's built-in network libraries are really great (again, that wasn't obvious to me at first because of all the mixing of classes and subclasses, but after a while it becomes really clear and cleverly made!).
The code isn't fully finished/tidied; if I have time and my company's approval I may release the code and some how-tos. In the meantime, if anyone wants some code snippets, just ask.
I haven't tried Trevor's library tlslite (for performance reasons I preferred to try pyOpenSSL first) and, sorry, I probably won't use it now :), but it looks light, simple and effective, and I quickly found some nice examples to start with.
The SSL area is something of a mess. There's PyOpenSSL, M2Crypto, and Python SSL socket objects. I think, but am not sure, that Python SSL socket objects are actually descended from PyOpenSSL, which hasn't been updated since 2004. Anybody know for sure?
None of these actually implement SSL; they're all wrappers of OpenSSL.
Python's built-in SSL objects are something of a Potemkin village of security - they go through the motions, but don't actually validate anything. The built-in SSL objects don't expose enough of the OpenSSL API to do any useful checking. The API functions that are exposed, "server" and "issuer", use the wrong formatter in OpenSSL, and the data returned isn't parseable, because it uses "/" as a delimiter without escaping it when it shows up (as it does) in certificate data.
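The sketch below shows why unescaped slashes make such strings unparseable: two different names can render to the same text. `render_dn` and `naive_parse_dn` are hypothetical helpers that only mimic the "/key=value" format described above; they are not OpenSSL code. Python's later `ssl` module avoids the problem because `SSLSocket.getpeercert()` returns the subject and issuer as nested tuples of (field, value) pairs.

```python
def render_dn(fields):
    """Join (key, value) pairs as '/key=value' WITHOUT escaping '/' in values."""
    return "".join(f"/{key}={value}" for key, value in fields)


def naive_parse_dn(text):
    """Split a '/key=value' string back into pairs by splitting on '/'."""
    pairs = []
    for part in text.lstrip("/").split("/"):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"component without '=': {part!r}")
        pairs.append((key, value))
    return pairs


# One organization whose name happens to contain "/OU=" ...
one_field = [("O", "Example/OU=Sales"), ("CN", "www.example.com")]
# ... and a genuinely different name with a separate OU field.
two_fields = [("O", "Example"), ("OU", "Sales"), ("CN", "www.example.com")]
```

Both lists render to `/O=Example/OU=Sales/CN=www.example.com`, so no parser can tell them apart. A value such as `Foo/Bar Inc` cannot be split back at all.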
M2Crypto has some useful Python code for looking at certificates and such, but its C glue code to Python has problems, as Guido has pointed out. It also requires an installation of OpenSSL separate from the one that comes with Python, at least on Windows.
What's needed, I think, is for someone to put more of the M2Crypto API functions for OpenSSL in the glue code for built-in SSL socket objects. Then the Python components of M2Crypto could be reused, and we'd only have one implementation of the glue code.
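For comparison, Python's `ssl` module (added in Python 2.6) eventually grew this kind of checking. Below is a minimal sketch for current Python 3, where `ssl.create_default_context()` requires a valid certificate chain and a matching hostname. `make_verifying_context` and `fetch_peer_subject` are hypothetical helper names, and the latter needs network access to run.

```python
import socket
import ssl


def make_verifying_context(cafile=None):
    """Build a client context that verifies certificates and hostnames."""
    # create_default_context() sets verify_mode=CERT_REQUIRED and
    # check_hostname=True, and loads the system CA store (or cafile).
    return ssl.create_default_context(cafile=cafile)


def fetch_peer_subject(host, port=443, timeout=10.0):
    """Connect, complete a verified TLS handshake, and return the subject."""
    ctx = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # A failed check raises ssl.SSLCertVerificationError (Python 3.7+).
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # A tuple of RDNs, each a tuple of (field, value) pairs.
            return tls.getpeercert()["subject"]
```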
There are some truly weird bugs in SSL land. I just discovered an interaction between "socket.setdefaulttimeout" and M2Crypto. If you change the default timeout, M2Crypto starts producing phony "peer did not return certificate" exceptions, every time. This may be related to "[ python-Bugs-1098618 ] socket.setdefaulttimeout() breaks smtplib.starttls()".
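One plausible explanation (a hypothesis, not a confirmed diagnosis) starts from how CPython handles timeouts: a socket in timeout mode is internally set to non-blocking at the OS level. C code that reads the file descriptor directly, as an OpenSSL socket BIO does, may not expect that. Under that assumption, a workaround is to create the sockets handed to M2Crypto while the global default timeout is temporarily `None`. `default_timeout` is a hypothetical helper, not part of the socket module.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_timeout(value):
    """Temporarily change the socket module's process-wide default timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(value)
    try:
        yield
    finally:
        # Restore the caller's setting even if the body raised.
        socket.setdefaulttimeout(previous)
```

You would wrap only the code that creates the M2Crypto connection in `with default_timeout(None):`. Note that the default is process-wide, so in a multi-threaded program this races with other threads creating sockets. Calling `settimeout()` on individual sockets is safer wherever the code allows it.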
I believe all the issues Guido originally experienced have been fixed a long time ago. The latest M2Crypto release is 0.18.2.
If anyone has any difficulties with M2Crypto, I'd advise you to head over to the M2Crypto homepage at http://chandlerproject.org/Projects/MeTooCrypto, which has information about the mailing list, Bugzilla, etc. to help you resolve and report issues.
In case somebody hits upon this page: I have taken over maintenance of M2Crypto (and yes, I believe all those memory leaks were fixed a long time ago), and it now resides at https://gitlab.com/m2crypto/m2crypto/