This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Extending Python
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
Another way to extend Python is to write an interface to an existing C
or C++ library. Earlier
I showed in detail how to write a wrapper for the command-line version of
OpenEye's OGHAM program mol2nam. I used it as a
teaching example because it was simple code that showed some of the
problems of working with external programs.
There are better ways to call the IUPAC naming code from Python.
OpenEye doesn't provide Python bindings to it but they do provide the
needed C++ libraries and example code for compiling it (in
$OECHEM/examples/ogham.cpp). That's enough information to
write my own extension for just that one function. At the end of this
essay I point to a few alternate ways to write an extension.
The Python documentation has detailed
examples of how to write a Python extension and others have
written additional documentation, so I won't go into details. If
you need examples you can easily look at existing extensions to see
how they are done.
I need to lay out my new C++ code in the way that Python expects.
Its filename is smi2name.cpp and the name of the new shared
library will be "_smi2name". (Often C extensions used by
module "X" are given a leading underscore, as in "_X".)
/* This file is named smi2name.cpp */
#include "Python.h"

#include "openeye.h"
#include "oesystem.h"
#include "oechem.h"
#include "oeiupac.h"

using namespace OESystem;
using namespace OEChem;

static PyObject *
smi2name(PyObject *self, PyObject *args) {
    const char *smiles;
    OEMol mol;

    /* Expect exactly one string argument: the SMILES */
    if (!PyArg_ParseTuple(args, "s", &smiles)) {
        return NULL;
    }

    /* If there is a failure, simply return None. */
    /* I could raise an exception instead but this is easier */
    if (!OEParseSmiles(mol, smiles)) {
        Py_INCREF(Py_None);
        return Py_None;
    }

    /* Compute the IUPAC name and return it to Python */
    std::string name = OEIUPAC::OECreateIUPACName(mol);
    return Py_BuildValue("s", name.c_str());
}

/* Set up the method table. */
static PyMethodDef _smi2name_methods[] = {
    {"smi2name", smi2name, METH_VARARGS, "convert a SMILES to an IUPAC name"},
    {NULL, NULL, 0, NULL},  /* Sentinel */
};

/* This function must be named "init" + <modulename> */
/* Because the module is "_smi2name" the function is "init_smi2name" */
PyMODINIT_FUNC
init_smi2name(void) {
    (void) Py_InitModule("_smi2name", _smi2name_methods);
}
Again, the details of things like reference counting are explained in
the Python documentation. They are not complicated but do require
close attention to detail.
The next step is to build the shared library. The easiest way to do
this is to use the distutils package,
which has a section
on building extensions. I just need to make a setup.py
file with the needed configuration in it and distutils does the rest.
import os
from distutils.core import setup, Extension

OE_INCLUDE = os.path.join(os.environ["OE_DIR"], "include")
OE_LIB = os.path.join(os.environ["OE_DIR"], "lib")

# Check that we're pointed at roughly the right place
oechem_h = os.path.join(OE_INCLUDE, "oechem.h")
if not os.path.exists(oechem_h):
    raise AssertionError("Cannot find oechem.h at %r" % (oechem_h,))

setup(name='smi2name',
      version='1.0',
      ext_modules=[Extension('_smi2name', ['smi2name.cpp'],
                             include_dirs=[OE_INCLUDE],
                             library_dirs=[OE_LIB],
                             libraries=["oeiupac", "oechem", "oesystem",
                                        "oeplatform", "z", "m"])
                   ],
      )
This setup.py does a very basic check to test if the OE_DIR
environment variable is correct by looking for the oechem.h file in
the include directory. If it isn't there, the script stops with an
AssertionError. This check is a
bit strict for real use, but fine for now.
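One way the check could be made friendlier is sketched below, written for a current Python 3. The helper name find_oe_dirs and its messages are hypothetical; only the OE_DIR variable and the oechem.h file name come from the setup script. It names a missing OE_DIR explicitly instead of failing with a KeyError, and exits with a one-line message rather than a traceback.

```python
import os


def find_oe_dirs(environ=os.environ):
    """Return (include_dir, lib_dir) for the OpenEye install, or exit.

    Hypothetical helper for setup.py; not part of distutils or OEChem.
    """
    oe_dir = environ.get("OE_DIR")
    if not oe_dir:
        raise SystemExit("OE_DIR is not set; point it at the OpenEye directory")

    include_dir = os.path.join(oe_dir, "include")
    lib_dir = os.path.join(oe_dir, "lib")

    # Same sanity check as before: is oechem.h where we expect it?
    if not os.path.exists(os.path.join(include_dir, "oechem.h")):
        raise SystemExit("Cannot find oechem.h under %r" % (include_dir,))
    return include_dir, lib_dir
```

In setup.py the two module-level assignments would then become a single call, OE_INCLUDE, OE_LIB = find_oe_dirs().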
To compile the extension run python setup.py build in the
shell. Here's what it looks like for me (with line wraps included for
clarity):
The result is put under the build/ directory. The location
is different for different machines. To test it out you can make a
symbolic link from the created .so file to the current
directory:
ln -s build/lib.darwin-7.9.0-Power_Macintosh-2.3/_smi2name.so .
# ^^^^^ change as appropriate ^^^^^
(Or set your PYTHONPATH, but this is easier.)
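Because the build directory name changes from machine to machine, a third option is to find it at run time. The following is a minimal sketch for a current Python 3, not something the build itself provides: the helper name add_build_lib_to_path is hypothetical, and it assumes the distutils layout of a single build/lib.* directory. If several exist (say, from different Python versions) it simply takes the first in sorted order.

```python
import glob
import os
import sys


def add_build_lib_to_path(build_root="build"):
    """Put the platform-specific build/lib.* directory on sys.path.

    Hypothetical helper; returns the directory it added.
    """
    matches = sorted(glob.glob(os.path.join(build_root, "lib.*")))
    if not matches:
        raise ImportError("no lib.* directory under %r; run the build first"
                          % (build_root,))
    lib_dir = os.path.abspath(matches[0])
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir
```

Calling add_build_lib_to_path() before import _smi2name has the same effect as the symbolic link, without having to type the platform string.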
It's built, so it's time to test it:
% python
Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import _smi2name
>>> _smi2name.smi2name("c1ccccc1O")
'phenol'
>>> print _smi2name.smi2name.__doc__
convert a SMILES to an IUPAC name
>>> result = _smi2name.smi2name("C1CCC")
Warning: Error parsing SMILES:
Warning: Unclosed ring.
Warning: C1CCC
Warning: ^
>>> result is None
True
>>>
In this example I printed the docstring to show that the text was
available from the C++ code. I also showed that OEChem's error messages
still go to stderr when it can't parse the SMILES; that parse failure is
why the smi2name function call returned None.
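Since those warnings are written by compiled code, replacing sys.stderr in Python would not intercept them. One possible workaround, sketched here for a current Python 3 on a POSIX-like system, is to redirect file descriptor 2 for the duration of the call. This assumes OEChem writes its warnings to the process's standard error descriptor, which I have not verified; the helper name capture_stderr_fd is hypothetical, and a direct os.write stands in for the extension call.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_stderr_fd():
    """Collect everything written to file descriptor 2 inside the block.

    Yields a list; once the block ends it holds one string with the text.
    """
    captured = []
    sys.stderr.flush()              # keep pending Python output out of the capture
    saved_fd = os.dup(2)            # remember the real stderr
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 2)    # fd 2 now points at the temp file
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)    # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode("utf-8", "replace"))


# Stand-in for the extension call: write straight to fd 2, as C++ code would.
with capture_stderr_fd() as messages:
    os.write(2, b"Warning: Unclosed ring.\n")

print(repr(messages[0]))
```

The redirection affects the whole process, so it is not safe if other threads write to stderr at the same time.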
The next step would be to make a smi2name.py module that
provides the primary interface to the rest of Python. It should
implement the previous API. Sadly, that isn't as easy as it seems:
the previous code was able to extract SMILES parsing error
messages, and this version doesn't because it's tricky to get
that data.
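A minimal sketch of such a wrapper, for a current Python 3, might look like the following. It does not reproduce the earlier API, which isn't shown here: the exception name SmilesError and the factory make_smi2name are hypothetical, and the error carries no detailed parse message. A dictionary lookup stands in for the compiled extension so the example runs without OEChem.

```python
class SmilesError(ValueError):
    """Raised when a SMILES string could not be converted (hypothetical name)."""


def make_smi2name(raw_smi2name):
    """Wrap a low-level function that signals failure by returning None."""
    def smi2name(smiles):
        name = raw_smi2name(smiles)
        if name is None:
            raise SmilesError("could not convert SMILES %r" % (smiles,))
        return name
    return smi2name


# In smi2name.py this would be: smi2name = make_smi2name(_smi2name.smi2name)
# Here a dict lookup plays the part of the extension; it returns None on a miss.
_fake_extension = {"c1ccccc1O": "phenol"}.get
smi2name = make_smi2name(_fake_extension)

print(smi2name("c1ccccc1O"))
```

Turning None into an exception keeps callers from silently passing a missing name along.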
Nevertheless, it does show that writing a C++ extension for Python
isn't too hard.
There are other approaches for writing an extension. I wrote the
interface by hand. If the library has more than a few
tens of functions that gets boring real fast. Much of the work is
rote and repetitive and can be done by machine. The best known tool
for this is SWIG, which generates C and
C++ interfaces for Python, Tcl, Perl, Ruby, and several other
languages. I use SWIG in PyDaylight.
SWIG builds the interface from the header but if that's too
complicated then a human has to write an interface file. The basic
problem is that C++ code is very complicated to parse. A recent
approach is pyste which is part
of the Boost project. Pyste uses
gcc to parse the header files into an XML format then reads the XML to
generate the actual interface code.
The approaches described above all require a compiler. The ctypes
package uses a different approach, often called an FFI for
foreign function interface. It is able to load a shared
library and call its functions directly. The downside is that mistakes
in defining how to do the call can take the whole program down.
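As a small illustration of the ctypes style (it has been part of the standard library since Python 2.5), the sketch below calls strlen from the C library on a POSIX-like system with a current Python 3. It uses the C library rather than OEChem because the OpenEye functions are C++ and cannot be called this simply. Declaring argtypes and restype is how the caller describes the call, and getting them wrong is exactly the kind of mistake that can crash the interpreter.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then exposes the
# symbols already loaded into the process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe the C signature: size_t strlen(const char *s)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"c1ccccc1O"))
```

No compiler is involved, but nothing checks the declared signature against the real one either.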