This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: cyclops_mysql and jquery-marvin
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
It's been many months since I last posted anything. My fiancée came
back from a tour of duty in Iraq and we did a driving tour for almost
three months. Turns out, driving, travel, visiting friends and family,
and getting things prepared for a wedding take a lot of work. I'm sad
to say that I put off work and other external issues during most of
this time.
Going back to work after such a long break proved hard, as many of you
can well imagine. I needed something to get back into the swing of
things before tackling client projects, so I dusted off an old project
and played around with a new one.
Cyclops-mysql
In 2009, at OpenEye's CUP X, I presented some work I had done on writing
MySQL user-defined functions (UDFs) for cheminformatics using OEChem.
There's some history here. A client of mine was developing a web
application and needed just a couple of cheminformatics
extensions. They were already using MySQL and OEChem, so I think I
billed them about 40 hours and wrote some UDFs for them. My contract
with them says they get the copyright to what I do for them, so that
was the end of that code.
But I wanted to present something at CUP, so I rewrote everything and
improved on it. For example, I looked around at other cheminformatics
extensions for MySQL and noticed they didn't take advantage of a couple
of ways to get higher performance out of a MySQL UDF. The two main
techniques are to allocate objects once in the *_init function and
reuse them for the rest of the search, rather than reallocating them
for each row, and to check in *_init for constant input values so they
are processed once instead of being reevaluated for each row.
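Both optimizations can be sketched in C. This is not the cyclops-mysql code: it assumes a hypothetical function fp_common_bits_demo(query_hex, target_hex) that counts the bits two hex-encoded fingerprints share, so it compiles without the OpenEye toolkits. The real calling convention passes the UDF_INIT and UDF_ARGS structs from mysql.h; to keep the example self-contained, the structs below are simplified stand-ins declaring only the fields used here. In the real *_init call, args->args[i] is non-NULL only for constant arguments, which is what the constant check relies on. The 1024-bit size limit is also an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the structs in mysql.h (assumption: only the
   fields this sketch uses are declared). */
typedef struct { char *ptr; } UDF_INIT;
typedef struct {
    unsigned int arg_count;
    char **args;             /* in _init: non-NULL only for constant args */
    unsigned long *lengths;  /* string args are not NUL-terminated */
} UDF_ARGS;

#define MAX_FP_BYTES 128     /* assumption: fingerprints up to 1024 bits */

/* Per-query state, allocated once in _init instead of once per row. */
typedef struct {
    unsigned char query[MAX_FP_BYTES];
    unsigned char target[MAX_FP_BYTES];
    size_t query_len;
    int query_is_constant;
} FpState;

static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode n hex characters into out; returns the byte count, or -1. */
static long decode_hex(const char *s, unsigned long n, unsigned char *out) {
    if (n % 2 != 0 || n / 2 > MAX_FP_BYTES) return -1;
    for (unsigned long i = 0; i < n; i += 2) {
        int hi = hex_digit(s[i]), lo = hex_digit(s[i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i / 2] = (unsigned char)(hi * 16 + lo);
    }
    return (long)(n / 2);
}

static int popcount8(unsigned char b) {
    int c = 0;
    while (b) { c += b & 1; b >>= 1; }
    return c;
}

/* Called once per query; returns 0 on success, 1 on error. */
int fp_common_bits_demo_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
    if (args->arg_count != 2) {
        strcpy(message, "fp_common_bits_demo(query_hex, target_hex)");
        return 1;
    }
    FpState *st = calloc(1, sizeof *st);   /* allocate once, reuse per row */
    if (!st) { strcpy(message, "out of memory"); return 1; }
    if (args->args[0] != NULL) {           /* constant query: decode it now */
        long n = decode_hex(args->args[0], args->lengths[0], st->query);
        if (n < 0) { free(st); strcpy(message, "bad query fingerprint"); return 1; }
        st->query_len = (size_t)n;
        st->query_is_constant = 1;
    }
    initid->ptr = (char *)st;
    return 0;
}

/* Called once per row. */
long long fp_common_bits_demo(UDF_INIT *initid, UDF_ARGS *args,
                              char *is_null, char *error) {
    FpState *st = (FpState *)initid->ptr;
    if (!args->args[0] || !args->args[1]) { *is_null = 1; return 0; }
    if (!st->query_is_constant) {          /* re-decode only if it can change */
        long n = decode_hex(args->args[0], args->lengths[0], st->query);
        if (n < 0) { *error = 1; return 0; }
        st->query_len = (size_t)n;
    }
    long m = decode_hex(args->args[1], args->lengths[1], st->target);
    if (m < 0 || (size_t)m != st->query_len) { *error = 1; return 0; }
    long long common = 0;
    for (size_t i = 0; i < st->query_len; i++)
        common += popcount8(st->query[i] & st->target[i]);
    return common;
}

/* Called once after the last row. */
void fp_common_bits_demo_deinit(UDF_INIT *initid) {
    free(initid->ptr);
}
```

Because the buffers live in the per-query state, the per-row path never calls malloc, and when the query is a literal each row only decodes the target fingerprint.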
I got some pretty good numbers, and reported them in my CUP X
presentation titled "Database
extensions for fun and profit." But I didn't release the code
because it was conference-ware and not usable.
Later that year, OpenEye released their OEGraphSim
toolkit with fingerprint support, including code for the 166-bit MACCS
keys and for path-based hash fingerprints. I decided to update my code
to take advantage of this new toolkit.
For more details, see the README. The "oe_*" functions map almost
directly to OEChem or OEGraphSim functions. The "fp_*" functions work
on fingerprints, expressed as hex-encoded strings. MySQL UDFs, unlike
the equivalent technology in PostgreSQL, are not object based and
cannot define new data types, so these functions only work with
strings or numbers.
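A common operation on such fingerprints is the Tanimoto similarity, the number of bits set in both divided by the number set in either. The sketch below assumes both strings encode fingerprints of the same length; the name hex_tanimoto is hypothetical and not one of the package's fp_* functions. Because each hex digit maps to exactly 4 bits, the bit counts can be taken digit by digit, without first decoding into a byte buffer.

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Number of set bits in a 4-bit value (0..15). */
static int nibble_bits(int v) {
    static const int table[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                  1, 2, 2, 3, 2, 3, 3, 4};
    return table[v];
}

/* Tanimoto similarity of two hex-encoded fingerprints.
   Returns -1.0 if the lengths differ or a character is not hex. */
double hex_tanimoto(const char *fp1, const char *fp2) {
    size_t n = strlen(fp1);
    if (strlen(fp2) != n) return -1.0;
    int both = 0, either = 0;
    for (size_t i = 0; i < n; i++) {
        int a = hex_digit(fp1[i]), b = hex_digit(fp2[i]);
        if (a < 0 || b < 0) return -1.0;
        both += nibble_bits(a & b);     /* bits set in both */
        either += nibble_bits(a | b);   /* bits set in either */
    }
    /* Assumed convention: two empty fingerprints score 0.0 */
    return either == 0 ? 0.0 : (double)both / either;
}
```

For example, hex_tanimoto("0f", "ff") finds 4 bits in common out of 8 set in either, giving 0.5. Other toolkits may choose a different value for the all-zero case.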
The package also includes a pretty comprehensive test suite and a
benchmarking tool. The OpenEye tools are fast. On my laptop I can
generate about 7,000 canonical SMILES per second in a database query,
and about 16,000 SMARTS matches per second. See "macbook_pro.bench"
for details, and "README.benchmark" for more information about the
benchmarking tools.
jQuery-Marvin
The other tool I worked on was jquery-marvin-0.8.tar.gz,
an improved interface for working with Marvin, which is a
Java-based chemical structure viewer and editor. Marvin comes with a
"marvin.js" script to help integrate Marvin into a web page, but as I
write in the README:
ChemAxon's Javascript code for Marvin is 12 years old, according to
the copyright statement. It uses a Javascript programming style which
is now considered obsolete, and it depends on a lot of browser sniffing,
which is no longer needed in modern browsers.
One of the obsolete things it does is use "document.write", which I
didn't want to do.
Now, there are reasons for ChemAxon to keep the code they have, and I
mention some possibilities in the README. But I don't have the same
constraints, so I wrote a brand new interface, based on jQuery UI. Some
of the advantages of this package are:
You can use $().marvinview() and $().marvinsketch() to put a viewer
or sketcher at any place in the DOM tree and remove it.
Most input parameters are checked, so if you make a typo and use a
wrong parameter name or unexpected value then you get a Javascript
exception describing what needs to be fixed. (You can override
the check if needed.)
It uses only three global variables. (Two are needed to capture
property and mouse events.)
It includes regression tests (based on qunit.js).
If you want the property change and mouse events, you can use
jQuery's normal event mechanism instead of going through the global
functions. (It even does the right thing with mouse events, although
I can't figure out why someone would use those.)
It has some disadvantages as well:
I've only tested it on my Mac with Safari, Opera, and
Firefox. It's entirely possible that it won't work on IE or
Windows, and I have no intention of supporting old browsers.
I don't support the entire set of Marvin parameter APIs. For
example, this demo shows that MarvinView can take MarvinSketch
options, which are used when 'editable' is true, letting people open a
MarvinSketch window for further editing.
There's no documentation.
The self-tests are incomplete.
The verification checks add a lot to the code. However, since
you're already working with Marvin (big jar files), the extra
few KB of Javascript code isn't a big issue.
This package is not production-quality code, but it is complete
enough that the adventuresome and curious shouldn't have a problem, at
least not with the core functionality. To see how to use it, try some
of the demos, then look at the source.
Work, marriage, and honeymoon
Okay, it's time for me to get back to paying work, and to apologize to
my clients for putting them off so long. Then again, I'm getting
married soon, with a honeymoon immediately after, so I won't be able
to get much done. Hmm...