This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Resolver
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
Resolver is a
spreadsheet/Python program hybrid. The cells of the spreadsheet
correspond to data definitions in Python. Python code can define
functions used in expressions in cells, change fields in the
spreadsheet, and change the presentation of the spreadsheet and its
cells.
The Python code contains many sections, some of which are editable and
some of which are not. These are: "Import statements", "Worksheet
creation", "Pre-constants user code", "Constants and Formatting",
"Pre-formulae user code", "Formula code", and "Post-formulae user
code". The "user code" sections are editable. Note the British use
of "formulae" instead of the American use of "formulas". Hmm, and is
there a capitalization inconsistency between "Import statements" and
"Constants and Formatting"?
During recalculation the Resolver engine processes the sections in
various steps. For example, after the worksheets creation code the
"pre-constants user code" is run, which defines functions that can be
used by the "constants and formatting" section that populates fields
in the speadsheet. Control goes back and forth between user-defined
sections and sections defined by the contents of the spreadsheet.
Resolver also does some work during the processing. For example,
after the formulas (sorry, "formulae") are defined, the engine
computes all of the values, including inter-cell dependencies. The
post-formulae code can then operate on the computed values, for
example to highlight a field which is out of bounds. Because of this,
the entire spreadsheet is recomputed each time (but read
this on partial recalculation).
The Resolver
screencast is well done and you should watch it to get a better
idea of how things work in action. Fuzzyman works for Resolver and
has independently set up a very useful site called Resolver Hacks which is more
user and developer oriented than the main Resolver corporate
page. As far as I can tell there's no link from the corporate
page to the hacks page but I found it probably because of Fuzzyman's
blog. IRL he's Michael Foord.
The idea of a spreadsheet comes from the financial world, and Resolver
continues that tradition. But there's nothing in either which
constrains them to the financial world. Many computational chemists
use Excel to hold and analyze chemistry data, and there are support
tools to help them. Probably the most well-known is Accord
for Excel. It adds the ability to view structures in 2D, color
and align structures, compute physical properties, and more.
The Resolver people sent emails to people in non-financial communities
to see if there was interest. I was one of those people, with my
background writing software for chemists and biologists. After
watching the screencast I sent them a set of questions, mostly
answered now on the hacks page, and especially the Bugs,
Features and Futures section. They replied, and I was given
access to the beta. Emily and I sat down a couple of days ago to look
at it, and here's the notes I took about it, fleshed out a bit.
Giving it a whirl
We installed it on her personal machine, as I have a Mac and I don't
know how to get .Net code working on it. The install went without a
problem. We started with the simple
example spreadsheet then Emily watch the screencast while I tried
a few things based on the code therein.
One of the difficulties we had was figuring out what the different
blocks of Python code were for. There's nothing which explains the
control flow, or that while some Python sections are editable, others
are not. We tried changing a Python formula that was defined in a
cell but while we could click on it we couldn't change it. That's
understandable because it's very hard to keep the 1-to-1 mapping
between the cells and the Python code if anyone can edit the code.
That's why only some of the blocks are editable.
On the other hand, because that code is machine generated, it should
be possible to figure out which cell generated the code so that a
click or right-click can take you to that cell for editing. After
all, if I can see the code I want to be able to change it.
While experimenting I created two cells, one on top of the other, with
the same formula in each. Resolver showed an "X" through the cell.
Perhaps that's the expect way to indicate a duplicated formula by
experts but neither of us are Excel users. We tried to figure it out
using a right-click but found nothing in the context menu or elsewhere
helped out.
Want my own editor!
I use emacs. Emily uses Wing
IDE and like the tooltip help common to most modern IDEs. The
Resolver code editor does have syntax highlighting but doesn't have
tooltips, nor even intelligent indentation. And Emily was used to
using control-/ to comment out a block of code and control-R to "run"
the code, rather than using F9 to recalulate. I expect that last is
because Resolver mixes the spreadsheet and the programming
expectations together. Emily's comment was "seems to me I'll get
frustrated with this quite quickly."
These editing features could be added, but why spend time doing that
when so many editors already exist? Instead we created a .py file
next to the .rsl (the extension for a Resolver spreadsheet) file and
imported it. This works, with caveats. The IDEs don't know about the
.Net environment inside Resolver, so can't pull in those tooltips.
Loaded modules are cached, so the code in Resolver needs to use the
"import
with reload" hack.
The PYTHONPATH is wonky. It seems to be set to the directory
containing the .rsl file but when we clicked on a .rsl file through a
web browser, the newly loaded spreadsheet's cwd was the same as the
other spreadsheet. We didn't investigate the full behaviour. Still,
there should be same way to set the PYTHONPATH to point to any locally
installed Python modules. If there is, we don't know it.
Resolver, outside of the IronPython runtime, doesn't know anything
about the Python files. You can't sent the .rsl and assume it works,
if it references imported Python code. Instead you need to zip all
the files together and exchange it that way. One solution I was
thinking of was to make the .rsl file more like a project file, with
the ability to add external files to some list. In that way it could
watch those files for changes, and support some way to bundle
everything together more automatically.
Another "spreadsheet world meetings programming world" question was
how to handle version control. This is somewhat mentioned in the bugs,
feature and futures page, but only from the spreadsheet developers
viewpoint. I think it's mostly a matter of figuring out which of the
many possible solutions is a good enough match to what's needed.
Testing
Emily brought up the question of how to test a spreadsheet and any
associated Python files which depend on the Resolver runtime. Is
there a way to run things "headless"? She was very happy to see the
section Testing
Spreadsheet Logic, which is based on unittest. She's a TextTest fan, which compares
stdout, stderr and execution times to a golden reference. Making
Resolver work with that does not appear hard so we didn't spend much
time exploring that option.
Accessing web services
There are chemical informatics libraries for C, C++, Python, Java and
Ruby but none for .Net that I know of. While it might be possible to
write bindings for the C/C++ libraries, I have no experience with
them. Instead, we experimented with having Resolver call a web
service. The hacks site has a relevant example plotting
currency changes where the data is fetched over the web.
You can see from the example that it uses a .Net library for doing the
data fetch, rather than using urllib. This is because IronPython does
not implement the socket library, though you can get an emulated
version from FePy.
It looks like Resolver freezing while Python is doing the network I/O.
That is, when Emily clicked on the .rsl file to open it, Resolver
didn't seem to respond for almost a minute before finally displaying
the spreadsheet. We assume it was loading the new code and doing the
network fetch.
Some of the calculations we want to do might take minutes to run. In
the web browser code we poll the server every few seconds to update
the status (either as a reload or through an AJAX request). How would
we do an asyncronous task like that in Resolver? For example, I might
spawn off a polling thread. After every poll I get a status value,
which I'll use to color a field and probably also write something like
"10% done" ... along with a cancel button? When it's finally done I
want to display the result in the cell, remove any controls and status
coloring, and let any other cells which may depend on that newly added
value be computed.
Limits? Console?
How big can a spreadsheet be? I think Excel maxes out at 64K, which
causes a problem with some of the data sets chemists use. We didn't
experiment to see what the limits are, and I can't find a page stating
the Resolver limitations like this, if any.
We really do want an interactive interpreter console, in order to more
quickly test things out and interact with the spreadsheet. It's
coming.
Plotting (err, "charts") and graphics
The plotting in that example used a 3rd party .Net package called ZedGraph.
(See also basics of
ZedGraph and Resolver.) The plots it makes are colorful, and a
scientist would look at them with mistrust. I'm only somewhat joking.
The connection between ZedGraph and Resolver is pretty coarse. The
end result of the plot is a bitmap, which Resolver displays. There's
no way to adjust the scales, zoom in, pick points, etc. Apparently
better plotting is coming, but plotting is a genuinely hard subject.
It is possible to use an external plotting package, like this example
using
gnuplot. Oh, and the spreadsheet world calls them "charts"
instead of plots.
As I mentioned, I would like some way to override how a cell is
displayed so I can stick a 2D (or 3D) chemical drawing in the cell,
perhaps along with some text label on the bottom. I would also like
some way to, say, click on a cell and have some window pop up, perhaps
containing a larger 2D depiction or showing molecular properties.
Best would be if I could control everything inside of that window,
perhaps in its own thread. I don't know if that's possible or in the
near-term plans.
Database? Running as a web page?
It's possible to hook Resolver up to a database and populate the
spreadsheet that way, but I didn't find any examples of using it. We
also didn't experiment with turning the spreadsheet into a web page.
Gotta talk to users
I like the idea of making things more programmable. I've been one of
many advocating a Mathematica-style workbook approach, like SAGE does, but I
don't like the systems I've seen so far. I do like Resolver. The
next step, now that we've explored it a bit, is to talk with the
chemists and learn more about what they use spreadsheets for and
if/how Resolver might improve things. If that work out, then talk
more with the Resolver people to perhaps advocate for certain
prioritizations.
One last thing I think the Resolver team should do is to improve their
"about" splash screen. It's very hard to read. :)