Python Buzz Forum - KNIME and beginners

Andrew Dalke

Posts: 291
Nickname: dalke
Registered: Sep, 2003

Andrew Dalke is a consultant and software developer in computational chemistry and biology.

KNIME and beginners

Posted: Mar 15, 2010 5:37 PM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: KNIME and beginners
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.

I gave a presentation at OpenEye's CUP last week. More precisely, I was assigned a talk with the title "Evils of KNIME." I don't choose that sort of name, but the CUP organizers like to be a bit confrontational with presentation titles. I used my speaking slot as a platform for expressing my views on dataflow/visual languages. I don't like them, and think their effectiveness is limited compared to a text language, so I explained why. Other people do like them and enjoy them. I've asked them why, and they have some good reasons. My presentation outlined those responses with some observations of my own, including suggestions for ways to improve the text-based toolkits so they are more accessible to "non-programmers."

The next few posts will be based on parts of that talk. Feel free to leave comments.

Upcoming training classes (pre-announcement)

I ended by pointing out that these are technological solutions. Why not spend some time training computational chemists to be more effective at writing software? I provide that sort of training. If you are interested, email me. I'm pinning down the dates for a course in Leipzig in mid-May (likely 18-20 May), and another in Boston in late July. I'll announce them when the dates are determined. If you want to influence those dates or schedule a course at your site, let me know.

Sample test case for KNIME

I haven't used KNIME for about two years. That experience was with KNIME 1.x. People told me that it's gotten better, so I decided it was well past time to take a fresh look. Last time I couldn't get it to work on my Mac. I'm happy to report that things have changed, although there are still some difficulties with updates.

My test case was the first example from the Chemistry Toolkit Rosetta, specifically, to compute the heavy atom counts from an SD file. The pybel solution is:

import pybel

# Read each molecule from the gzipped SD file and print its heavy atom count.
for mol in pybel.readfile("sdf", "benzodiazepine.sdf.gz"):
    print(mol.OBMol.NumHvyAtoms())

It's not as short as I would like because I had to specify "sdf" twice and because it had to reach down into the underlying OpenBabel molecule object. Still, it's a lot more succinct than using any of the base toolkits directly, and a good reference for what a text-based programming language is capable of when designed for ease of use.
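For comparison, here is a minimal sketch of the same count with no toolkit at all, reading the SD file by hand. It assumes V2000 records, with the atom count in the first three columns of the counts line and the element symbol in columns 32-34 of each atom line, and it treats H, D and T as hydrogens. The function names are made up for this illustration, and it does none of the chemistry perception a real toolkit provides.

```python
import gzip


def count_heavy(record):
    """Count non-hydrogen atoms in one V2000 record (a list of lines)."""
    # Line 4 is the counts line; its first three columns hold the atom count.
    num_atoms = int(record[3][0:3])
    atom_lines = record[4:4 + num_atoms]
    # The element symbol sits in columns 32-34 (0-based slice 31:34).
    symbols = [line[31:34].strip() for line in atom_lines]
    return sum(1 for symbol in symbols if symbol not in ("H", "D", "T"))


def heavy_atom_counts(lines):
    """Yield one heavy atom count per SD record found in an iterable of lines."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("$$$$"):      # "$$$$" ends each SD record
            if len(record) >= 4:
                yield count_heavy(record)
            record = []
        else:
            record.append(line)


def heavy_atom_counts_from_file(filename):
    """Same as heavy_atom_counts, but reads a gzip-compressed SD file."""
    with gzip.open(filename, "rt") as infile:
        for count in heavy_atom_counts(infile):
            yield count
```

Calling `list(heavy_atom_counts_from_file("benzodiazepine.sdf.gz"))` should give one count per record, assuming the file holds V2000 records. It is several times longer than the pybel version, which is the point of the comparison.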

What molecular properties can I compute? And how do I do it?

The first step was to find out if KNIME could compute the number of heavy atoms. When I say "KNIME" I mean "the CDK nodes which come with KNIME" since KNIME is a dataflow-based visual programming language with support for a number of extension packages, including chemistry nodes based on the CDK. Schrodinger, Tripos, ChemAxon and likely other companies provide nodes based on their respective toolkits, but I don't have a license to those tools. In any case the Mac version of KNIME doesn't yet support adding new nodes.

The most likely candidate was "Molecular Properties." The help says:

Create new columns holding molecular properties, computed for each structure. The computations are based on the CDK toolkit and include logP, molecular weight, number of aromatic bonds, and many others.

What other properties does it compute? I put the node on the workspace and double clicked on it to bring up the dialog box. The result is:

The dialog cannot be opened for the following reason:
No column in spec compatible to "CDKValue".

Huh? What does that mean?

A Google search for that error message found the same question from 9 September 2009 although concerning a different node. Bernd Wiswedel answered:

We obviously need to improve on the error messages. You need to process the output of the SD reader with the "Molecule to CDK" node, which will parse the structures into an appropriate format for the Lipinski node. Reason is that the Lipinski node is contributed from the CDK plugin, so it needs its desired input format.

What this means is the inputs need to be set up correctly before I can see more details. However, it's more complicated than that. If I set up the nodes as shown:

I still get the same error message when I click on the "Molecular Properties" box. Double-clicking on the "Molecule to CDK" box gives me:

The dialog cannot be opened for the following reason:
No column in spec compatible to "SdfValue" "SmilesValue" "MolValue" "Mol2Value" or "CMLValue".

Turns out I need to put in a valid SD filename in the "SDF Reader" box (the one with the exclamation point under it), in order to get the right inputs to "Molecule to CDK", in order to see the "Molecular Properties" dialog.
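The chain of failures described here behaves like a type check that runs before any dialog opens. The sketch below is a toy model of that behavior in Python, not KNIME's actual API. The `Node` class, its methods, the helper functions, and the "PropertyColumns" output type are all invented for illustration; only the node names, the value type names and the error message wording come from the text above.

```python
def quote_list(types):
    """Format type names the way the error messages above list them."""
    quoted = ['"%s"' % t for t in types]
    if len(quoted) > 1:
        return " ".join(quoted[:-1]) + " or " + quoted[-1]
    return quoted[0]


class Node:
    """Toy stand-in for a workflow node; not KNIME's real API."""

    def __init__(self, name, accepts, produces, configured=True):
        self.name = name
        self.accepts = accepts        # column types the node can read; () for a source
        self.produces = produces      # column type the node emits (simplified to one)
        self.configured = configured  # e.g. has the reader been given a filename?
        self.upstream = None

    def connect(self, upstream):
        self.upstream = upstream
        return self

    def input_spec(self):
        # Column types arriving from upstream; empty if nothing is connected.
        return self.upstream.output_spec() if self.upstream else ()

    def has_compatible_input(self):
        return any(t in self.accepts for t in self.input_spec())

    def output_spec(self):
        # A node can only promise output once it is configured and, if it
        # needs input, once that input is of a type it understands.
        if not self.configured:
            return ()
        if self.accepts and not self.has_compatible_input():
            return ()
        return (self.produces,)

    def open_dialog(self):
        if self.accepts and not self.has_compatible_input():
            raise ValueError(
                "No column in spec compatible to %s." % quote_list(self.accepts))
        return "dialog for %s" % self.name


def try_dialog(node):
    """Return the dialog, or the error text if it cannot be opened."""
    try:
        return node.open_dialog()
    except ValueError as err:
        return str(err)


reader = Node("SDF Reader", (), "SdfValue", configured=False)
to_cdk = Node("Molecule to CDK",
              ("SdfValue", "SmilesValue", "MolValue", "Mol2Value", "CMLValue"),
              "CDKValue").connect(reader)
props = Node("Molecular Properties", ("CDKValue",), "PropertyColumns").connect(to_cdk)

print(try_dialog(props))   # fails: nothing upstream produces a CDKValue yet
print(try_dialog(to_cdk))  # fails too: the reader has no filename
reader.configured = True   # stands in for entering a valid SD filename
print(try_dialog(props))   # the dialog can now open
```

In this model a single unconfigured source node blocks every dialog downstream of it, which is why the error shows up two nodes away from its cause.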

How accessible is KNIME to first-time users?

Is that really friendly for first-time users? That is, how is a first-time user supposed to: 1) know which options are available if they can't open an unconnected node, 2) know which inputs are required for a node, or for that matter see what outputs are available, 3) know that the output of the "SDF Reader" needs to be converted with "Molecule to CDK" before it can be used by the CDK nodes?

Of course all those can be explained in the documentation, and perhaps they are explained. I admit I haven't read it, but then again the knime.org documentation doesn't show how to use the CDK nodes. And should someone have to read the documentation in order to do something basic like this task? If so, are dataflow systems really any easier than working with a text-based programming language?

Can't compute the number of heavy atoms?

I looked through the list of properties which could be computed:

Atomic Polarizabilities
Aromatic Atoms Count
Aromatic Bonds Count
Element Count
Bond Polarizabilities
Bond Count
Carbon connectivity index (order 1)
Carbon connectivity index (order 0)
Eccentric Connectivity Index
Fragment Complexity
Hydrogen Bond Acceptors
Hydrogen Bond Donors
Largest Chain
Largest Pi Chain
Petitjean Number
Rotatable Bonds Count
Topological Polar Surface Area
Vertex adjacency information magnitude
Molecular Weight
Zagreb Index

(BTW, it really does have mixed capitalization. Why yes, I am a nitpicker. How did you guess? ;) )

No "heavy atom count." Next option is to see if there's a way to specify the counts based on a SMARTS pattern. Nope, didn't find anything.

As far as I can tell, there's no way with the default nodes to do much of anything with KNIME. I assume there are additional packages which I can install, but why aren't there more useful CDK nodes as part of the standard installation? An obvious one to me would be a SMARTS count pattern matcher, where I could specify the SMARTS pattern, the option for unique or non-unique match counts, and the output column name.
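To make the unique versus non-unique option concrete, here is a minimal sketch of the counting step such a node might perform. It assumes some SMARTS matcher has already produced the matches for each row as tuples of atom indices; the matcher itself is not shown. The function name and the column names are hypothetical, and "unique" is taken here to mean distinct sets of atoms, which is one common definition but not the only one.

```python
def add_match_count(rows, matches_column, output_column, unique=True):
    """Add a match-count column to each row (rows are plain dicts).

    Each row[matches_column] is assumed to be a list of tuples of atom
    indices, as produced by some SMARTS matcher that is not shown here.
    """
    for row in rows:
        matches = row[matches_column]
        if unique:
            # Matches covering the same atoms in a different order count once.
            count = len({frozenset(match) for match in matches})
        else:
            count = len(matches)
        row[output_column] = count
    return rows


# Hand-written matches for a symmetric two-carbon pattern on propane
# (atoms 0-1-2): each C-C bond matches in both directions.
rows = [{"name": "propane", "matches": [(0, 1), (1, 0), (1, 2), (2, 1)]}]
add_match_count(rows, "matches", "cc_unique", unique=True)
add_match_count(rows, "matches", "cc_all", unique=False)
print(rows[0]["cc_unique"], rows[0]["cc_all"])  # 2 4
```

With a pattern that matches any non-hydrogen atom, the same count would give the heavy atom count I was looking for.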

Is my problem because I'm on a Mac? Do Linux users get more nodes? Or is there something else I'm missing? How would you find the number of heavy atoms using KNIME? Is there a solution using the default CDK nodes or do I have to use one of the commercial toolkits?

Leave answers and comments here.
