The Artima Developer Community

Python Buzz Forum
KNIME and beginners

Andrew Dalke


Andrew Dalke is a consultant and software developer in computational chemistry and biology.
KNIME and beginners Posted: Mar 15, 2010 5:37 PM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: KNIME and beginners
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.

I gave a presentation at OpenEye's CUP last week. More precisely, I was assigned a talk with the title "Evils of KNIME." I don't choose that sort of name, but the CUP organizers like to be a bit confrontational with presentation titles. I used my speaking slot as a platform for expressing my views on dataflow/visual languages. I don't like them, and I think their effectiveness is limited compared to a text language, so I explained why. Other people do like them and enjoy them. I've asked them why, and they have some good reasons. My presentation outlined those responses along with some observations of my own, including suggestions for ways to improve the text-based toolkits so they are more accessible to "non-programmers."

The next few posts will be based on parts of that talk. Feel free to leave comments.

Upcoming training classes (pre-announcement)

I ended by pointing out that these are technological solutions. Why not spend some time training computational chemists to be more effective at writing software? I provide that sort of training. If you are interested, email me. I'm pinning down the dates for a course in Leipzig in mid-May (likely 18-20 May), and another in Boston in late July. I'll announce them when the dates are determined. If you want to influence those dates or schedule a course at your site, let me know.

Sample test case for KNIME

I haven't used KNIME for about two years. That experience was with KNIME 1.x. People told me that it's gotten better, so I decided it was well past time to take a fresh look. Last time I couldn't get it to work on my Mac. I'm happy to report that things have changed, although there are still some difficulties with updates.

My test case was the first example from the Chemistry Toolkit Rosetta, specifically, to compute the heavy atom counts from an SD file. The pybel solution is:

import pybel

# Read each record from the gzipped SD file and print its heavy atom count.
for mol in pybel.readfile("sdf", "benzodiazepine.sdf.gz"):
    print(mol.OBMol.NumHvyAtoms())

It's not as short as I would like because I had to specify "sdf" twice and because it had to reach down into the underlying OpenBabel molecule object. Still, it's a lot more succinct than using any of the base toolkits directly, and a good reference for what a text-based programming language is capable of when designed for ease of use.
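
For comparison, here is a rough sketch of what the same task looks like with no toolkit at all. It reads the atom block of each record directly and counts the atoms whose symbol is not a hydrogen. This is an illustration only, not the pybel or Rosetta solution: it assumes well-formed V2000 records, the function names are made up for this example, and a real toolkit handles many cases (V3000, atom lists, malformed input) that this does not.

```python
import gzip

# In molfiles, D and T are hydrogen isotopes, so they are not heavy atoms.
HYDROGEN_SYMBOLS = {"H", "D", "T"}


def iter_sd_records(lines):
    """Yield each SD record as a list of lines, split on the '$$$$' delimiter."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("$$$$"):
            yield record
            record = []
        else:
            record.append(line)
    # Lines after the last '$$$$' are ignored; complete records always end with it.


def heavy_atom_count(record):
    """Count the non-hydrogen atoms listed in a V2000 atom block."""
    counts_line = record[3]              # line 4 of a record is the counts line
    num_atoms = int(counts_line[0:3])    # columns 1-3 hold the number of atoms
    atom_lines = record[4:4 + num_atoms]
    # The element symbol occupies columns 32-34 of each atom line.
    symbols = (line[31:34].strip() for line in atom_lines)
    return sum(1 for symbol in symbols if symbol not in HYDROGEN_SYMBOLS)


def heavy_atom_counts(path):
    """Return the heavy atom count of every record in a (possibly gzipped) SD file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as infile:
        return [heavy_atom_count(record) for record in iter_sd_records(infile)]


# Usage, assuming the file from the example above is in the current directory:
#     for count in heavy_atom_counts("benzodiazepine.sdf.gz"):
#         print(count)
```

Even this stripped-down version is several times longer than the pybel one, which is the point of the comparison.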

What molecular properties can I compute? And how do I do it?

The first step was to find out if KNIME could compute the number of heavy atoms. When I say "KNIME" I mean "the CDK nodes which come with KNIME" since KNIME is a dataflow-based visual programming language with support for a number of extension packages, including chemistry nodes based on the CDK. Schrodinger, Tripos, ChemAxon and likely other companies provide nodes based on their respective toolkits, but I don't have a license to those tools. In any case the Mac version of KNIME doesn't yet support adding new nodes.

The most likely candidate was "Molecular Properties." The help says:

  Create new columns holding molecular properties, computed for each structure. The computations are based on the CDK toolkit and include logP, molecular weight, number of aromatic bonds, and many others.

What other properties does it compute? I put the node on the workspace and double clicked on it to bring up the dialog box. The result is:

  The dialog cannot be opened for the following reason:
  No column in spec compatible to "CDKValue".

Huh? What does that mean?

A Google search for that error message found the same question from 9 September 2009 although concerning a different node. Bernd Wiswedel answered:

  We obviously need to improve on the error messages. You need to process the output of the SD reader with the "Molecule to CDK" node, which will parse the structures into an appropriate format for the Lipinski node. Reason is that the Lipinski node is contributed from the CDK plugin, so it needs its desired input format.

What this means is the inputs need to be set up correctly before I can see more details. However, it's more complicated than that. If I connect the "SDF Reader", "Molecule to CDK", and "Molecular Properties" nodes in a chain, I still get the same error message when I click on the "Molecular Properties" box. Double-clicking on the "Molecule to CDK" box gives me:

  The dialog cannot be opened for the following reason:
  No column in spec compatible to "SdfValue" "SmilesValue" "MolValue" "Mol2Value" or "CMLValue".

It turns out I need to put a valid SD filename in the "SDF Reader" box (the one with the exclamation point under it) in order to get the right inputs to "Molecule to CDK", in order to see the "Molecular Properties."
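
The kind of check behind those messages can be sketched in a few lines. The model below is hypothetical and uses none of KNIME's actual API: the Node class, the CONVERTERS table, and the assumption that the SDF reader emits "SdfValue" and that "Molecular Properties" passes "CDKValue" through are all made up for illustration. It only shows how a node that knows which value types it accepts could also say which upstream node is missing.

```python
# Hypothetical model of a dataflow node's input check; not KNIME's API.

# Which node is known to produce each value type (taken from the example above).
CONVERTERS = {"CDKValue": "Molecule to CDK"}


class Node:
    def __init__(self, name, accepts, produces):
        self.name = name
        self.accepts = accepts      # value types this node can read; empty for a source
        self.produces = produces    # value type of the column this node outputs

    def _wanted(self):
        return " or ".join(f'"{t}"' for t in self.accepts)

    def explain_input_problem(self, upstream):
        """Return None if the input is usable, else a message saying what to do."""
        if not self.accepts:
            return None  # source nodes need no input
        if upstream is not None and upstream.produces in self.accepts:
            return None
        if upstream is None:
            have = "nothing is connected"
        else:
            have = f'"{upstream.name}" outputs "{upstream.produces}"'
        message = f'"{self.name}" needs a column of type {self._wanted()}, but {have}.'
        # Suggest a converter when one is known for any accepted type.
        converters = sorted({CONVERTERS[t] for t in self.accepts if t in CONVERTERS})
        if converters:
            hint = " or ".join(f'the "{c}" node' for c in converters)
            message += " Try inserting " + hint + " first."
        return message


sdf_reader = Node("SDF Reader", accepts=(), produces="SdfValue")
to_cdk = Node(
    "Molecule to CDK",
    accepts=("SdfValue", "SmilesValue", "MolValue", "Mol2Value", "CMLValue"),
    produces="CDKValue",
)
properties = Node("Molecular Properties", accepts=("CDKValue",), produces="CDKValue")

# Wiring the reader straight into the properties node yields an actionable message.
print(properties.explain_input_problem(sdf_reader))
```

The sketch leaves out the other half of the problem described above, where the reader itself has not yet been given a file and so has no output spec at all.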

How accessible is KNIME to first-time users?

Is that really friendly for first-time users? That is, how is a first-time user supposed to: 1) know which options are available if they can't open an unconnected node, 2) know which inputs are required for a node, or for that matter see what outputs are available, 3) know that the output of the "SDF Reader" needs to be converted by "Molecule to CDK" before it can be used by the CDK nodes?

Of course all those can be explained in the documentation, and perhaps they are explained. I admit I haven't read it, but then again the knime.org documentation doesn't show how to use the CDK nodes. And should someone have to read the documentation in order to do something basic like this task? If so, are dataflow systems really any easier than working with a text-based programming language?

Can't compute the number of heavy atoms?

I looked through the list of properties which could be computed:

  • Atomic Polarizabilities
  • Aromatic Atoms Count
  • Aromatic Bonds Count
  • Element Count
  • Bond Polarizabilities
  • Bond Count
  • Carbon connectivity index (order 1)
  • Carbon connectivity index (order 0)
  • Eccentric Connectivity Index
  • Fragment Complexity
  • Hydrogen Bond Acceptors
  • Hydrogen Bond Donors
  • Largest Chain
  • Largest Pi Chain
  • Petitjean Number
  • Rotatable Bonds Count
  • Topological Polar Surface Area
  • Vertex adjacency information magnitude
  • Molecular Weight
  • Zagreb Index
(BTW, it really does have mixed capitalization. Why yes, I am a nitpicker. How did you guess? ;) )

No "heavy atom count." Next option is to see if there's a way to specify the counts based on a SMARTS pattern. Nope, didn't find anything.

As far as I can tell, there's no way with the default nodes to do much of anything with KNIME. I assume there are additional packages which I can install, but why aren't there more useful CDK nodes as part of the standard installation? An obvious one to me would be a SMARTS count pattern matcher, where I could specify the SMARTS pattern, the option for unique or non-unique match counts, and the output column name.
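
To make the unique versus non-unique option concrete, here is a small sketch of the counting step only. It assumes some SMARTS engine has already returned the matches as tuples of atom indices, and it takes "unique" to mean that matches covering the same set of atoms count once. The function names, the mappings_for callback, and the dict-per-row table are hypothetical and chosen just for this example.

```python
def count_matches(mappings, unique=True):
    """Count pattern matches given atom-index mappings from a SMARTS engine.

    Each mapping is a sequence of atom indices, one per pattern atom.
    With unique=True, mappings that cover the same set of atoms count once.
    """
    if unique:
        return len({frozenset(mapping) for mapping in mappings})
    return len(mappings)


def add_count_column(rows, column_name, mappings_for, unique=True):
    """Return new rows (dicts) with the match count stored under column_name.

    mappings_for is a caller-supplied function that returns the mappings for a row.
    """
    new_rows = []
    for row in rows:
        new_row = dict(row)  # copy, so the input table is left unchanged
        new_row[column_name] = count_matches(mappings_for(row), unique)
        new_rows.append(new_row)
    return new_rows


# Example: the pattern "cc" on benzene matches each of the six ring bonds
# in both directions, giving 12 mappings over 6 distinct atom pairs.
ring_bonds = [(i, (i + 1) % 6) for i in range(6)]
benzene_mappings = ring_bonds + [(b, a) for (a, b) in ring_bonds]

print(count_matches(benzene_mappings, unique=False))  # 12
print(count_matches(benzene_mappings, unique=True))   # 6
```

The three parameters of add_count_column line up with the three settings suggested above: the pattern (hidden here behind mappings_for), the unique flag, and the output column name.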

Is my problem because I'm on a Mac? Do Linux users get more nodes? Or is there something else I'm missing? How would you find the number of heavy atoms using KNIME? Is there a solution using the default CDK nodes or do I have to use one of the commercial toolkits?

Leave answers and comments here.
