The Artima Developer Community

Python Buzz Forum
New Cheminformatics Projects



Andrew Dalke


Andrew Dalke is a consultant and software developer in computational chemistry and biology.
Posted: Feb 3, 2010 10:20 PM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: New Cheminformatics Projects
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.


I've started two new open projects for cheminformatics and I'm looking for help in both of them.

Chemistry Toolkit Rosetta

The Chemistry Toolkit Rosetta (CTR) is a set of common cheminformatics tasks implemented using a variety of different toolkits and approaches. It is meant primarily as a way for people to understand and compare how the different APIs work.

Currently there are 16 tasks, 14 of which are well-defined and have at least one solution (in OpenEye/Python, since that's what I know best). Several also have solutions in Pybel, and there are a couple of RDKit and CDK solutions as well.

Some of the CTR tasks are:

It needs your help. The project started in part because I don't know RDKit, CDK, or Indigo that well - to say nothing of the commercial tools available from Symyx, Accelrys, Schrodinger, and others. I know them a bit better now, but not enough.

Feel free to contribute a solution in your toolkit of choice! Or provide commentary, feedback, or improve an existing solution. You can even contribute a new task, if it's characteristic of a frequently encountered cheminformatics-related problem which several toolkits can handle.

By the way, a big thanks to Noel O'Boyle for his feedback on the project direction, and for his Pybel and Cinfony contributions, which helped flesh out CTR before this public announcement.

Chem Fingerprints

The other project I started is called "chem-fingerprints" or "chemfp" for short. Its goal is to develop a couple of file formats for cheminformatics fingerprints as well as tools and libraries which work with those formats.

The main problem it addresses is that there is no widely used fingerprint format, so each research group or even individual researcher ends up making a new one, as well as the tools to work with it. See the use cases for some more detailed examples.

So far I've written a proposal for a line-oriented text format called "FPS", meant to be easy to generate and parse, and have sketched out a binary format called "FPB", meant for fast loading at the expense of some preprocessing.

The FPS format is simple enough that you can likely figure out most of it from this example, taken from the specification:

 #FPS1
 #num_bits=256
 #software=RDKit/2009Q3_1
 #params=RDKit-Fingerprint/1 minPath=1 maxPath=7 fpSize=256 nBitsPerHash=4 useHs=True
 #source=/Users/dalke/databases/Compound_00000001_00025000.sdf.gz
 #date=2010-01-27T02:22:26
 fffeffbfb7fffedff7beefdbddf7ffffabff76cf6df7fcf6f7fffebf7d7ffd6f 1
 fffeffbfb7fffedff7beefdbddf7ffffabff76cf6df7fcf6f7fffebf7d7ffd6f 2
 ffffbfdfffffffffbfeffffffffffffffffffffffff77efffffffebfffffffef 3
 00c02010002610000080800041100002084000440d100000c055048801224400 4
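Because each record is just a hex string followed by an identifier, a reader needs only a few lines of code. The following is a minimal Python sketch, not the chemfp reader. It assumes that the first line is exactly `#FPS1`, that header lines have the form `#key=value`, that each record is a hex fingerprint followed by whitespace and an identifier, and that each fingerprint's byte length should equal `num_bits` rounded up to whole bytes. The function name `parse_fps` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


# Hypothetical reader for the FPS layout shown above (a sketch, not chemfp's API).
def parse_fps(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Tuple[bytes, str]]]:
    """Return (header metadata, [(fingerprint bytes, identifier), ...])."""
    it = iter(lines)
    first = next(it, "").strip()
    if first != "#FPS1":
        raise ValueError(f"expected '#FPS1' on line 1, got {first!r}")

    metadata: Dict[str, str] = {}
    records: List[Tuple[bytes, str]] = []
    num_bytes: Optional[int] = None  # known once a num_bits header is seen

    for lineno, raw in enumerate(it, start=2):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            # Split on the first '=' only, so the params value keeps its own '=' signs.
            key, sep, value = line[1:].partition("=")
            if not sep:
                raise ValueError(f"line {lineno}: malformed header {line!r}")
            metadata[key] = value
            if key == "num_bits":
                num_bytes = (int(value) + 7) // 8  # round up to whole bytes
            continue
        fields = line.split(None, 1)  # hex fingerprint, then the identifier
        fp = bytes.fromhex(fields[0])  # raises ValueError on non-hex input
        if num_bytes is not None and len(fp) != num_bytes:
            raise ValueError(f"line {lineno}: expected {num_bytes} bytes, got {len(fp)}")
        ident = fields[1] if len(fields) > 1 else ""
        records.append((fp, ident))
    return metadata, records
```

Since iterating over a text file yields its lines, `parse_fps(f)` inside a `with open(path) as f:` block works the same way as passing a list of strings. Header values are kept as raw strings so that unknown keys survive; only `num_bits` is interpreted.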

I've developed a set of tools to generate FPS fingerprints from OpenEye, OEChem, and RDKit, as well as a tool to extract fingerprints stored in SD tags, specifically the CACTVS substructure keys in PubChem. These are available from the Mercurial repository.

These tools are still in development, and for now are meant primarily as a way to get concrete feedback on the specification.
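The post doesn't describe how those generators are built, so what follows is only a minimal sketch of the output side of such a tool, assuming the fingerprint is already available either as raw bytes or as a list of on-bit indices. The names `write_fps` and `bits_to_bytes` are hypothetical. The bit order used by `bits_to_bytes` (bit *i* stored in byte *i*//8 under mask `1 << (i % 8)`) is an assumption, since the example above doesn't pin it down. Metadata keys such as `software`, `params`, `source`, and `date` are taken from the example header.

```python
import io
from typing import Iterable, Mapping, TextIO, Tuple


def bits_to_bytes(on_bits: Iterable[int], num_bits: int) -> bytes:
    """Pack on-bit indices into bytes (ASSUMED order: bit i -> byte i//8, mask 1 << (i % 8))."""
    buf = bytearray((num_bits + 7) // 8)
    for i in on_bits:
        if not 0 <= i < num_bits:
            raise ValueError(f"bit {i} out of range for num_bits={num_bits}")
        buf[i // 8] |= 1 << (i % 8)
    return bytes(buf)


def write_fps(out: TextIO, num_bits: int, metadata: Mapping[str, str],
              records: Iterable[Tuple[bytes, str]]) -> None:
    """Write '#FPS1', '#key=value' headers, then one 'hex id' line per record."""
    num_bytes = (num_bits + 7) // 8
    out.write("#FPS1\n")
    out.write(f"#num_bits={num_bits}\n")  # always written from the argument
    for key, value in metadata.items():
        # A newline would split the header line; '=' in a key would confuse readers.
        if key == "num_bits" or "=" in key or "\n" in key + value:
            raise ValueError(f"cannot write header {key!r}")
        out.write(f"#{key}={value}\n")
    for fp, ident in records:
        if len(fp) != num_bytes:
            raise ValueError(f"{ident!r}: expected {num_bytes} bytes, got {len(fp)}")
        if "\n" in ident:
            raise ValueError(f"identifier {ident!r} contains a newline")
        out.write(f"{fp.hex()} {ident}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    fp = bits_to_bytes([0, 9], num_bits=16)
    write_fps(buf, 16, {"software": "example/0.1"}, [(fp, "mol1")])
    print(buf.getvalue(), end="")
```

Keeping the header as plain `key=value` strings means a generator can record whatever provenance it has (toolkit version, parameters, input file) without the writer needing to know what those fields mean.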

Other tools I would like to develop, perhaps with your help, are command-line programs for similarity search and substructure filters.
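Since those programs don't exist yet, nothing here reflects how chemfp will implement them. As a minimal sketch of the core comparisons over fingerprints read from an FPS file: the Tanimoto score (bits in common divided by bits set in either fingerprint) is a widely used similarity measure for binary fingerprints, and a substructure screen keeps only targets whose fingerprint contains every bit set in the query. The screen is only a prefilter, valid when the fingerprint type guarantees that a substructure's bits are a subset of the whole molecule's bits (as with substructure keys), and survivors still need a real substructure match. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[bytes, str]  # (fingerprint bytes, identifier)


def _as_int(fp: bytes) -> int:
    return int.from_bytes(fp, "big")


def popcount(x: int) -> int:
    return bin(x).count("1")  # portable; int.bit_count() needs Python 3.10+


def tanimoto(a: bytes, b: bytes) -> float:
    """|A & B| / |A | B|; returns 0.0 when both fingerprints are empty (a convention)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    x, y = _as_int(a), _as_int(b)
    union = popcount(x | y)
    return popcount(x & y) / union if union else 0.0


def threshold_search(query: bytes, targets: Iterable[Record],
                     threshold: float) -> List[Tuple[float, str]]:
    """Return (score, id) for every target with Tanimoto >= threshold, best first."""
    hits = [(tanimoto(query, fp), ident) for fp, ident in targets]
    hits = [hit for hit in hits if hit[0] >= threshold]
    hits.sort(key=lambda hit: -hit[0])  # stable sort: ties keep input order
    return hits


def substructure_screen(query: bytes, targets: Iterable[Record]) -> List[str]:
    """Return ids whose fingerprint has every bit that is set in the query."""
    q = _as_int(query)
    return [ident for fp, ident in targets
            if len(fp) == len(query) and (_as_int(fp) & q) == q]
```

Converting each fingerprint to a Python integer keeps the sketch short; a tool scanning large databases would more likely use a compiled popcount over the raw bytes.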

I'm also looking for input and feedback on the format definitions, and for people who want to add support for these formats in their tools.

If you are interested in chemfp, please sign up for the chemfp mailing list.




Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use