The Artima Developer Community
Sponsored Link

Python Buzz Forum
connection matrix vs. node with edges for molecules

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Andrew Dalke

Posts: 291
Nickname: dalke
Registered: Sep, 2003

Andrew Dalke is a consultant and software developer in computational chemistry and biology.
connection matrix vs. node with edges for molecules Posted: Sep 22, 2003 2:55 AM
Reply to this message Reply

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: connection matrix vs. node with edges for molecules
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
Latest Python Buzz Posts
Latest Python Buzz Posts by Andrew Dalke
Latest Posts From Andrew Dalke's writings

Advertisement
At the CRBM conference I showed the basic graph-like API that I and Brian Kelley (and others) found useful for molecular data structures. It says that atoms have bonds and bonds have atoms, and the molecule has a list of all atoms and all bonds. Iddo Friedberg asked why we chose that over a connection matrix. I don't think we answered it fully, so here's part of my follow-up....

I wanted to follow up on your comments about using a connection matrix for bond information. Both Brian and I were against it, but weren't that clear in explaining why. My comment was that the matrix itself would take up too much space, which isn't correct since it can be stored as a sparse matrix, as you pointed out.

Brian gave a different reason - that an atom with a list of ordered bonds - helps with chirality, but didn't full explain why. I thought at first I understood his reason, but now I'm less certain. Reading the SMILES docs at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html#RTFToC24

SMILES uses a very general type of chirality specification based on local chirality. Instead of using a rule-based numbering scheme to order neighbor atoms of a chiral center, orientations are based on the order in which neighbors occur in the SMILES string.
This means Daylight doesn't believe the actual order of bonds is important, only the order of atoms in the SMILES string (or more generally, the insertion order of the atom into the molecule).

If that's the case, and Brian has more experience in dealing with chirality so he'll make the final call on that, then that isn't a strong reason against using a sparse matrix.

However, the data model we use is

  • a molecule has a list of atoms
  • an atom has a list of bonds / "other" atoms
This model is a representation of a sparse matrix, is it not? That is, to see if there's a connection between two atoms, index into the list of atoms in the molecule to get the list of bonds and search that list for the other atom.

API-wise, all that's missing in what we propose compared to a matrix API is the [i][j] lookup, that is, "given an atom, return the bond to another atom, or None/KeyError if there isn't one." Easy enough to add, but something I haven't needed often.

In the cases where I do need a connection matrix, I just create one from the graph. In that way I don't have to worry about changing the matrix when I add/delete atoms as it only reflects the connectivity which existed when I made the matrix. I do think maintaining a proper, generic sparse distance matrix in the face of edits is the major reason to use a representation which more closely fits the graph-like data model.

The chirality argument says that you may want to change the order of the bonds for a given atom. With a connection matrix, the bond order is invariant and it depends on the order of atoms in the matrix. If I want the bonds for a given atom to be in a different order (which is effectively never, and so why I consider this a weak argument) then I'll need a rearrangement vector which tells me how to get the atoms in the correct order. A performance hit and cumbersome code.

(If you do change the bond ordering then the library may need to automatically reflect those changes so the chirality stays correct. The toolkits I know, Daylight and OEChem, don't let users change the bond ordering - and it's not a feature I've needed - so I don't know the expected behaviour for this case.)

Read: connection matrix vs. node with edges for molecules

Topic: Groovy Previous Topic   Next Topic Topic: Mikel on aggregators

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use