Python Buzz Forum - connection matrix vs. node with edges for molecules

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Python Buzz Forum
connection matrix vs. node with edges for molecules

0 replies on 1 page.

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 0 replies on 1 page

Andrew Dalke

Posts: 291
Nickname: dalke
Registered: Sep, 2003

Andrew Dalke is a consultant and software developer in computational chemistry and biology.

connection matrix vs. node with edges for molecules

Posted: Sep 22, 2003 2:55 AM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: connection matrix vs. node with edges for molecules Feed Title: Andrew Dalke's writings Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.	Latest Python Buzz Posts Latest Python Buzz Posts by Andrew Dalke Latest Posts From Andrew Dalke's writings

At the CRBM conference I showed the basic graph-like API that I and Brian Kelley (and others) found useful for molecular data structures. It says that atoms have bonds and bonds have atoms, and the molecule has a list of all atoms and all bonds. Iddo Friedberg asked why we chose that over a connection matrix. I don't think we answered it fully, so here's part of my follow-up....

I wanted to follow up on your comments about using a connection matrix for bond information. Both Brian and I were against it, but weren't that clear in explaining why. My comment was that the matrix itself would take up too much space, which isn't correct since it can be stored as a sparse matrix, as you pointed out.

Brian gave a different reason - that an atom with a list of ordered bonds - helps with chirality, but didn't full explain why. I thought at first I understood his reason, but now I'm less certain. Reading the SMILES docs at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html#RTFToC24

SMILES uses a very general type of chirality specification based on local chirality. Instead of using a rule-based numbering scheme to order neighbor atoms of a chiral center, orientations are based on the order in which neighbors occur in the SMILES string.

This means Daylight doesn't believe the actual order of bonds is important, only the order of atoms in the SMILES string (or more generally, the insertion order of the atom into the molecule).

If that's the case, and Brian has more experience in dealing with chirality so he'll make the final call on that, then that isn't a strong reason against using a sparse matrix.

However, the data model we use is

a molecule has a list of atoms
an atom has a list of bonds / "other" atoms

This model is a representation of a sparse matrix, is it not? That is, to see if there's a connection between two atoms, index into the list of atoms in the molecule to get the list of bonds and search that list for the other atom.

API-wise, all that's missing in what we propose compared to a matrix API is the [i][j] lookup, that is, "given an atom, return the bond to another atom, or None/KeyError if there isn't one." Easy enough to add, but something I haven't needed often.

In the cases where I do need a connection matrix, I just create one from the graph. In that way I don't have to worry about changing the matrix when I add/delete atoms as it only reflects the connectivity which existed when I made the matrix. I do think maintaining a proper, generic sparse distance matrix in the face of edits is the major reason to use a representation which more closely fits the graph-like data model.

The chirality argument says that you may want to change the order of the bonds for a given atom. With a connection matrix, the bond order is invariant and it depends on the order of atoms in the matrix. If I want the bonds for a given atom to be in a different order (which is effectively never, and so why I consider this a weak argument) then I'll need a rearrangement vector which tells me how to get the atoms in the correct order. A performance hit and cumbersome code.

(If you do change the bond ordering then the library may need to automatically reflect those changes so the chirality stays correct. The toolkits I know, Daylight and OEChem, don't let users change the bond ordering - and it's not a feature I've needed - so I don't know the expected behaviour for this case.)

Read: connection matrix vs. node with edges for molecules

Previous Topic

Next Topic


	Web Artima.com