This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Drawing molecules
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
Many chemicals (okay, an infinite number) can be represented as a
molecular graph, with atoms for nodes and bonds for edges [*]. The
nomenclature is somewhat confusing since the molecular graph is
often called a molecule, and each component (connected subgraph) of
the molecule is a chemical molecule. A molecule data structure may
contain 0 or more subgraphs representing chemical molecules.
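To make the nomenclature concrete, here's a minimal Python sketch of a molecular graph. The `Molecule` class and its methods are hypothetical and not taken from any toolkit. Atoms are stored as element symbols and bonds as index pairs with a bond order. `components()` walks the graph to pull out the connected subgraphs, which are the chemical molecules. The example record holds ethanol's heavy atoms plus a lone water oxygen, so one "molecule" contains two chemical molecules.

```python
from collections import defaultdict


class Molecule:
    """Hypothetical minimal molecular graph: atoms are nodes, bonds are edges."""

    def __init__(self):
        self.elements = []  # element symbol for each atom index
        self.bonds = []     # (atom_index_1, atom_index_2, bond_order)

    def add_atom(self, element):
        self.elements.append(element)
        return len(self.elements) - 1

    def add_bond(self, a, b, order=1):
        self.bonds.append((a, b, order))

    def components(self):
        """Return each connected subgraph (chemical molecule) as a sorted list of atom indices."""
        neighbors = defaultdict(list)
        for a, b, _ in self.bonds:
            neighbors[a].append(b)
            neighbors[b].append(a)
        seen = set()
        result = []
        for start in range(len(self.elements)):
            if start in seen:
                continue
            # Depth-first walk collecting every atom reachable from start
            stack = [start]
            seen.add(start)
            group = []
            while stack:
                atom = stack.pop()
                group.append(atom)
                for other in neighbors[atom]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
            result.append(sorted(group))
        return result


# Ethanol plus water in one record (heavy atoms only, hydrogens omitted for brevity)
mol = Molecule()
c1 = mol.add_atom("C")
c2 = mol.add_atom("C")
o1 = mol.add_atom("O")
mol.add_bond(c1, c2)
mol.add_bond(c2, o1)
mol.add_atom("O")  # the water oxygen, bonded to nothing else here

print(len(mol.components()))  # 2
print(mol.components())       # [[0, 1, 2], [3]]
```

An empty `Molecule()` returns an empty list of components, which covers the "0 or more" case.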
Molecules have a three-dimensional structure. Small molecules are
often planar graphs and can be drawn as a two-dimensional depiction,
with some special notations to handle chirality. (This is sometimes
called two-and-a-half dimensional.) Here's a depiction of ethyl
alcohol, known more affectionately as ethanol and found at the pub
nearest you.
That's a very verbose depiction with all those hydrogens sticking off
the heavy atoms. (There are all sorts of shorthand names in chemistry
that reveal your background. I did molecular modelling of
biomolecules, where hydrogen is light and everything else is
heavy, because hydrogen is about 1/12th the mass of carbon, the
next heaviest atom we dealt with. Others more interested in metals
call non-metals grease. No doubt some cosmologists classify
everything as hydrogen, helium, and impurities.)
It can be compacted somewhat by moving the H's alongside the heavy
atom, as in the next picture. I chose to use H3C instead
of CH3 for stylistic reasons. I'm sure a chemist
would shun me for that. :)
Writing in the hydrogens gets tedious. As it turns out, atoms have
valences, which you might recall from introductory chemistry when you
learned the Lewis dot model. Carbon has a valence of 4 which means
that it takes 4 single bonds, or 1 double bond and 2 single bonds, or
2 double bonds, or 1 triple bond and 1 single bond.
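Those four options are exactly the ways of writing 4 as a sum of bond orders no larger than 3. Here's a small recursive sketch that enumerates them. `bond_combinations` is a name I made up, and the `max_order=3` default is an assumption that mirrors the triple-bond limit for carbon.

```python
def bond_combinations(valence, max_order=3):
    """Return every multiset of bond orders (largest first) whose sum is `valence`."""
    results = []

    def extend(remaining, largest, current):
        if remaining == 0:
            results.append(tuple(current))
            return
        # Only try orders no larger than the previous one so each multiset appears once
        for order in range(min(largest, remaining), 0, -1):
            current.append(order)
            extend(remaining - order, order, current)
            current.pop()

    extend(valence, max_order, [])
    return results


print(bond_combinations(4))
# [(3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
```

Raising `max_order` to 4 adds a lone `(4,)`, the quadruple bond that the Lewis model would allow.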
Actually, the Lewis model would also allow a quadruple bond under the
octet rule, but that's not going to happen with carbon because under
the valence bond model it isn't possible to have all four electrons in
the outer shell point the same way. Larger atoms, in the
3rd row of the periodic table and above, can have quadruple
bonds, and it looks like
some systems
even have quintuple bonds. Daylight's toolkit only supports up to
triple bonds. OpenEye extends SMILES to include $ as the
quadruple bond symbol, and now I see ChemDraw even has a hextuple
bond, and it's been observed in Cr2.
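Here's a sketch of how a SMILES reader might map bond symbols to bond orders, with the `$` quadruple bond treated as an opt-in extension. The `bond_order` function and its `allow_quadruple` flag are hypothetical; they are not Daylight's or OpenEye's actual API. Aromatic (`:`) and directional (`/`, `\`) bonds are left out to keep it short.

```python
# Single, double and triple bond symbols from Daylight SMILES
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
# The quadruple-bond extension symbol
QUADRUPLE = "$"


def bond_order(symbol, allow_quadruple=False):
    """Map a SMILES bond symbol to an integer bond order.

    `allow_quadruple` is a hypothetical switch standing in for a toolkit
    that accepts the '$' extension.
    """
    if symbol in BOND_ORDERS:
        return BOND_ORDERS[symbol]
    if symbol == QUADRUPLE:
        if allow_quadruple:
            return 4
        raise ValueError("quadruple bond '$' is not supported in this mode")
    raise ValueError(f"unknown bond symbol: {symbol!r}")


print(bond_order("#"))                        # 3
print(bond_order("$", allow_quadruple=True))  # 4
```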
Chemists just assume the valences will always be filled. (You have to
get to some pretty unusual physics to break that assumption, like
ultra high vacuum or extremely short timescale interactions.) Rather
than listing the hydrogen counts explicitly, they decided to use an
implicit hydrogen representation, where the number of hydrogens on an
atom is the atom's valence plus the charge minus the sum of its bond
orders. The most common exception to this is for hydrogens around a
chiral center.
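The implicit hydrogen rule is simple arithmetic, H = valence + charge - (sum of bond orders), so here it is as a short Python function applied to ethanol's three heavy atoms. The `VALENCE` table covers only the elements used here and is an assumption of the sketch. Taken literally, the rule fits ions like hydroxide and ammonium but not every case. For a carbocation such as CH3+ it would give 4 + 1 - 0 = 5 hydrogens instead of 3, so treat this as the common-case rule.

```python
# Typical valences for the elements used below (an assumption of this sketch)
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(valence, charge, bond_orders):
    """Hydrogen count from the rule in the text: valence + charge - sum of bond orders."""
    return valence + charge - sum(bond_orders)


# Ethanol's heavy atoms as (element, formal charge, orders of its bonds) for C-C-O
ethanol = [("C", 0, [1]), ("C", 0, [1, 1]), ("O", 0, [1])]
counts = [implicit_hydrogens(VALENCE[el], q, orders) for el, q, orders in ethanol]
print(counts)       # [3, 2, 1]
print(sum(counts))  # 6, as in C2H6O

# Charged cases where the rule works as stated
print(implicit_hydrogens(VALENCE["O"], -1, []))  # hydroxide: 1
print(implicit_hydrogens(VALENCE["N"], +1, []))  # ammonium: 4
```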
Here's ethyl alcohol drawn using implicit hydrogens.
As an(other) aside, the polar hydrogen model lies between the explicit
and implicit models and is used in molecular mechanics (MM). Polar
hydrogens have a large partial charge and are more likely to be in
long-range hydrogen bonds. Nonpolar hydrogens are mostly involved in
van der Waals interactions, which are very short range. MM merges a heavy
atom and its nonpolar hydrogens into a new atom type with masses and
charges adjusted accordingly. This cuts the number of simulated atoms
in half with hopefully only slight effect on the result.
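Here's a sketch of that merging step, usually called a united-atom model, applied to all-atom ethanol. The `united_atoms` function is hypothetical, and it assumes a hydrogen is nonpolar exactly when it is bonded to carbon. Only the masses are combined, using approximate standard atomic weights; partial charges come from the force field and aren't modeled here.

```python
# Approximate standard atomic weights, g/mol
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def united_atoms(elements, bonds):
    """Fold nonpolar hydrogens into the heavy atom they are bonded to.

    Assumption for this sketch: a hydrogen is nonpolar if it is bonded to
    carbon; hydrogens on N or O are polar and stay as separate particles.
    Returns one (label, mass) pair per simulated particle.
    """
    # Record which atom each hydrogen is attached to
    attached_to = {}
    for a, b in bonds:
        if elements[a] == "H":
            attached_to[a] = b
        if elements[b] == "H":
            attached_to[b] = a

    absorbed = {}  # heavy atom index -> number of hydrogens merged into it
    kept = []
    for i, element in enumerate(elements):
        partner = attached_to.get(i)
        if element == "H" and partner is not None and elements[partner] == "C":
            absorbed[partner] = absorbed.get(partner, 0) + 1
        else:
            kept.append(i)

    particles = []
    for i in kept:
        n_h = absorbed.get(i, 0)
        label = elements[i] + (("H" + (str(n_h) if n_h > 1 else "")) if n_h else "")
        mass = MASS[elements[i]] + n_h * MASS["H"]
        particles.append((label, round(mass, 3)))
    return particles


# All-atom ethanol: C0, C1, O2, three H on C0, two H on C1, one H on O2
elements = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
particles = united_atoms(elements, bonds)
print(len(elements), "->", len(particles))  # 9 -> 4
print([label for label, _ in particles])    # ['CH3', 'CH2', 'O', 'H']
```

Ethanol's 9 atoms become 4 particles (CH3, CH2, O and the polar H), a bit better than half.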
If you're a chemist drawing organic molecules all the time, you'll end up
drawing a lot of carbons. Standard practice is that "normal" carbons
aren't drawn at all. Any bond ending or a bend with no element symbol
is assumed to be an uncharged carbon of average molecular weight. (I
suppose there are rare exceptions, like if you make isotopically pure
14C diamond or buckyballs you might ignore the 14.) But
when drawing your NMR structure, use 13C. (And when you
email me to correct my mistakes, remember, I'm a physicist by training
and have only learned your native practices by osmosis. :).
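The labeling convention boils down to a small rule: leave the symbol off only for an uncharged carbon with no isotope given. The functions below are a hypothetical sketch, not how any particular drawing program works. They ignore which side the hydrogens go on (H3C versus CH3) and write the isotope as a plain prefix rather than a superscript.

```python
def needs_label(element, charge=0, isotope=None):
    """True unless the atom is an uncharged carbon with no isotope specified."""
    return not (element == "C" and charge == 0 and isotope is None)


def atom_label(element, charge=0, isotope=None, hydrogens=0):
    """Return a plain-text label such as 'OH' or '13CH3'; '' means draw no symbol."""
    if not needs_label(element, charge, isotope):
        return ""
    label = (str(isotope) if isotope is not None else "") + element
    if hydrogens:
        label += "H" + (str(hydrogens) if hydrogens > 1 else "")
    if charge:
        magnitude = abs(charge)
        label += (str(magnitude) if magnitude > 1 else "") + ("+" if charge > 0 else "-")
    return label


# Ethanol as (element, charge, isotope, implicit hydrogens) for each heavy atom
ethanol = [("C", 0, None, 3), ("C", 0, None, 2), ("O", 0, None, 1)]
print([atom_label(*atom) for atom in ethanol])  # ['', '', 'OH']

# An isotope or a charge forces the symbol to appear
print(atom_label("C", isotope=13, hydrogens=3))  # 13CH3
print(atom_label("N", charge=1, hydrogens=4))    # NH4+
```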
Here's ethyl alcohol as a chemist would sketch it.