The Artima Developer Community
Python Buzz Forum
Naming molecules

Andrew Dalke

Andrew Dalke is a consultant and software developer in computational chemistry and biology.
Naming molecules Posted: Oct 7, 2003 4:34 PM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: Naming molecules
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.

Suppose you are a physicist. After some analysis in your home-built NMR machine you've figured out the active ingredient in your vodka has the following chemical structure:

If you really had no chemistry training at all you probably wouldn't even include the bonds. A bond is a way of representing electron density, which can be computed knowing the atoms' positions and applying some quantum mechanics and computer power. And if you want to show off, toss in the phrase "Born-Oppenheimer approximation" so you don't have to worry about treating a nucleus as anything other than a fixed point. (Huh. Given how rarely that phrase is used, my act of saying "Born-Oppenheimer approximation" may make this page the top hit for it in search engines.)

Odds are you probably had a chemistry class in high school so you know about drawing the structure with bonds. Now you want to find out more about it. But how? Image search is still very immature and there are many ways to depict the graph, so that's not going to work.

One way is to look for the molecular formula, which is the count of each atom type in the molecule. This structure is C₂H₆O, but a search engine doesn't handle subscripts, so try "C2H6O". The first hit is for the "c2h6o -- happy hour" mailing list at Georgia Tech, which suggests people already know about this compound. But it doesn't give you much of a clue as to what it is.
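To make "counts of each atom type" concrete, here is a minimal sketch that builds a formula string from a list of element symbols. The function name `molecular_formula` and the ordering rule (carbon first, hydrogen second, everything else alphabetically) are assumptions made for illustration; they are not taken from any particular chemistry toolkit.

```python
from collections import Counter

def molecular_formula(atoms):
    """Build a formula string from a list of element symbols.

    Assumed ordering for this sketch: carbon, then hydrogen,
    then any remaining elements alphabetically.
    """
    counts = Counter(atoms)
    order = [el for el in ("C", "H") if el in counts]
    order += sorted(el for el in counts if el not in ("C", "H"))
    # A count of 1 is written without a number, as chemists do.
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)

# The nine atoms of the structure in question, in no particular order.
mystery_atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
print(molecular_formula(mystery_atoms))
```

Note that the function only looks at which atoms are present. It never sees how they are connected.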

The next hit is for dimethyl ether which has the same molecular formula but looks like

There you see the problem. The molecular formula isn't unique. You really would like a compound to have one and only one name, and for a name to refer to only one molecule. After thinking about it some more you realize that the molecular formula itself could be written several ways, like H6C2O (lightest element first) or OC2H6 (heaviest element first). There are six possible orderings of the three elements.
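Both problems can be sketched in a few lines. The dictionaries below are a made-up representation (a list of element symbols plus bonds as index pairs), not a standard file format, and `oxygen_neighbors` is a hypothetical helper. The atom counts of the two molecules match, while the atoms attached to the oxygen do not, and the same counts can be spelled in six different orders.

```python
from collections import Counter
from itertools import permutations

# Assumed toy representation: element symbols plus bonds as index pairs.
ethanol = {
    "atoms": ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)],
}
dimethyl_ether = {
    "atoms": ["C", "O", "C", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)],
}

def oxygen_neighbors(mol):
    """Return the sorted element symbols bonded to the (single) oxygen."""
    o = mol["atoms"].index("O")
    found = []
    for a, b in mol["bonds"]:
        if a == o:
            found.append(mol["atoms"][b])
        elif b == o:
            found.append(mol["atoms"][a])
    return sorted(found)

# Same atom counts, different connectivity.
same_counts = Counter(ethanol["atoms"]) == Counter(dimethyl_ether["atoms"])
print(same_counts)
print(oxygen_neighbors(ethanol), oxygen_neighbors(dimethyl_ether))

# Every ordering of the three elements gives another spelling of the formula.
counts = Counter(ethanol["atoms"])
spellings = [
    "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)
    for order in permutations(sorted(counts))
]
print(spellings)
```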

Searching for the first alternative you come across a lecture slide which says "H6C2O could correspond to both Ethanol (H3CH2COH) and dimethyl ether (H3COCH3)". Ahh! A clue! Maybe this is called ethanol. But it's kinda worrying to see the formula written as H3CH2COH, which is different than the six permutations listed above.

Further searching finds links to sites promoting the commercial use of ethanol, but not until the sixth link do you find some useful chemical information and verification that you've got the right structure. But it is still disconcerting that they use the formula CH3CH2OH which is yet another possibility.

What are you going to do the next time you want to find information about a molecule? It seems these things have names, so you look into that some more and find out that the International Union of Pure and Applied Chemistry (IUPAC to its friends and enemies alike) has a huge amount of documentation related to nomenclature. Their rules give a way to assign a unique name to a molecule.

And look, that page says ethanol is written C2H5OH. *sigh*.
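All of these spellings reduce to the same atom counts, which a short sketch can confirm. The parser below is deliberately naive: it assumes a formula is just element symbols each optionally followed by a number, with no parentheses or charges, and the name `atom_counts` is made up for this example.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lowercase letter)
# followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def atom_counts(formula):
    """Count atoms in a simple formula string (no parentheses supported)."""
    counts = Counter()
    for element, digits in TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts

seen_so_far = ["C2H6O", "H6C2O", "OC2H6", "H3CH2COH", "CH3CH2OH", "C2H5OH", "H3COCH3"]
for text in seen_so_far:
    print(text, dict(atom_counts(text)))
```

Every entry comes out as two carbons, six hydrogens, and one oxygen, including H3COCH3, which is dimethyl ether. Counting atoms reconciles the different spellings but still cannot tell the two molecules apart.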

The documentation is overwhelming, so in growing frustration you find an introduction to the naming of compounds, which conveniently uses ethanol as its example.

At its simplest, the IUPAC name for an organic compound contains these two parts:
  • a root indicating how many carbon atoms are in the longest continuous chain of carbon atoms.
  • a prefix and/or suffix to indicate the family to which the compound belongs.

The longest carbon chain is two carbons, so the root is "eth". There is a single bond between them (that's "single bond" as in a bond with single bond type, not that there's only one bond between them) so it's an "ethane". There's an OH on the end, which uses the suffix "ol". Drop the "e" and join them to make "ethanol". Ta-da!
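The recipe just described can be written down as a toy sketch. It is nowhere near the real IUPAC rules: it assumes an unbranched chain with only single bonds and a single OH group, and the names `ROOTS`, `alkane_name`, and `alcohol_name` are invented for illustration. Chains of three or more carbons are rejected because the name would then also need a number saying which carbon carries the OH.

```python
# Roots for the first few chain lengths (standard IUPAC roots).
ROOTS = {1: "meth", 2: "eth", 3: "prop", 4: "but"}

def alkane_name(n_carbons):
    """Root plus "ane" for an unbranched, singly bonded carbon chain."""
    return ROOTS[n_carbons] + "ane"

def alcohol_name(n_carbons):
    """Name a one- or two-carbon chain carrying a single OH group."""
    if n_carbons > 2:
        # Longer chains need a position number for the OH, which this sketch omits.
        raise ValueError("position of the OH is ambiguous for longer chains")
    parent = alkane_name(n_carbons)
    # Drop the trailing "e" of the parent name and add the "ol" suffix.
    return parent[:-1] + "ol"

print(alkane_name(2))
print(alcohol_name(2))
```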

Upon reading that tutorial you realize there's a lot of memorization of names, and you went into physics because you preferred formulas and math over names. And because you would rather be electrocuted or irradiated than be around chemical containers with big warning stickers like "Danger: Bone Seeker".

After digging around a bit you realize that even trained chemists have problems with names. Chemistry librarians were worth their weight in platinum in their knowledge of the arcane magic of finding the right literature references.

Good thing you've got a computer. There is software to help generate an IUPAC name. But my, the results sure look complicated, the process is opaque (to non-experts and even non-specialists in a domain) and there's the fine print that "from time-to-time" some compounds can't be named because "some classes of compounds may not yet have systematic nomenclature definitions available."

The names look complicated in part because they derive from a system originally designed to be pronounceable and to reflect the way that a chemist understands the system. The result is a name like (from the ACD/Name example on that ACD/Labs link -- it's got cool mouseovers!):

(2S,3R,6R,7S)-7-amino-3-[(1Z)-2-methylbut-1-en-1-yl]-8-oxo-5-thia-1-azabicyclo[4.2.0]octane-2-carboxylic acid
There's a mouthful for you.

That just doesn't seem elegant. Surely there must be a cleaner way to name a molecule.

Read: Naming molecules


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use