This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: What's the name of my field?
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
I say that I work in "cheminformatics." Others use
"chemoinformatics". What's going on?
When I entered this field in the late 1990s I said "chemical
informatics" and the main journal was the "Journal of Chemical
Information and Computer Sciences." Before 1975 it was the "Journal of
Chemical Documentation" and in 2005 it became "Journal of Chemical
Information and Modeling". You also see the older term in older
company names. Daylight's full corporate name, for example, is
"Daylight Chemical Information Systems."
Cheminformatics? Chemoinformatics?
"Chemical informatics" is no longer so common. Which of the two newer
names do people use? There's a "Journal of Cheminformatics" but no
"Journal of Chemoinformatics." In 2006 some 100 scientists from 20
countries in Europe and North America wrote the "Obernai Declaration"
(no longer online?) to define and promote the field of
"chemoinformatics."
Cheminformatics is now (December 2009) used about 2.5-times more
frequently than chemoinformatics. In 2006 this ratio was 1.6, in 2007
1.5 and in 2008 1.9. So it looks like that the term cheminformatics is
winning the race!
While this weighs in favor of cheminformatics, it's still somewhat
suspect. Google's estimated counts are shaky and not meant as accurate
numbers. I've done queries which are estimated to have thousands of
hits, only to find that there's about 50. It's also possible that if
PubChem and ChemSpider were to put the word "cheminformatics" on every
page then it would seriously skew the total number of pages Google
finds.
More specifically, if you search for "cheminformatics" you'll see
"About 6,960,000 results". Try to go to item 900 and you'll get the
message:
In order to show you the most relevant results, we have omitted some
entries very similar to the 654 already displayed.
"Chemoinformatics" returns "About 96,500 results". Almost an order of
magnitude less! But try going to the end of those and you'll see:
In order to show you the most relevant results, we have omitted some
entries very similar to the 671 already displayed.
Shaky indeed!
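To put numbers on that shakiness, here is a minimal sketch that compares the estimated totals with the counts Google actually displayed, using only the four figures quoted above. The dictionary layout and the helper name overstatement are my own choices for illustration. Note that the displayed count is only where Google stopped after omitting near-duplicates, so it is not a true page count either.

```python
# Figures quoted above: Google's "About N results" estimate versus the
# number of entries actually displayed before the "omitted" message.
counts = {
    "cheminformatics": {"estimated": 6_960_000, "displayed": 654},
    "chemoinformatics": {"estimated": 96_500, "displayed": 671},
}


def overstatement(estimated, displayed):
    """Return how many times larger the estimate is than the displayed count."""
    return estimated / displayed


for term, c in counts.items():
    factor = overstatement(c["estimated"], c["displayed"])
    print(f"{term}: estimated {c['estimated']:,}, "
          f"displayed {c['displayed']}, factor {factor:,.0f}x")

# Ratio of the displayed counts for the two terms.
displayed_ratio = (counts["cheminformatics"]["displayed"]
                   / counts["chemoinformatics"]["displayed"])
print(f"displayed ratio (chem/chemo): {displayed_ratio:.2f}")
```

By the displayed counts the two terms are roughly tied, which is a very different picture from the one the estimates give.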
The other day, Google Labs released the Books Ngram Viewer, which
lets you view and compare how often words and phrases appear in
published books over time. This gives a different way to compare
those three rather distinct terms. Here are the results:
You can clearly see that "chemical informatics" was in the lead when I
started, followed by a period when "chemoinformatics" dominated, and
now "cheminformatics" is pulling ahead. If only those graphs could be
extended to the present!
These numbers likely have their own bias. I assume Google made no
mistakes so the years and counts are correct. Book authors are more
likely to have been in the field for a long time, so perhaps they
collectively lag or lead the trends, or perhaps there are several
prolific authors who simply prefer to use minority
terminology. There are also likely only a few books involved, which
makes for large uncertainties, and I use Google's results as a proxy
for the number of different books.
In other words, this pretty plot isn't confirmation, only data. But
it's data which reinforces the growing belief that "cheminformatics"
is the dominant term and it's data which corroborates my understanding
of the history.