Wikipedia:Wikipedia Signpost/2009-05-18/Chemistry data

From Wikipedia, the free encyclopedia
The Signpost

Chemistry data

WikiChemists and Chemical Abstracts announce collaboration

By Physchim62
Vinegar like most people have never seen it – crystals of pure acetic acid.

Chemicals – love 'em or hate 'em, but you couldn't live without 'em. Water, glucose and sodium chloride are pretty essential for all of us. Other chemicals make our clothes, or colour them, or provide jobs for millions of workers. Still more chemicals (sometimes the same ones as before) can do us some pretty nasty harm if we're not careful, or degrade the environment for our children and grandchildren. Some chemicals are only really of interest to a professional chemist, or to someone who is easily amused by silly names – arsole, moronic acid and vomitoxin all have Wikipedia articles, after all. But information about chemical compounds is big business, worth several billions of U.S. dollars annually worldwide, and the Chemical Abstracts Service (CAS, a division of the American Chemical Society) is a leading player, "the global leader" as they prefer to put it!

But if there's something interesting that can be said about a chemical, then sooner or later someone is going to write an article about it… That's why the Chemicals WikiProject slaves away at over five thousand articles about individual chemical compounds (nearly double that number if you count all the drugs), trying to improve the content that we have and to fill in any obvious gaps in our coverage. Five to ten thousand compounds is pretty small compared to other collections of chemical information (CAS boasts 46 million compounds) but, as in many other areas, Wikipedia stands out because of its slightly idiosyncratic choice of subject matter. Free-access databases such as ChemSpider have millions of entries but, as its CEO Antony Williams told us, "for some of those compounds, Wikipedia is the only accessible online source of data."

Most chemicals are white powders. Most chemicals that aren't white powders are black or grey powders. Sodium dichromate is an orange powder (and carcinogenic): it's also one of the chemicals in the collaboration between WP Chem and CAS (number [10588-01-9]).

Getting the numbers right

Of course, none of that is worth anything unless the information is reasonably accurate. Water boils at 100 °C (212 °F) and, if you say it boils at 212 °C (100 °F) you're nearer to an 'F-grade' than a 'C-grade'… About eighteen months ago, after launching an informal survey of chemical information professionals, WP Chem embarked on a mammoth project (still ongoing) of hand-checking certain types of data in the infoboxes. Obviously, the work would be wasted without a clear record of what had been checked and what the correct data was, so WikiChemist and administrator Beetstra wrote CheMoBot, which logs any changes to data in the infoboxes and highlights changes to verified data, all with a feed to the WP Chem IRC channel on freenode (join us for publicly logged meetings most Tuesdays at 1600 UTC).
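To give a rough idea of what "logging changes and highlighting changes to verified data" involves, here is a minimal Python sketch. It is not CheMoBot's actual code: the function name, the field names and the dictionary representation of an infobox are all hypothetical, chosen only for illustration.

```python
def infobox_changes(old: dict, new: dict, verified: set) -> list:
    """Compare two revisions of an infobox (hypothetical dict form).

    Returns one tuple per changed field:
    (field, old_value, new_value, touches_verified_data).
    """
    changes = []
    # Look at every field present in either revision, in a stable order.
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            # Flag the change if the field had been hand-checked.
            changes.append((field, before, after, field in verified))
    return changes


# Hypothetical example: someone alters a checked CAS number.
old_rev = {"CASNo": "10588-01-9", "Appearance": "orange powder"}
new_rev = {"CASNo": "10588-01-0", "Appearance": "orange powder"}
flagged = infobox_changes(old_rev, new_rev, verified={"CASNo"})
```

In this sketch the unchanged field is ignored and the altered CAS number comes back with its flag set, which is the sort of edit a human would then want to look at.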

One of the types of data we wanted to verify was the CAS registry number, a sort of ID number for chemical compounds. CAS registry numbers can be found from a wide variety of sources, but the sources often don't agree with one another. The ideal solution would be to check with CAS, the organisation that issues them, and a couple of editors with access to the relevant databases offered to run some checks in their spare time.
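One check that needs no database access is the check digit built into every CAS registry number: the final digit should equal the weighted sum of the other digits (weights 1, 2, 3… counting from the right), modulo 10. The Python sketch below shows that arithmetic; the function name is our own, and it is only an illustration, not a tool used by WP Chem or CAS.

```python
import re


def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if the final digit of a CAS registry number matches its checksum."""
    # Expected layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the rightmost body digit by 1, the next by 2, and so on.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check


sodium_dichromate_ok = cas_check_digit_ok("10588-01-9")  # number quoted above
mistyped_ok = cas_check_digit_ok("10588-01-8")           # last digit altered
```

For [10588-01-9] the weighted sum is 1 + 0 + 24 + 32 + 25 + 0 + 7 = 89, and 89 modulo 10 is 9, so the number passes. A passing checksum only catches typing errors, though: it says nothing about whether the number belongs to the compound the article describes, which is why checking against CAS itself mattered.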

The rapid response from CAS, and the controversy it caused in the wider chemical community, at least proved to us that chemical information professionals really do read Wikipedia. The first response from CAS was that anyone using its databases to find information for Wikipedia is breaching its terms of access[1] (the databases are not public). After a hectic week of emails and posts on various blogs and mailing lists (and Wikipedia talk pages) – many thanks to all those chemists who are not involved with Wikipedia but who still stood up for the project – the door was open: CAS were more than willing to help WP Chem, but we needed to agree on how.[2]

The credit for keeping the negotiations moving forward, for calmly explaining to people on each side that the other side couldn't do a deal without this or that (and, most importantly, why), is all due to WP Chem editor Walkerma. It took a long time, but by last autumn the talking was mostly over and the hard work could begin.

CAS has provided the WikiChemists with over six thousand CAS registry numbers from the compounds they consider the most interesting to the chemical community as a whole (mostly compounds that have had more than 1000 scientific papers written about them), along with the other information we need such as structure diagrams (in the right format) and their version of the chemical name (CAS uses its own chemical nomenclature system). ChemSpider stepped in and generated International Chemical Identifiers (InChIs, another widely used ID system for some types of compound) for each of the compounds and added them to the dataset. And a committed group of editors has been working through the list one-by-one checking the data in the Wikipedia articles. If the basic data has been checked – that is, if the article really is about the compound it says it is – the CAS registry number appears bolded in the infobox.

More importantly, CAS has just released the data on a dedicated website, commonchemistry.org, so that anyone can access it, not just WP Chem editors. This was an important condition for WikiChemists, as the data has to be verifiable, but it is a completely new departure for CAS, who have built the site from scratch. The site is not meant to be static, and more information and compounds should be added in the future. For the moment, WP Chem is still digesting the data we've got, but we've already freed that data for anyone else to use if they wish.

Looking to the future, looking for the structures

It may seem strange to go to all this trouble over some strange numbers that are meaningless to most people (human chemists included). However, chemical identifiers (of which CAS registry numbers are only one) are the key to finding and classifying chemical information on the internet. Much of that information is graphical, yet the vast majority of chemical images online are completely meaningless to a non-human. The two structures shown here are codeine (the active ingredient in many cough medicines, left) and heroin (an illegal narcotic, right): if you can't tell the difference, then neither can a computer. The relevant chemical data is used by the software that creates the image, but is thrown away when the image is saved in a browser-compatible format because there's no generally accepted standard for chemical metadata. WikiChemists have had discussions with external partners on ways to solve the problem, not just for single molecules but also for reaction schemes. For the moment, we're still talking, but maybe we'll have another dispatch this time next year…

Is it really WP Chem's business to be doing all this? Shouldn't we be chalking up little gold stars instead? Well, there's certainly nothing wrong with writing great articles, but neither is there anything wrong with Wikipedians playing an active role in the wider intellectual community. Our outside contacts have been wonderful sources of advice to prevent WP Chem from trying to reinvent the wheel or from wasting time on information that very few people want or need. We also have a vested interest in solutions that are free, especially faced with the giants of the chemical information business: free so that Wikipedia can benefit from them and free so that everybody can benefit from them.

References

External links


Discuss this story

Physchim62, I like this piece and it will be great for the Signpost. I have a couple suggestions/comments. The title doesn't give much indication of what the article is about, and it actually doesn't become clear what the point is until quite a ways in. Some front-loading of the main idea would be appreciated by readers, I think. The point of codeine and heroin images will be lost on many readers unless they have explanatory captions. --ragesoss (talk) 01:34, 18 May 2009 (UTC)
OK, I think I've covered those points. The last sentence probably needs repolishing after the changes, but I'm going to have to get some sleep soon! Physchim62 (talk) 02:17, 18 May 2009 (UTC)
Cool. Thanks. --ragesoss (talk) 02:57, 18 May 2009 (UTC)
I really enjoyed this article/essay. Nice job! - BanyanTree 03:22, 21 May 2009 (UTC)