| Margaret Oakley Dayhoff | |
|---|---|
| Born | Margaret Belle Oakley, March 11, 1925 |
| Died | February 5, 1983 (aged 57) |
| Education | Columbia University (PhD) |
| Known for | Substitution matrices, one-letter code |
| Children | 2, including Ruth Dayhoff |
| Scientific career | |
| Fields | Bioinformatics |
| Institutions | University of Maryland, Rockefeller Institute, Georgetown University Medical Center, National Biomedical Research Foundation |
| Doctoral advisor | George Kimball |
Margaret Belle (Oakley) Dayhoff (March 11, 1925 – February 5, 1983) was an American biophysicist and a pioneer in the field of bioinformatics.[1] Dayhoff was a professor at Georgetown University Medical Center and a noted research biochemist at the National Biomedical Research Foundation, where she pioneered the application of mathematics and computational methods to the field of biochemistry. She dedicated her career to applying evolving computational technologies to support advances in biology and medicine, most notably the creation of protein and nucleic acid databases and tools to interrogate the databases. She originated one of the first substitution matrices, point accepted mutations (PAM). She also developed the one-letter code used for amino acids, reflecting an attempt to reduce the size of the data files used to describe amino acid sequences in an era of punch-card computing.
She earned her PhD from the Department of Chemistry at Columbia University, where she devised computational methods to calculate molecular resonance energies of several organic compounds. She did postdoctoral studies at the Rockefeller Institute (now Rockefeller University) and the University of Maryland, and joined the newly established National Biomedical Research Foundation in 1959. She was the first woman to hold office in the Biophysical Society and the first person to serve as both secretary and eventually president.[2]

Dayhoff was born an only child in Philadelphia, but moved to New York City when she was ten.[3] Her academic promise was evident from the outset: she was valedictorian (class of 1942) at Bayside High School, Bayside, New York, and from there received a scholarship to Washington Square College of New York University, graduating magna cum laude in mathematics in 1945 and being elected to Phi Beta Kappa.[4][5]
Dayhoff began a PhD in quantum chemistry under George Kimball in the Columbia University Department of Chemistry. In her graduate thesis, Dayhoff pioneered the application of computer capabilities (i.e., mass-data processing) to theoretical chemistry; specifically, she devised a method of applying punched-card business machines to calculate the resonance energies of several polycyclic organic molecules. Her management of her research data was so impressive that she was awarded a Watson Computing Laboratory Fellowship. As part of this award, she received access to "cutting-edge IBM electronic data processing equipment" at the lab.[6][7]

After completing her PhD, Dayhoff studied electrochemistry under Duncan A. MacInnes at the Rockefeller Institute from 1948 to 1951. In 1952, she moved to Maryland with her family and later received research fellowships from the University of Maryland (1957–1959), working on a model of chemical bonding with Ellis Lippincott. At Maryland, she gained her first exposure to a new high-speed computer, the IBM model 7094. After these fellowships ended, she joined the National Biomedical Research Foundation in 1960 as associate director (a position she held for 21 years).[5] At the NBRF, she began to work with Robert Ledley, a dentist who had obtained a degree in physics and become interested in the possibilities of applying computational resources to biomedical problems. He had authored one of the earliest studies of biomedical computation, "Report on the Use of Computer in Biology and Medicine."[8] With their combined expertise, they published a paper in 1962 entitled "COMPROTEIN: A computer program to aid primary protein structure determination" that described a "completed computer program for the IBM 7090" that aimed to convert peptide digests to protein chain data. They actually began this work in 1958, but were not able to start programming until late 1960.[8]

In the early 1960s, Dayhoff also collaborated with Ellis Lippincott and Carl Sagan to develop thermodynamic models of cosmo-chemical systems, including prebiological planetary atmospheres. She developed a computer program that could calculate equilibrium concentrations of the gases in a planetary atmosphere, enabling the study of the atmospheres of Venus, Jupiter, and Mars, in addition to the present-day atmosphere and the primordial terrestrial atmosphere. Using this program, she considered whether the primordial atmosphere had the conditions necessary to generate life. Although she found that numerous small biologically important compounds can appear with no special nonequilibrium mechanism to explain their presence, there were compounds necessary to life that were scarce in the equilibrium model (such as ribose, adenine, and cytosine).[2]
Dayhoff also taught physiology and biophysics at Georgetown University Medical Center for 13 years, served as a Fellow of the American Association for the Advancement of Science, and was elected councillor of the International Society for the Study of the Origins of Life in 1980 after 8 years of membership. Dayhoff also served on the editorial boards of three journals: DNA, Journal of Molecular Evolution, and Computers in Biology and Medicine.[2]

In 1966, Dayhoff pioneered the use of computers in comparing protein sequences and reconstructing their evolutionary histories from sequence alignments. To perform this work, she created the single-letter amino acid code to minimize the data file size for each sequence. This work, co-authored with Richard Eck, was the first application of computers to infer a phylogeny (evolutionary tree) from molecular sequences, using a maximum parsimony method. In later years, she applied these methods to study a number of molecular relationships, such as the catalytic chain of bovine cyclic AMP-dependent protein kinase and the src gene product of Rous avian and Moloney murine sarcoma viruses; antithrombin-III, alpha-antitrypsin, and ovalbumin; epidermal growth factor and the light chain of coagulation factor X; and apolipoproteins A-I, A-II, C-I and C-III.[2]
Based on this work, Dayhoff and her coworkers developed a set of substitution matrices called the PAM (point accepted mutation), MDM (Mutation Data Matrix), or Dayhoff matrices. They are derived from global alignments of closely related protein sequences. The identification number included with the matrix (e.g., PAM40, PAM100) refers to the evolutionary distance; greater numbers correspond to greater distances. Matrices for greater evolutionary distances are extrapolated from those for lesser ones.[9] To produce a Dayhoff matrix, pairs of aligned amino acids in verified alignments are used to build a count matrix, which is then used to estimate a mutation matrix at 1 PAM (considered an evolutionary unit). From this mutation matrix, a Dayhoff scoring matrix may be constructed. Along with a model of indel events, alignments generated by these methods can be used in an iterative process to construct new count matrices until convergence.[10]
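The pipeline described above (count matrix, 1-PAM mutation matrix, extrapolation to larger distances, scoring matrix) can be illustrated with a minimal Python sketch. It is not Dayhoff's original program: the three-residue alphabet, the substitution counts, and the background frequencies are made-up toy values, and the function names are hypothetical. The scaling to 1% expected change and the 10·log10 log-odds score follow the commonly described formulation and are assumptions here; the indel model and iterative realignment are not covered.

```python
import math

# Toy inputs (assumed, not real data): three residues, symmetric counts of
# accepted substitutions between them, and their background frequencies.
RESIDUES = "DEK"
COUNTS = [
    [0, 30, 5],
    [30, 0, 10],
    [5, 10, 0],
]
FREQS = [0.3, 0.3, 0.4]


def pam1(counts, freqs):
    """Return M where M[i][j] is the probability that residue i is
    replaced by residue j over an evolutionary distance of 1 PAM."""
    n = len(freqs)
    # Total observed changes involving each residue.
    changes = [sum(counts[i][j] for j in range(n) if j != i) for i in range(n)]
    # Relative mutability: changes per unit of background frequency.
    mutability = [changes[i] / freqs[i] for i in range(n)]
    # Scale so that, on average, 1% of positions change (1 PAM).
    scale = 0.01 / sum(f * m for f, m in zip(freqs, mutability))
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] = scale * mutability[i] * counts[i][j] / changes[i]
        matrix[i][i] = 1.0 - scale * mutability[i]
    return matrix


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Extrapolate to a larger distance by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


def log_odds(m, freqs):
    """Score = 10 * log10(observed transition probability / chance)."""
    n = len(freqs)
    return [[10 * math.log10(m[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


m1 = pam1(COUNTS, FREQS)
m250 = mat_pow(m1, 250)          # analogous to a PAM250 mutation matrix
scores = log_odds(m250, FREQS)   # analogous to a PAM250 scoring matrix
```

Because the toy counts are symmetric, the resulting score matrix is symmetric as well, which is why a single score can be used for a pair of residues regardless of direction.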
One of Dayhoff's most important contributions to bioinformatics was her Atlas of Protein Sequence and Structure, a book reporting all known protein sequences (totaling 65) that she published in 1965.[11] This book published a degenerate encoding of amino acids. It was subsequently republished in several editions. This led to the Protein Information Resource database of protein sequences, the first online database system that could be accessed by telephone line and was available for interrogation by remote computers.[12] The book has since been cited nearly 4,500 times.[2] It and the parallel effort by Walter Goad, which led to the GenBank database of nucleic acid sequences, are the twin origins of the modern databases of molecular sequences. The Atlas was organized by gene families, and she is regarded as a pioneer in their recognition.
Frederick Sanger's determination of the first complete amino acid sequence of a protein (insulin) in 1955 led a number of researchers to sequence various proteins from different species. In the early 1960s, a theory was developed that small differences between homologous protein sequences (sequences with a high likelihood of common ancestry) could indicate the process and rate of evolutionary change on the molecular level. The notion that such molecular analysis could help scientists decode evolutionary patterns in organisms was formalized in the published papers of Emile Zuckerkandl and Linus Pauling in 1962 and 1965.
| Amino acids | 1-letter code | 3-letter code | Property | Ambiguous 1-letter code |
|---|---|---|---|---|
| Cysteine | C | Cys | Sulfur polymerization | a |
| Glycine, Serine, Threonine, Alanine, Proline | G, S, T, A, P | Gly, Ser, Thr, Ala, Pro | Small | b |
| Aspartic acid, Glutamic acid, Asparagine, Glutamine | D, E, N, Q | Asp, Glu, Asn, Gln | Acid and amide | c |
| Arginine, Histidine, Lysine | R, H, K | Arg, His, Lys | Basic | d |
| Leucine, Valine, Methionine, Isoleucine | L, V, M, I | Leu, Val, Met, Ile | Hydrophobic | e |
| Tyrosine, Phenylalanine, Tryptophan | Y*, F, W | Tyr, Phe, Trp | Aromatic | f |
The one-letter code was adopted by IUPAC and remains in general use. Dayhoff's ambiguous one-letter code has been superseded.
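As a small illustration of the two encodings in the table, the following Python sketch converts a peptide written with three-letter codes into the one-letter code and then into the ambiguous six-class code. The function names and the example peptide are hypothetical choices made for this sketch; the mappings are taken from the table and the standard three-letter abbreviations.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Ambiguous classes from the table: class letter -> member residues.
AMBIGUOUS_GROUPS = {
    "a": "C",      # sulfur polymerization
    "b": "GSTAP",  # small
    "c": "DENQ",   # acid and amide
    "d": "RHK",    # basic
    "e": "LVMI",   # hydrophobic
    "f": "YFW",    # aromatic
}
ONE_TO_AMBIGUOUS = {
    residue: group
    for group, members in AMBIGUOUS_GROUPS.items()
    for residue in members
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[code] for code in peptide.split("-"))


def to_ambiguous(sequence: str) -> str:
    """Collapse a one-letter sequence into the six-class ambiguous code."""
    return "".join(ONE_TO_AMBIGUOUS[residue] for residue in sequence)


peptide = "Gly-Ile-Val-Glu-Gln-Cys-Cys"   # illustrative input
one_letter = to_one_letter(peptide)       # "GIVEQCC"
ambiguous = to_ambiguous(one_letter)      # "beeccaa"
```

The one-letter form needs 7 characters where the hyphenated three-letter form needs 27, which shows the kind of saving that mattered when sequences were stored on punched cards.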
Dayhoff's husband was Edward S. Dayhoff, an experimental physicist who worked with magnetic resonance and with lasers.[14] They had two daughters who are also academics, Ruth and Judith.[15]
Judith Dayhoff has a PhD in mathematical biophysics from the University of Pennsylvania and is the author of Neural network architectures: An introduction and coauthor of Neural Networks and Pattern Recognition.[15][16][17][18]
Ruth Dayhoff graduated summa cum laude in mathematics from the University of Maryland and focused on medical informatics while doing her MD at Georgetown University School of Medicine.[14] During medical school, she co-authored a paper and a chapter in The Atlas of Protein Sequence and Structure with her mother, describing a new way to measure how closely proteins are related.[14] Her husband, Vincent Brannigan, is Professor Emeritus of Law and Technology at the University of Maryland School of Engineering. Ruth was a founding Fellow of the American College of Medical Informatics. She pioneered the integration of medical imaging and invented the Vista Imaging System. She was chosen for the National Library of Medicine's project on the 200 women physicians who "changed the face of medicine."[14] She serves as director of Digital Imaging in Medicine for the United States Department of Veterans Affairs.[5]
Dayhoff's Atlas became a template for many indispensable tools in large portions of DNA or protein-related biomedical research. In spite of this significant contribution, Dayhoff was marginalized by the community of sequencers. The contract to manage GenBank (a technology directly related to her research), awarded in 1983 by the NIH, went to Walter Goad at the Los Alamos National Laboratory. The reason for this attitude was unknown, with theories ranging from sexism to a clash of values with the experimental science community.[19] Despite the success of Dayhoff's Atlas, experimental scientists and researchers considered their sequence information very valuable and were often reluctant to submit it to such a publicly available database.[20]
During the last few years of her life, she focused on obtaining stable, adequate, long-term funding to support the maintenance and further development of her Protein Information Resource. She envisioned an online system of computer programs and databases, accessible by scientists all over the world, for identifying proteins from sequence or amino acid composition data, for making predictions based on sequences, and for browsing the known information. Less than a week before she died, she submitted a proposal to the Division of Research Resources at NIH for a Protein Identification Resource. After her death, her colleagues worked to make her vision a reality, and the protein database was fully operational by the middle of 1984.[2]

Dayhoff died of a heart attack at the age of 57 on February 5, 1983.[3] After her death, a fund was established in 1984 to endow the Margaret O. Dayhoff Award, one of the top national honors in biophysics. The award is presented to a woman who "holds very high promise or has achieved prominence while developing the early stages of a career in biophysical research within the purview and interest of the Biophysical Society."[21] It is presented at the annual meeting of the Biophysical Society and includes an honorarium of $2,000.
She was survived by her husband, Edward S. Dayhoff of Silver Spring; two daughters, Ruth E. Dayhoff Brannigan of College Park and Judith E. Dayhoff of Silver Spring; and her father, Kenneth W. Oakley of Silver Spring.[5]
David Lipman, director of the National Center for Biotechnology Information, has called Dayhoff the "mother and father of bioinformatics".[22]
Her seminal contributions to bioinformatics, a science now routinely used as part of the process for naming bacteria, were acknowledged in 2020 when a bacterium, Enemella dayhoffiae, was named after her.[23]