Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling and computational simulations to understand biological systems and relationships.[1] An intersection of computer science, biology, and data science, the field also has foundations in applied mathematics, molecular biology, cell biology, chemistry, and genetics.[2]
Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data pushed biological researchers to use computers to evaluate and compare large data sets in their own field.[3]
By 1982, researchers shared information via punch cards. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.[3]
Perhaps the best-known example of computational biology, the Human Genome Project, officially began in 1990.[4] By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals.[5] Work continued, however, and by 2021 a "complete genome" level was reached, with only 0.3% of bases still covered by potential issues.[6][7] The missing Y chromosome was added in January 2022.
Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields.[8] Today, the International Society for Computational Biology recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field.[9] In addition to helping sequence the human genome, computational biology has helped create accurate models of the human brain, map the 3D structure of genomes, and model biological systems.[3] Much of the original progress in computational biology emerged from the United States and Western Europe, due to their large computational infrastructures. Recent decades have seen growing contributions from less-wealthy nations, however. For example, Colombia has had an international computational biology effort since 1998, focusing on genomics and disease in nationally important crops like coffee and potatoes.[10] Poland, similarly, has recently been a leader in biomolecular simulations and macromolecular sequence analysis.[11]
Computational anatomy is the study of anatomical shape and form at the visible or gross anatomical scale of morphology. It involves the development of computational, mathematical, and data-analytical methods for modeling and simulating biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging, computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D.
The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations.[12] The diffeomorphism group is used to study different coordinate systems via coordinate transformations, as generated via the Lagrangian and Eulerian velocities of flow from one anatomical configuration into another. It is related to shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, the study of which is known as diffeomorphometry.
Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in biological systems. This entails a more theoretical approach to problems, rather than its more empirically minded counterpart of experimental biology.[13] Mathematical biology draws on discrete mathematics, topology (also useful for computational modeling), Bayesian statistics, linear algebra and Boolean algebra.[14]
These mathematical approaches have enabled the creation of databases and other methods for storing, retrieving, and analyzing biological data, a field known as bioinformatics. Usually, this process involves genetics and analyzing genes.
Gathering and analyzing large datasets have made room for growing research fields such as data mining,[14] and computational biomodeling, which refers to building computer models and visual simulations of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations".[15] While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe this will be essential in developing modern medical approaches to creating new drugs and gene therapy.[15] A useful modeling approach is to use Petri nets via tools such as esyN.[16]
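To make the Petri net idea concrete, the following minimal Python sketch models places holding tokens and transitions that consume and produce them. It does not use or imitate esyN; the function names and the toy enzyme network (E + S -> ES -> E + P) are hypothetical and chosen only for illustration.

```python
from typing import Dict

Marking = Dict[str, int]  # number of tokens currently held by each place


def is_enabled(marking: Marking, inputs: Dict[str, int]) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking: Marking, inputs: Dict[str, int], outputs: Dict[str, int]) -> Marking:
    """Return the marking reached by firing one transition (the input is not mutated)."""
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - n  # consume tokens
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n  # produce tokens
    return new_marking


# Hypothetical toy network: enzyme E binds substrate S, then releases product P.
BIND = ({"E": 1, "S": 1}, {"ES": 1})
CATALYZE = ({"ES": 1}, {"E": 1, "P": 1})

start = {"E": 1, "S": 2, "ES": 0, "P": 0}
after_bind = fire(start, *BIND)
after_catalysis = fire(after_bind, *CATALYZE)
```

In this sketch the binding transition cannot fire a second time until catalysis returns the enzyme token, which is the kind of resource constraint a Petri net is meant to capture.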
Along similar lines, until recent decades theoretical ecology has largely dealt with analytic models that were detached from the statistical models used by empirical ecologists. However, computational methods have aided in developing ecological theory via simulation of ecological systems, in addition to increasing application of methods from computational statistics in ecological analyses.
Systems biology consists of computing the interactions between various biological systems ranging from the cellular level to entire populations with the goal of discovering emergent properties. This process usually involves networking cell signaling and metabolic pathways. Systems biology often uses computational techniques from biological modeling and graph theory to study these complex interactions at cellular levels.[14]
Computational biology has assisted evolutionary biology by:

Computational genomics is the study of the genomes of cells and organisms. The Human Genome Project is one example of computational genomics. This project looks to sequence the entire human genome into a set of data. Once fully implemented, this could allow for doctors to analyze the genome of an individual patient.[18] This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. Researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life.[19]
One of the main ways that genomes are compared is by sequence homology. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way.[19]
Sequence alignment is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the longest common subsequence of two genes or comparing variants of certain diseases.[citation needed]
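As an illustration of the longest common subsequence problem mentioned above, the following Python sketch uses the standard dynamic-programming formulation. The function name and the short example strings are assumptions made for illustration; real alignment tools add scoring schemes and gap penalties that are not modeled here.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Trace back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


shared = longest_common_subsequence("AGGTAB", "GXTXAYB")  # "GTAB"
```

The table takes time and memory proportional to the product of the two sequence lengths, which is why whole-genome comparisons generally rely on heuristics rather than this exact method.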
A largely unexplored area of computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome.[19] Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE and the Roadmap Epigenomics Project.
Understanding how individual genes contribute to the biology of an organism at the molecular, cellular, and organism levels is known as gene ontology. The Gene Ontology Consortium's mission is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular, and organism-level systems. The Gene Ontology resource provides a computational representation of current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria.[20]
3D genomics is a subsection of computational biology that focuses on the organization and interaction of genes within a eukaryotic cell. One method used to gather 3D genomic data is Genome Architecture Mapping (GAM). GAM measures 3D distances of chromatin and DNA in the genome by combining cryosectioning, the process of cutting a strip from the nucleus to examine the DNA, with laser microdissection. A nuclear profile is simply this strip or slice that is taken from the nucleus. Each nuclear profile contains genomic windows, which are certain sequences of nucleotides, the base unit of DNA. GAM captures a genome network of complex, multi-enhancer chromatin contacts throughout a cell.[21]
Computational biology also plays a pivotal role in identifying biomarkers for diseases such as cardiovascular conditions. By integrating various 'omic' data, such as genomics, proteomics, and metabolomics, researchers can uncover potential biomarkers that aid in disease diagnosis, prognosis, and treatment strategies. For instance, metabolomic analyses have identified specific metabolites capable of distinguishing between coronary artery disease and myocardial infarction, thereby enhancing diagnostic precision.[22]
Computational neuroscience is the study of brain function in terms of the information processing properties of the nervous system. A subset of neuroscience, it looks to model the brain to examine specific aspects of the neurological system.[23] Models of the brain include:
It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.
Computational neuropsychiatry is an emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. Several initiatives have demonstrated that computational modeling is an important contribution to understanding neuronal circuits that could generate mental functions and dysfunctions.[25][26][27]
Computational pharmacology is "the study of the effects of genomic data to find links between specific genotypes and diseases and then screening drug data".[28] The pharmaceutical industry requires a shift in methods to analyze drug data. Pharmacologists were able to use Microsoft Excel to compare chemical and genomic data related to the effectiveness of drugs. However, the industry has reached what is referred to as the Excel barricade, which arises from the limited number of cells accessible on a spreadsheet. This development led to the need for computational pharmacology. Scientists and researchers develop computational methods to analyze these massive data sets. This allows for an efficient comparison between the notable data points and allows for more accurate drugs to be developed.[29]
Analysts project that if major medications fail due to patents, computational biology will be necessary to replace current drugs on the market. Doctoral students in computational biology are being encouraged to pursue careers in industry rather than take post-doctoral positions. This is a direct result of major pharmaceutical companies needing more qualified analysts of the large data sets required for producing new drugs.[29]
Computational biology plays a crucial role in discovering signs of new, previously unknown living creatures and in cancer research. This field involves large-scale measurements of cellular processes, including RNA, DNA, and proteins, which pose significant computational challenges. To overcome these, biologists rely on computational tools to accurately measure and analyze biological data.[30] In cancer research, computational biology aids in the complex analysis of tumor samples, helping researchers develop new ways to characterize tumors and understand various cellular properties. The use of high-throughput measurements, involving millions of data points from DNA, RNA, and other biological structures, helps in diagnosing cancer at early stages and in understanding the key factors that contribute to cancer development. Areas of focus include analyzing molecules that are deterministic in causing cancer and understanding how the human genome relates to tumor causation.[30][31]
Computational toxicology is a multidisciplinary area of study, which is employed in the early stages of drug discovery and development to predict the safety and potential toxicity of drug candidates.
A growing application of computational biology is drug discovery. For example, simulations of intracellular and intercellular signaling events, using data from proteomic or metabolomic experiments, may reduce dependence on experimentation in elucidating pharmacokinetics and pharmacodynamics of drug candidates in living organisms.[32]
Increasingly, artificial intelligence plays a central role in the drug discovery process. Using chemical structures of known pharmaceutical agents as inputs, AI models can suggest structures of lead compounds or predict novel modes of drug-protein binding. AI is also used for virtual screening of candidate molecules, avoiding the need to synthesize large numbers of molecules for screening.[33][34]
Computational biologists use a wide range of software and algorithms to carry out their research.
Unsupervised learning is a type of algorithm that finds patterns in unlabeled data. One example is k-means clustering, which aims to partition n data points into k clusters, in which each data point belongs to the cluster with the nearest mean. A related method is the k-medoids algorithm, which, when selecting a cluster center, picks one of the actual data points in the cluster rather than the average of the cluster.

The algorithm follows these steps:
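As a hedged illustration, one common alternating formulation of k-medoids (assign each point to its nearest medoid, then re-pick each cluster's medoid as the member with the smallest total distance to the others) can be sketched in Python. The function names, the caller-supplied initial medoids, and the toy data are assumptions made for this sketch, not details taken from the cited work.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def assign(points: Sequence[T], medoids: List[int],
           dist: Callable[[T, T], float]) -> Dict[int, List[int]]:
    """Map each medoid index to the indices of the points nearest to it."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points: Sequence[T], initial_medoids: List[int],
              dist: Callable[[T, T], float],
              max_iter: int = 100) -> Tuple[List[int], Dict[int, List[int]]]:
    """Alternate assignment and medoid update until the medoids stop changing.

    initial_medoids are indices into points and are assumed to be distinct.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        # The new medoid of a cluster is the member with the smallest
        # summed distance to all members; an empty cluster keeps its medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[o]) for o in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)


# Toy one-dimensional data with two obvious groups.
values = [1, 2, 3, 10, 11, 12]
medoids, clusters = k_medoids(values, [0, 1], lambda a, b: abs(a - b))
```

Because the distance function is passed in, the same sketch works with any dissimilarity measure, not only differences between numbers.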
One example of this in biology is the 3D mapping of a genome. Information on the HIST1 region of mouse chromosome 13 is gathered from Gene Expression Omnibus.[35] This information contains data on which nuclear profiles show up in certain genomic regions. With this information, the Jaccard distance can be used to find a normalized distance between all the loci.
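The Jaccard distance between two loci can be computed from the sets of nuclear profiles in which each locus appears: one minus the size of the intersection divided by the size of the union. The sketch below is a minimal Python version; the locus names and profile identifiers are invented for illustration and are not taken from the Gene Expression Omnibus data set.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A intersect B| / |A union B|, a value between 0 and 1."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(profiles_by_locus: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard distance for every unordered pair of loci."""
    return {
        (x, y): jaccard_distance(profiles_by_locus[x], profiles_by_locus[y])
        for x, y in combinations(sorted(profiles_by_locus), 2)
    }


# Hypothetical data: the nuclear profiles (NP) in which each locus was detected.
profiles_by_locus = {
    "locus_1": {"NP1", "NP2", "NP3"},
    "locus_2": {"NP2", "NP3", "NP4"},
    "locus_3": {"NP5"},
}
distances = pairwise_distances(profiles_by_locus)
```

Loci that are often captured in the same nuclear profiles get a distance near 0, while loci that never co-occur get a distance of 1. A matrix of such distances can serve as the input to a clustering method.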
Graph analytics, or network analysis, is the study of graphs that represent connections between different objects. Graphs can represent many kinds of networks in biology, such as protein-protein interaction networks, regulatory networks, and metabolic and biochemical networks. There are many ways to analyze these networks, one of which is to look at centrality. Centrality measures rank nodes by their popularity or importance in the graph, which can be useful in finding which nodes matter most. For example, given data on the activity of genes over a time period, degree centrality can be used to see which genes are most active throughout the network, or which genes interact with others the most. This contributes to the understanding of the roles certain genes play in the network.
There are many ways to calculate centrality in graphs, all of which can give different kinds of information. Centrality measures can be applied in many different biological settings, including gene regulatory, protein interaction, and metabolic networks.[36]
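Degree centrality is the simplest of these measures: a node's score is the number of its neighbors divided by the largest number it could have. The Python sketch below assumes an undirected graph without self-loops, given as a list of edges; the gene names are placeholders invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical interaction network between four genes.
edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]
centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
```

Other measures, such as betweenness or closeness centrality, weigh a node's position on paths through the network rather than its direct connections, and can rank the same nodes differently.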
Supervised learning is a type of algorithm that learns from labeled data and learns how to assign labels to future data that is unlabeled. In biology supervised learning can be helpful when we have data that we know how to categorize and we would like to categorize more data into those categories.

A common supervised learning algorithm is the random forest, which uses numerous decision trees to train a model to classify a dataset. Forming the basis of the random forest, a decision tree is a structure which aims to classify, or label, some set of data using certain known features of that data. A practical biological example of this would be taking an individual's genetic data and predicting whether or not that individual is predisposed to develop a certain disease or cancer. At each internal node the algorithm checks the dataset for exactly one feature, a specific gene in the previous example, and then branches left or right based on the result. Then at each leaf node, the decision tree assigns a class label to the dataset. So in practice, the algorithm walks a specific root-to-leaf path based on the input dataset through the decision tree, which results in the classification of that dataset. Commonly, decision trees have target variables that take on discrete values, like yes/no, in which case it is referred to as a classification tree, but if the target variable is continuous then it is called a regression tree. To construct a decision tree, it must first be trained using a training set to identify which features are the best predictors of the target variable.[citation needed]
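The root-to-leaf walk described above, and the way a forest combines several trees, can be sketched in Python as follows. This is only an illustration of prediction: the trees are written by hand rather than learned from a training set, majority voting is assumed as the way the forest combines its trees, and the feature names (`variant_A`, `variant_B`, `variant_C`) and class labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    label: str  # class label assigned when a walk ends here


@dataclass
class Node:
    feature: str      # the single feature tested at this internal node
    if_true: "Tree"   # branch taken when the feature is present
    if_false: "Tree"  # branch taken when the feature is absent


Tree = Union[Leaf, Node]


def classify(tree: Tree, sample: Dict[str, bool]) -> str:
    """Walk one root-to-leaf path and return the label of the leaf reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if sample[tree.feature] else tree.if_false
    return tree.label


def forest_vote(trees: List[Tree], sample: Dict[str, bool]) -> str:
    """Classify with every tree and return the most common label."""
    votes = Counter(classify(t, sample) for t in trees)
    return votes.most_common(1)[0][0]


YES, NO = Leaf("predisposed"), Leaf("not predisposed")
forest = [
    Node("variant_A", YES, Node("variant_B", YES, NO)),
    Node("variant_B", YES, NO),
    Node("variant_C", Node("variant_A", YES, NO), NO),
]

carrier = {"variant_A": True, "variant_B": False, "variant_C": True}
non_carrier = {"variant_A": False, "variant_B": False, "variant_C": False}
carrier_prediction = forest_vote(forest, carrier)          # two of three trees vote "predisposed"
non_carrier_prediction = forest_vote(forest, non_carrier)  # all trees vote "not predisposed"
```

Using an odd number of trees avoids tied votes in this two-class example. In a trained random forest, each tree would instead be fitted to a different random sample of the training data and features.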
Open source software provides a platform for computational biology where everyone can access and benefit from software developed in research.[37] PLOS cites four main reasons for the use of open source software:
There are several large conferences that are concerned with computational biology. Some notable examples are Intelligent Systems for Molecular Biology, European Conference on Computational Biology and Research in Computational Molecular Biology.
There are also numerous journals dedicated to computational biology. Some notable examples include Journal of Computational Biology and PLOS Computational Biology, a peer-reviewed open access journal that has many notable research projects in the field of computational biology. They provide reviews on software, tutorials for open source software, and display information on upcoming computational biology conferences.[citation needed] Other journals relevant to this field include Bioinformatics, Computers in Biology and Medicine, BMC Bioinformatics, Nature Methods, Nature Communications, Scientific Reports, and PLOS One.
Computational biology, bioinformatics and mathematical biology are all interdisciplinary approaches to the life sciences that draw from quantitative disciplines such as mathematics and information science. The NIH describes computational/mathematical biology as the use of computational/mathematical approaches to address theoretical and experimental questions in biology and, by contrast, bioinformatics as the application of information science to understand complex life-sciences data.[1]
Specifically, the NIH defines
Computational biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.[1]
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.[1]
While each field is distinct, there may be significant overlap at their interface,[1] so much so that to many, bioinformatics and computational biology are terms that are used interchangeably.
The terms computational biology and evolutionary computation appear similar but are not identical. Evolutionary computation is a field of computer science comprising algorithms inspired by evolution in biology. Algorithms from within the field of evolutionary computation can be applied to computational biology.[39]