Comprehensive Codon Usage Bias Analysis in R
Codon usage bias refers to the non-uniform usage of synonymous codons (codons that encode the same amino acid) across different organisms, genes, and functional categories. cubar is a comprehensive R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, conducting sliding-window or differential usage analyses, and optimizing sequences for heterologous expression.
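To make "non-uniform usage of synonymous codons" concrete, the following is a minimal base-R sketch that computes relative synonymous codon usage (RSCU) for the leucine codon family in a toy sequence. It is an illustration only, not cubar's implementation: the sequence and the helper name `rscu_for_family` are hypothetical, and RSCU is taken here as the observed codon count divided by the mean count across the synonymous family.

```r
# Toy coding sequence (hypothetical); its length is a multiple of three.
cds <- "CTGCTGCTGTTACTGCTCCTGTTG"

# Split the sequence into consecutive codons.
starts <- seq(1, nchar(cds), by = 3)
codons <- substring(cds, starts, starts + 2)

# The six synonymous codons for leucine in the standard genetic code.
leu_codons <- c("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")

# Hypothetical helper: RSCU = observed count / mean count within the family.
rscu_for_family <- function(codons, family) {
  counts <- table(factor(codons[codons %in% family], levels = family))
  counts <- setNames(as.numeric(counts), family)
  counts / mean(counts)
}

leu_rscu <- rscu_for_family(codons, leu_codons)
print(round(leu_rscu, 2))
```

With uniform usage every codon in the family would have an RSCU of 1. In this toy sequence CTG dominates, so its value is well above 1 while the unused codons CTT and CTA score 0.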
cubar uses Biostrings and data.table backends.

Install the latest stable version from CRAN:
    install.packages("cubar")

Install the latest development version from GitHub:

    # Install devtools if not already installed
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }

    # Install cubar from GitHub
    devtools::install_github("mt1022/cubar", dependencies = TRUE)

System Requirements:
- R (≥ 4.1.0)
Required Packages:
- Biostrings (≥ 2.60.0) - Bioconductor package for sequence manipulation
- IRanges (≥ 2.34.0) - Bioconductor infrastructure for range operations
- data.table (≥ 1.14.0) - High-performance data manipulation
- ggplot2 (≥ 3.3.5) - Data visualization
- rlang (≥ 0.4.11) - Language tools
Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.
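If compatibility issues do come up, one way to narrow them down is to compare the installed versions against the minimums stated in this README. The sketch below uses only base R and the utils package. The helper name `check_requirements` is hypothetical and is not part of cubar.

```r
# Minimum versions copied from the requirements stated in this README.
required <- c(
  Biostrings = "2.60.0",
  IRanges    = "2.34.0",
  data.table = "1.14.0",
  ggplot2    = "3.3.5",
  rlang      = "0.4.11"
)

# Hypothetical helper: report whether each package is installed and recent enough.
check_requirements <- function(required) {
  status <- vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      return("missing")
    }
    if (utils::packageVersion(pkg) >= required[[pkg]]) "ok" else "outdated"
  }, character(1))
  data.frame(
    package = names(required),
    minimum = unname(required),
    status  = unname(status),
    stringsAsFactors = FALSE
  )
}

# TRUE when the running R version meets the stated minimum.
r_ok <- getRversion() >= "4.1.0"

report <- check_requirements(required)
print(report)
```

Any row marked "missing" or "outdated" points to a package worth reinstalling or updating before retrying the cubar installation.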
📖 Complete documentation is available within R (?function_name) and on our package website.

Here's a typical analysis workflow demonstrating key functionality:
    library(cubar)
    library(ggplot2)

    # 1. Load and quality-check sequences
    data(yeast_cds)
    clean_cds <- check_cds(yeast_cds)

    # 2. Calculate codon frequencies
    codon_freq <- count_codons(clean_cds)

    # 3. Calculate multiple metrics
    enc <- get_enc(codon_freq)    # Effective number of codons
    gc3s <- get_gc3s(codon_freq)  # GC content at 3rd positions

    # 4. Analyze highly expressed genes
    data(yeast_exp)
    yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
    high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
    rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
    cai <- get_cai(codon_freq, rscu_high)

    # 5. Visualize results
    df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
    ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) +
      geom_point(alpha = 0.6) +
      scale_color_viridis_c() +
      labs(title = "Codon Usage Bias Relationships",
           x = "Effective Number of Codons",
           y = "Codon Adaptation Index")

For complementary analysis, consider these R packages:
This project is licensed under the MIT License - see the LICENSE file for details.