Genomic interval operations on Pandas DataFrames
open2c/bioframe
Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.
Bioframe is built directly on top of Pandas. Bioframe provides:
- A variety of genomic interval operations that work directly on dataframes.
- Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
- Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.
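As a rough illustration of what fixed-size bins look like, the following pure-Python sketch tiles each chromosome with half-open bins and clips the last bin at the chromosome end. It does not call bioframe (whose own helper for this is binnify); the function name `make_bins` and the chromosome sizes are made up for the example.

```python
def make_bins(chromsizes, binsize):
    """Tile each chromosome with half-open [start, end) bins of width binsize."""
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            # The final bin is clipped so it never runs past the chromosome end.
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


# Toy chromosome sizes, chosen only for illustration.
bins = make_bins({"chr1": 10, "chr2": 7}, binsize=4)
for row in bins:
    print(row)
```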
Read the documentation, including the guide, as well as the publication for more information.
Bioframe is an Affiliated Project of NumFOCUS.
Bioframe is available on PyPI and bioconda:
pip install bioframe
Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and regular developer meetings scheduled there. Anyone can join and participate!
Key genomic interval operations in bioframe include:
- overlap: Find pairs of overlapping genomic intervals between two dataframes.
- closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
- cluster: Group overlapping intervals in a dataframe into clusters.
- complement: Find genomic intervals that are not covered by any interval from a dataframe.
Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.
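To give a feel for what two of the core operations compute, here is a minimal pure-Python sketch of clustering and complementing intervals on a single chromosome. It is not bioframe's implementation, which works on dataframes; the helper names `cluster_ids` and `complement_intervals`, the toy intervals, and the chromosome length of 20 are assumptions made for the example. Intervals are treated as half-open [start, end), and this sketch chooses to put touching intervals in the same cluster.

```python
def cluster_ids(intervals):
    """Return one cluster label per interval, in the original input order."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    labels = [None] * len(intervals)
    current, current_end = -1, None
    for i in order:
        start, end = intervals[i]
        if current_end is None or start > current_end:
            # A gap before this interval: open a new cluster.
            current += 1
            current_end = end
        else:
            # Overlaps (or touches) the running cluster: extend it.
            current_end = max(current_end, end)
        labels[i] = current
    return labels


def complement_intervals(intervals, chrom_length):
    """Return the stretches of [0, chrom_length) not covered by any interval."""
    gaps, cursor = [], 0
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


# Toy intervals on one chromosome of assumed length 20.
intervals = [(1, 5), (3, 8), (12, 14), (13, 16)]
labels = cluster_ids(intervals)
gaps = complement_intervals(intervals, chrom_length=20)
print(labels)
print(gaps)
```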
To overlap two dataframes, call:
import bioframe as bf
bf.overlap(df1, df2)
For these two input dataframes, with intervals all on the same chromosome:
overlap will return the following interval pairs as overlaps:
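To make the overlap rule concrete, here is a minimal sketch in plain Python that pairs up overlapping half-open intervals from two lists on the same chromosome. The quadratic loop is only for illustration and is not how bioframe computes overlaps; the function name `overlap_pairs` and the interval values are assumptions for the example.

```python
def overlap_pairs(intervals1, intervals2):
    """Return every (a, b) pair where a from intervals1 overlaps b from intervals2."""
    pairs = []
    for a_start, a_end in intervals1:
        for b_start, b_end in intervals2:
            # Half-open intervals overlap only if each starts before the other ends,
            # so intervals that merely touch (end == start) are not paired.
            if a_start < b_end and b_start < a_end:
                pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


# Toy inputs standing in for df1 and df2 (single chromosome).
set1 = [(1, 5), (3, 8), (8, 10), (12, 14)]
set2 = [(4, 8), (10, 11)]
pairs = overlap_pairs(set1, set2)
for pair in pairs:
    print(pair)
```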
To merge all overlapping intervals in a dataframe, call:
import bioframe as bf
bf.merge(df1)
For this input dataframe, with intervals all on the same chromosome:
merge will return a new dataframe with these merged intervals:
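The idea behind merging can be sketched as a single sweep over intervals sorted by start. This is a plain-Python illustration under stated assumptions, not bioframe's code: the function name `merge_intervals` and the toy intervals are made up, and the sketch chooses to merge intervals that touch as well as those that overlap.

```python
def merge_intervals(intervals):
    """Merge overlapping intervals; return (start, end, n_merged) tuples."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the last merged interval: extend it and count it.
            last_start, last_end, count = merged[-1]
            merged[-1] = (last_start, max(last_end, end), count + 1)
        else:
            merged.append((start, end, 1))
    return merged


# Toy intervals on a single chromosome.
merged = merge_intervals([(1, 5), (3, 8), (12, 14), (13, 16)])
for row in merged:
    print(row)
```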
See the guide for visualizations of other interval operations in bioframe.
Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas’s read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.
import bioframe
jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
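The idea of a schema argument can be sketched with the standard library alone: a format name is looked up in a table of column names, which are then attached to each parsed row. Everything here is hypothetical and only illustrative: the `EXAMPLE_SCHEMAS` table, the `read_with_schema` function, and the three-column BED-like layout are assumptions, not bioframe's actual code or schema list.

```python
import csv
import io

# Hypothetical schema table for this sketch: format name -> column names.
EXAMPLE_SCHEMAS = {"bed3": ["chrom", "start", "end"]}


def read_with_schema(text, schema):
    """Parse tab-separated text and name its columns according to a schema."""
    names = EXAMPLE_SCHEMAS[schema]
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        row = dict(zip(names, fields))
        # Coordinates arrive as strings; convert them to integers.
        row["start"], row["end"] = int(row["start"]), int(row["end"])
        rows.append(row)
    return rows


records = read_with_schema("chr1\t10\t20\nchr2\t5\t15\n", schema="bed3")
for record in records:
    print(record)
```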
See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.
If you use bioframe in your work, please cite:
@article{bioframe_2024,
  author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
  doi = {10.1093/bioinformatics/btae088},
  journal = {Bioinformatics},
  title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
  year = {2024}
}





