Blazing-Fast Bioinformatic Operations on Python DataFrames
biodatageeks/polars-bio
polars-bio is a Python library for genomics built on top of polars, Apache Arrow and Apache DataFusion. It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.
- optimized for performance and memory efficiency in large-scale genomics dataset analyses, both when reading input data and when performing operations
- popular genomics operations with a DataFrame API (both Pandas and polars)
- SQL-powered bioinformatic data querying or manipulation
- native parallel engine powered by Apache DataFusion and sequila-native
- out-of-core/streaming processing (for data too large to fit into a computer's main memory) with Apache DataFusion and polars
- support for federated and streamed reading of data from cloud storage (e.g. S3, GCS) with Apache OpenDAL, enabling processing of large-scale genomics data without materializing it in memory
- zero-copy data exchange with Apache Arrow
- bioinformatics file formats with noodles and exon
- fast overlap operations with COITrees: Cache Oblivious Interval Trees
- pre-built wheel packages for Linux, Windows and macOS (arm64 and x86_64) available on PyPI
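
To make the "overlap operations" in the list above concrete, here is a minimal pure-Python sketch of what an interval overlap join computes. It is not the polars-bio API and does not use COITrees: the function name `overlap_pairs`, the `(chromosome, start, end)` tuple layout, and the closed-coordinate convention are all assumptions made for illustration. It uses a simple sort plus binary search instead of an interval tree.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical layout: (chromosome, start, end), both ends inclusive (assumption).
Interval = Tuple[str, int, int]


def overlap_pairs(left: List[Interval], right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (left, right) pair on the same chromosome sharing at least one position."""
    # Group the right-hand intervals by chromosome and sort each group by start.
    by_chrom: Dict[str, List[Interval]] = {}
    for iv in right:
        by_chrom.setdefault(iv[0], []).append(iv)
    starts: Dict[str, List[int]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort(key=lambda iv: iv[1])
        starts[chrom] = [iv[1] for iv in ivs]

    pairs: List[Tuple[Interval, Interval]] = []
    for query in left:
        chrom, q_start, q_end = query
        candidates = by_chrom.get(chrom, [])
        # Binary search: only intervals starting at or before q_end can overlap.
        hi = bisect_right(starts.get(chrom, []), q_end)
        for cand in candidates[:hi]:
            # Of those, keep the ones that end at or after q_start.
            if cand[2] >= q_start:
                pairs.append((query, cand))
    return pairs


if __name__ == "__main__":
    reads = [("chr1", 1, 5), ("chr1", 10, 20), ("chr2", 1, 5)]
    targets = [("chr1", 4, 8), ("chr1", 21, 30), ("chr2", 6, 9)]
    for pair in overlap_pairs(reads, targets):
        print(pair)
```

The scan over `candidates[:hi]` is linear in the worst case. An interval tree such as COITrees avoids that cost, which is why a dedicated structure matters for large genomic datasets.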
If you use polars-bio in your work, please cite:
@article{Wiewiorka2025.03.21.644629,
  author = {Wiewiorka, Marek and Khamutou, Pavel and Zbysinski, Marek and Gambin, Tomasz},
  title = {polars-bio - fast, scalable and out-of-core operations on large genomic interval datasets},
  elocation-id = {2025.03.21.644629},
  year = {2025},
  doi = {10.1101/2025.03.21.644629},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/03/25/2025.03.21.644629},
  eprint = {https://www.biorxiv.org/content/early/2025/03/25/2025.03.21.644629.full.pdf},
  journal = {bioRxiv}
}
Read the documentation