BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
For more details, see the documentation and tutorials.
Main features:
- flexible batch generation
- deterministic and stochastic pipelines
- joins and merges of datasets and pipelines
- data processing actions
- flexible model configuration
- within-batch parallelism
- batch prefetching
- ready-to-use ML models and proven NN architectures
- convenient layers and helper functions to build custom models
- a powerful research engine with parallel model training and extended experiment logging
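Within-batch parallelism means running the same per-item operation over all items of a batch concurrently, for example loading many files at once. As a rough sketch of the idea only (this is not BatchFlow's own mechanism, which is described in its documentation), the hypothetical helper `process_batch_in_parallel` below uses the standard library's `ThreadPoolExecutor`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def process_batch_in_parallel(batch: List[Any], func: Callable[[Any], Any],
                              max_workers: int = 4) -> List[Any]:
    """Apply `func` to every item of `batch` using a pool of threads.

    Hypothetical helper for illustration; not part of BatchFlow.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Executor.map yields results in input order, so item order is preserved.
        return list(executor.map(func, batch))


if __name__ == "__main__":
    print(process_batch_in_parallel([1, 2, 3, 4], lambda x: x * 10))  # [10, 20, 30, 40]
```

Threads help most with I/O-bound work such as reading files; for CPU-bound pure-Python code a `ProcessPoolExecutor` is usually the better choice, although it requires picklable functions (so no lambdas).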
A workflow is defined by chaining actions on a dataset pipeline:

```python
my_workflow = (my_dataset.pipeline()
               .load('/some/path')
               .do_something()
               .do_something_else()
               .some_additional_action()
               .save('/to/other/path'))
```
The trick here is that all the processing actions are lazy. They are not executed until their results are needed, e.g. when you request a preprocessed batch:
```python
my_workflow.run(BATCH_SIZE, shuffle=True, n_epochs=5)
```
or
```python
for batch in my_workflow.gen_batch(BATCH_SIZE, shuffle=True, n_epochs=5):
    # only now the actions are fired and data is being changed with the workflow defined earlier
    # actions are executed one by one and here you get a fully processed batch
    ...
```
or
```python
NUM_ITERS = 1000
for i in range(NUM_ITERS):
    processed_batch = my_workflow.next_batch(BATCH_SIZE, shuffle=True, n_epochs=None)
    # only now the actions are fired and data is changed with the workflow defined earlier
    # actions are executed one by one and here you get a fully processed batch
```
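The lazy behaviour described above can be illustrated with a toy class in plain Python. This is a minimal sketch under stated assumptions, not BatchFlow's implementation: `LazyPipeline`, its `add` method and the `seed` parameter are hypothetical; `gen_batch`, `shuffle` and `n_epochs` only mimic the names used in the snippets, and the sketch assumes a smaller last batch is kept and that `n_epochs=None` means "never stop".

```python
import random
from typing import Any, Callable, Iterator, List, Optional


class LazyPipeline:
    """Toy illustration of lazy, chainable actions (hypothetical, not BatchFlow's code).

    Calling an action method only records it; the recorded actions run
    one by one when a batch is actually requested.
    """

    def __init__(self, data: List[Any]) -> None:
        self.data = data
        self.actions: List[Callable[[List[Any]], List[Any]]] = []

    def add(self, func: Callable[[Any], Any]) -> "LazyPipeline":
        # Record a per-item transform; nothing is computed here.
        self.actions.append(lambda batch: [func(item) for item in batch])
        return self  # returning self is what makes method chaining possible

    def _indices(self, batch_size: int, shuffle: bool,
                 n_epochs: Optional[int], seed: Optional[int]) -> Iterator[List[int]]:
        # Split the index range into batches; n_epochs=None means "never stop".
        rng = random.Random(seed)
        epoch = 0
        while n_epochs is None or epoch < n_epochs:
            order = list(range(len(self.data)))
            if shuffle:
                rng.shuffle(order)  # random batches; otherwise sequential ones
            for start in range(0, len(order), batch_size):
                yield order[start:start + batch_size]
            epoch += 1

    def gen_batch(self, batch_size: int, shuffle: bool = False,
                  n_epochs: Optional[int] = 1,
                  seed: Optional[int] = None) -> Iterator[List[Any]]:
        for indices in self._indices(batch_size, shuffle, n_epochs, seed):
            batch = [self.data[i] for i in indices]
            # Only now are the recorded actions fired, in the order they were added.
            for action in self.actions:
                batch = action(batch)
            yield batch


if __name__ == "__main__":
    pipeline = LazyPipeline(list(range(5))).add(lambda x: x * x).add(lambda x: x + 1)
    for batch in pipeline.gen_batch(2, n_epochs=1):
        print(batch)  # prints [1, 2], then [5, 10], then [17]
```

Because every action method returns the pipeline itself, a chain like `.add(...).add(...)` only builds a recipe; the work happens inside the generator, batch by batch.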
BatchFlow includes ready-to-use proven architectures like VGG, Inception, ResNet and many others. To apply them to your data, just choose a model, specify the inputs (like the number of classes or the image shape) and call `train_model`. Of course, you can also choose a loss function, an optimizer and many other parameters, if you want.
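```python
from batchflow import B
from batchflow.models.torch import ResNet34

my_workflow = (my_dataset.pipeline()
               # the model is registered under the name 'model' ...
               .init_model('model', ResNet34, config={'loss': 'ce', 'classes': 10})
               .load('/some/path')
               .some_transform()
               .another_transform()
               # ... and referred to by that same name when training
               .train_model('model', inputs=B.images, targets=B.labels)
               .run(BATCH_SIZE, shuffle=True))
```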
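A model in a pipeline is referred to by the name it was given at initialization, so the name passed when training has to match it. The toy sketch below illustrates that lookup-by-name idea under stated assumptions: `ModelPipeline`, `CountingModel` and the `n_iters` parameter are hypothetical and are not BatchFlow's API; they only mimic the method names from the snippet above.

```python
from typing import Any, Callable, Dict, List, Optional


class CountingModel:
    """Stand-in for a real model: it only counts training calls."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self.n_updates = 0

    def train(self, inputs: Any, targets: Any) -> None:
        self.n_updates += 1


class ModelPipeline:
    """Hypothetical toy pipeline that keeps models in a dict keyed by name."""

    def __init__(self) -> None:
        self.models: Dict[str, Any] = {}
        self.steps: List[Callable[[], None]] = []

    def init_model(self, name: str, model_class: type,
                   config: Optional[Dict[str, Any]] = None) -> "ModelPipeline":
        def step() -> None:
            # Create the model once; later iterations reuse the same instance.
            if name not in self.models:
                self.models[name] = model_class(config or {})
        self.steps.append(step)
        return self

    def train_model(self, name: str, inputs: Any, targets: Any) -> "ModelPipeline":
        def step() -> None:
            # The model is looked up by the name given to init_model.
            if name not in self.models:
                raise KeyError(f"model {name!r} was not initialized")
            self.models[name].train(inputs, targets)
        self.steps.append(step)
        return self

    def run(self, n_iters: int = 1) -> "ModelPipeline":
        # Each iteration stands for one batch passing through all steps.
        for _ in range(n_iters):
            for step in self.steps:
                step()
        return self


if __name__ == "__main__":
    pipe = (ModelPipeline()
            .init_model('model', CountingModel, config={'classes': 10})
            .train_model('model', inputs=[0.5], targets=[1])
            .run(n_iters=3))
    print(pipe.models['model'].n_updates)  # 3
```

With a mismatched name (for example, initializing `'model'` but training `'ResNet34'`), this toy version fails with a `KeyError` on the first iteration.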
For more advanced cases and the detailed API, see the documentation.
The BatchFlow module is in the beta stage. Your suggestions and improvements are very welcome.
BatchFlow supports Python 3.9 or higher.
To install the stable package:

With uv
```
uv add batchflow
```
With poetry
```
poetry add batchflow
```
With old-fashioned pip
```
pip3 install batchflow
```

To install a development version from a branch:

With uv
```
git clone --branch my_branch https://github.com/analysiscenter/batchflow
uv add --editable ./batchflow
```
You can skip `--branch` if you need `master`.

With poetry
```
poetry add --editable git+https://github.com/analysiscenter/batchflow#my_branch
```
With old-fashioned pip
```
git clone --branch my_branch https://github.com/analysiscenter/batchflow
pip install --editable ./batchflow
```

Some `batchflow` functions and classes require additional dependencies. To use that functionality, you might need to install `batchflow` with extras (e.g. `batchflow[nn]`):
- image - working with image datasets and plotting
- nn - for neural networks (includes torch, torchvision, ...)
- datasets - loading standard datasets (MNIST, CIFAR, ...)
- profile - performance profiling
- jupyter - utility functions for notebooks
- research - multiprocess research
- telegram - for monitoring pipelines via a telegram bot
- dev - batchflow development (ruff, pytest, ...)
You can install several extras at once, like `batchflow[image,nn,research]`.
Projects based on BatchFlow:
- SeismiQB - ML for seismic interpretation
- SeismicPro - ML for seismic processing
- PyDEns - DL solver for ODEs and PDEs
- RadIO - ML for CT imaging
- CardIO - ML for heart signals
Please cite BatchFlow in your publications if it helps your research.
Roman Khudorozhkov et al. BatchFlow library for fast ML workflows. 2017. doi:10.5281/zenodo.1041203

```
@misc{roman_kh_2017_1041203,
  author = {Khudorozhkov, Roman and others},
  title  = {BatchFlow library for fast ML workflows},
  year   = 2017,
  doi    = {10.5281/zenodo.1041203},
  url    = {https://doi.org/10.5281/zenodo.1041203}
}
```