w4k2/stream-learn


stream-learn is an open-source Python library for difficult data stream analysis.

License

GPL-3.0 license


stream-learn

The stream-learn module is a set of tools necessary for processing data streams using scikit-learn estimators. The batch processing approach is used here, where the dataset is passed to the classifier in smaller, consecutive subsets called chunks. The module consists of five sub-modules:

streams - containing a data stream generator that can produce both stationary and dynamic distributions with various types of concept drift (including drift in prior class probabilities, i.e. dynamically imbalanced data), and a parser for the standard ARFF file format.
evaluators - containing classes for running experiments on stream data according to the Test-Then-Train and Prequential methodologies.
classifiers - containing sample stream classifiers.
ensembles - containing standard hybrid models for stream data classification.
metrics - containing typical classification quality metrics for data streams.

You can read more about each module on the documentation page.

Citation policy

If you use stream-learn in a scientific publication, we would appreciate a citation of the following paper:

@article{Ksieniewicz2022,
  doi = {10.1016/j.neucom.2021.10.120},
  url = {https://doi.org/10.1016/j.neucom.2021.10.120},
  year = {2022},
  month = jan,
  publisher = {Elsevier {BV}},
  author = {P. Ksieniewicz and P. Zyblewski},
  title = {stream-learn {\textemdash} open-source Python library for difficult data stream batch analysis},
  journal = {Neurocomputing}
}

Quick start guide

Installation

To use the stream-learn package, it will be absolutely useful to install it. Fortunately, it is available in the PyPI repository, so you may install it using pip:

pip3 install -U stream-learn

stream-learn is also available with conda:

conda install stream-learn -c w4k2 -c conda-forge

You can also install the module cloned from GitHub using the setup.py file, if you have a strange but perhaps legitimate need:

git clone https://github.com/w4k2/stream-learn.git
cd stream-learn
make install
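After installing, it can be handy to confirm which version ended up in the environment. The snippet below is a minimal sketch that uses only the Python standard library. The helper name `installed_version` is ours and not part of stream-learn, and the sketch assumes the distribution is registered under the name `stream-learn`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "stream-learn" is the distribution name used in the pip command above.
print(installed_version("stream-learn"))
```

If this prints `None`, the package is not visible to the interpreter that ran the snippet, which usually points to a different environment than the one pip or conda installed into.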

Preparing experiments

1. Classifier

In order to conduct experiments, a declaration of four elements is necessary. The first is the estimator, which must be compatible with the scikit-learn API and, in addition, implement the partial_fit() method, allowing you to re-fit the already built model. For example, we'll use the standard Gaussian Naive Bayes algorithm:

from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
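To see what `partial_fit()` provides, here is a small sketch that feeds GaussianNB three chunks one after another. The chunks are random toy data made up for illustration (not a stream-learn stream). Note that scikit-learn requires the full list of class labels on the first `partial_fit()` call.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
classes = np.array([0, 1])
clf = GaussianNB()

for chunk_id in range(3):
    # Toy chunk: 50 samples, 2 features, labels tied to the first feature.
    X = rng.normal(size=(50, 2))
    y = (X[:, 0] > 0).astype(int)
    # `classes` must be given on the first call; passing it again is harmless.
    clf.partial_fit(X, y, classes=classes)

# class_count_ accumulates over all partial_fit calls.
n_seen = int(clf.class_count_.sum())
print(n_seen)
```

The model is never rebuilt from scratch: each call updates the statistics gathered so far, which is what makes an estimator usable chunk by chunk.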

2. Data Stream

The next element is the data stream that we aim to process. In the example we will use a synthetic stream consisting of a shocking number of 100 chunks and containing precisely one concept drift. We will prepare it using the StreamGenerator() class of the stream-learn module:

from strlearn.streams import StreamGenerator
stream = StreamGenerator(n_chunks=100, n_drifts=1)

3. Metrics

The third requirement of the experiment is to specify the metrics used in the evaluation of the methods. In the example, we will use the accuracy metric available in scikit-learn and the precision from the stream-learn module:

from sklearn.metrics import accuracy_score
from strlearn.metrics import precision
metrics = [accuracy_score, precision]
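The plotting code later in this guide reads `metric.__name__`, so the metrics here are ordinary functions. Assuming they follow the scikit-learn convention of `metric(y_true, y_pred)`, the sketch below evaluates such a list on hand-made labels. It uses scikit-learn's `precision_score` as a stand-in so that it runs without stream-learn; it is not claimed to match `strlearn.metrics.precision` exactly.

```python
from sklearn.metrics import accuracy_score, precision_score

# Stand-in metric list; precision_score replaces strlearn's precision here.
example_metrics = [accuracy_score, precision_score]

y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1]

# Each metric is called as metric(y_true, y_pred) and returns a float.
results = {m.__name__: float(m(y_true, y_pred)) for m in example_metrics}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Under the same assumption, any function with this two-argument signature could be added to the list.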

4. Evaluator

The last necessary element of processing is the evaluator, i.e. the method of conducting the experiment. For example, we will choose the Test-Then-Train paradigm, described in more detail in the User Guide. Note that the metrics must be provided when the evaluator is initialized. If no metrics are given, it will use the default pair of accuracy and balanced accuracy scores:

from strlearn.evaluators import TestThenTrain
evaluator = TestThenTrain(metrics)
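For intuition about Test-Then-Train, the loop below is a hand-written sketch of the idea: every chunk except the first is used to test the current model before the model is updated with it. This is an illustration under our own assumptions, not the `TestThenTrain` implementation. The `toy_chunks` generator and its abrupt drift are invented for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB


def toy_chunks(n_chunks=10, chunk_size=100, seed=0):
    """Yield (X, y) chunks; class means swap halfway (a toy abrupt drift)."""
    rng = np.random.default_rng(seed)
    for chunk_id in range(n_chunks):
        y = rng.integers(0, 2, size=chunk_size)
        sign = 1.0 if chunk_id < n_chunks // 2 else -1.0
        # Class 1 is centred at +2 before the drift and at -2 after it.
        centres = sign * 2.0 * (2 * y - 1)
        X = centres[:, None] + rng.normal(size=(chunk_size, 2))
        yield X, y


clf = GaussianNB()
scores = []
for chunk_id, (X, y) in enumerate(toy_chunks()):
    if chunk_id > 0:
        # Test first: score the model on data it has not seen yet.
        scores.append(accuracy_score(y, clf.predict(X)))
    # Then train: update the model with the same chunk.
    clf.partial_fit(X, y, classes=[0, 1])

print(len(scores))  # one score per chunk except the first
```

In this toy setup the score drops sharply on the chunk where the class means swap, then recovers only as the model accumulates post-drift data.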

Processing and understanding results

Once all processing requirements have been met, we can proceed with the evaluation. To start processing, call the evaluator's process method, feeding it with the stream and classifier:

evaluator.process(stream, clf)

The results are stored in the scores attribute of the evaluator. Printing it shows a three-dimensional NumPy array. For the 100-chunk stream above its shape is (1, 99, 2), because the first chunk is used only for training.

The first dimension is the index of a classifier submitted for processing. In the example above, we used only one model, but it is also possible to pass a tuple or list of classifiers that will be processed in parallel (see the User Guide).
The second dimension specifies the instance of evaluation, which in the case of the Test-Then-Train methodology directly means the index of the processed chunk.
The third dimension indicates the metric used in the processing.
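To make the three axes concrete, the sketch below builds a stand-in array with the same layout (classifier, evaluation, metric) and slices it. The values are random placeholders, not results from stream-learn. The sizes are chosen arbitrarily and the names `fake_scores` and `metric_names` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
metric_names = ["accuracy_score", "precision"]

# Stand-in for evaluator.scores: 1 classifier, 10 evaluations, 2 metrics.
fake_scores = rng.uniform(size=(1, 10, 2))

# All per-chunk values of the first metric for the first classifier.
first_metric_curve = fake_scores[0, :, 0]

# Average over the evaluation axis: one mean per classifier and metric.
mean_per_metric = fake_scores.mean(axis=1)

print(first_metric_curve.shape)
print(dict(zip(metric_names, mean_per_metric[0])))
```

The same slicing pattern, `scores[classifier, :, metric]`, is what the plotting code uses to draw one curve per metric.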

Using this knowledge, we may finally try to illustrate the results of our simple experiment in the form of a plot:

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 3))

for m, metric in enumerate(metrics):
    plt.plot(evaluator.scores[0, :, m], label=metric.__name__)

plt.title("Basic example of stream processing")
plt.ylim(0, 1)
plt.ylabel('Quality')
plt.xlabel('Chunk')

plt.legend()

Documentation: stream-learn.readthedocs.io
