
ROR is a pipelining framework for Python that makes it easier to define complex ML and data-processing stages.
To get started with creating your first pipeline, you can base it on the following example, which defines a simple Gaussian Mixture Model (GMM) pipeline. First, we import the relevant packages.
```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from dataclasses import dataclass
from typing import Tuple

from ror.schemas import BaseSchema
from ror.schemas.fields import field_perishable, field_persistance
from ror.stages import IInitStage, ITerminalStage, IForwardStage
from ror.controlers import BaseController
```
Then we can define the schemas, which determine the structure of the data communicated between the different stages. Fields marked `field_persistance` are carried forward to later stages via `get_carry()`, while `field_perishable` fields are dropped once the receiving stage has consumed them.
```python
@dataclass
class InitStageInput(BaseSchema):
    data: object = field_perishable()

@dataclass
class InitStageOutput(BaseSchema):
    X_pca: object = field_persistance()
    X_std: object = field_perishable()
    model: object = field_persistance()

@dataclass
class InferenceStageOutput(BaseSchema):
    X_pca: object = field_perishable()
    model: object = field_perishable()
    labels: object = field_persistance()

@dataclass
class VisStageOutput(BaseSchema):
    labels: object = field_persistance()
```
We can then define the logical stages that use these schemas as input and output. Each stage is parameterized by its input schema, its output schema and, for non-terminal stages, the next stage to hand off to.
```python
class VisStage(ITerminalStage[InferenceStageOutput, VisStageOutput]):
    def compute(self) -> None:
        # Visualize the clusters
        plt.figure(figsize=(8, 6))
        colors = ['r', 'g', 'b']
        for i in range(3):
            plt.scatter(
                self.input.X_pca[self.input.labels == i, 0],
                self.input.X_pca[self.input.labels == i, 1],
                color=colors[i],
                label=f'Cluster {i + 1}'
            )

        plt.title('Gaussian Mixture Model Clustering')
        plt.xlabel('Principal Component 1')
        plt.ylabel('Principal Component 2')
        plt.legend()
        plt.show()

        self._output = self.input.get_carry()

    def get_output(self) -> VisStageOutput:
        return VisStageOutput(**self._output)

class InferenceStage(IForwardStage[InitStageOutput, InferenceStageOutput, VisStage]):
    def compute(self) -> None:
        # Fit the Gaussian mixture model to the dataset
        self.input.model.fit(self.input.X_std)

        # Predict the cluster labels
        labels = self.input.model.predict(self.input.X_std)

        self._output = {
            "labels": labels,
            **self.input.get_carry()
        }

    def get_output(self) -> Tuple[VisStage, InferenceStageOutput]:
        return VisStage(), InferenceStageOutput(**self._output)

class InitStage(IInitStage[InitStageInput, InitStageOutput, InferenceStage]):
    def compute(self) -> None:
        # Load the dataset
        X = self.input.data.data

        # Standardize the features
        scaler = StandardScaler()
        X_std = scaler.fit_transform(X)

        # Apply PCA to reduce dimensionality for visualization
        pca = PCA(n_components=2)
        X_pca = pca.fit_transform(X_std)

        # Define a Gaussian Mixture Model (fitted later in InferenceStage)
        gmm = GaussianMixture(n_components=3, random_state=42)

        self._output = {
            "X_pca": X_pca,
            "X_std": X_std,
            "model": gmm,
            **self.input.get_carry()
        }

    def get_output(self) -> Tuple[InferenceStage, InitStageOutput]:
        return InferenceStage(), InitStageOutput(**self._output)
```
Then we can define a simple controller, which is given the init stage and the input data to be passed through the pipeline.
```python
iris = datasets.load_iris()
input_data = InitStageInput(data=iris)

controller = BaseController(init_data=input_data, init_stage=InitStage)
controller.discover()  # Shows a table of the connected stages

output, run_id = controller.start()
```
And that's it! With this you can define logical processing stages for your ML inference pipelines whilst keeping a high level of separation between them.
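As a final sanity check, you can inspect what the pipeline hands back. The snippet below is a minimal sketch assuming, consistent with the example above, that `controller.start()` returns the terminal stage's output schema (here `VisStageOutput`) together with a run identifier:

```python
# Minimal sketch: assumes `output` is the VisStageOutput instance produced by
# the terminal VisStage, and `run_id` identifies this pipeline run.
print(f"Run ID: {run_id}")
print(f"First 10 predicted labels: {output.labels[:10]}")
```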