What is preventing sklearn from achieving true model persistence? #30609

Pierre-Bartet started this conversation in General

What is preventing sklearn from achieving true model persistence?
For example model.dump(...) + LogisticRegression.load(...)?
All the existing solutions are brittle or force users to use exactly the same sklearn version for training and inference:
https://scikit-learn.org/1.6/model_persistence.html
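For reference, a minimal sketch of the joblib flow from those docs; it is this full-object pickling that ties inference to the training-time sklearn version (the file name is illustrative):

```python
# Current recommended flow: pickle/joblib the whole estimator object.
# The file embeds private, version-specific internals, so it must be loaded
# under a compatible scikit-learn version.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

joblib.dump(model, "model.joblib")      # serializes the full Python object graph
restored = joblib.load("model.joblib")  # brittle across sklearn versions
print(restored.predict(X[:3]))
```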

I understand that this is a deliberate choice because of the sklearn team's lack of resources, but offloading serialization logic to external libraries can only end up in a much worse maintenance, communication, and interdependence nightmare.

For example sklearn-onnx accesses private sklearn components to be able to serialize them (such as PolynomialFeatures's _min_degree, or gradient boosting's _predictors).
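To illustrate the coupling this creates, here is a sketch of converter-style access to one of those private attributes; the attribute name is the one mentioned above, and whether it exists at all depends on the sklearn version, which is exactly the problem:

```python
# Illustration only: an external converter reading private sklearn state.
# `_min_degree` is internal, so a "non-breaking" sklearn release may rename or
# drop it; the converter has to guess defensively and can still silently break.
from sklearn.preprocessing import PolynomialFeatures

pf = PolynomialFeatures(degree=(2, 3)).fit([[0.0, 1.0]])

min_degree = getattr(pf, "_min_degree", None)  # private: no stability guarantee
if min_degree is None:
    raise RuntimeError("sklearn internals changed; the converter needs an update")
print(min_degree)  # 2 on versions where the attribute still exists
```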

Covering all of sklearn's components would be a tremendous task, but it could be done step by step, and it is also somewhat parallelizable by assigning a few models to anyone who would be happy to help.


Replies: 2 comments 2 replies


Basically, it is more of a maintenance burden: with the team, we estimate that we could not maintain it. However, we had a recent discussion in which we think that we could have a trimmed inference estimator for each estimator, reducing the impact of potential private changes when updating scikit-learn versions in this setting. Basically, it would make life easier for packages such as sklearn-onnx.

It would be possible to work on persistence with a fit + inference split, but the maintenance is really the bottleneck.
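To make the idea concrete, here is a purely hypothetical sketch of what a trimmed inference-only counterpart could look like; the class name and the exact attribute set are made up for illustration:

```python
# Hypothetical "trimmed inference estimator": it carries only the public fitted
# state needed for prediction, as plain numpy arrays, so persisting it does not
# depend on sklearn's private internals.
import numpy as np


class LogisticRegressionInference:
    """Illustrative inference-only counterpart to a fitted LogisticRegression."""

    def __init__(self, coef, intercept, classes):
        self.coef_ = np.asarray(coef)
        self.intercept_ = np.asarray(intercept)
        self.classes_ = np.asarray(classes)

    @classmethod
    def from_fitted(cls, fitted_lr):
        # Reads only public, documented fitted attributes.
        return cls(fitted_lr.coef_, fitted_lr.intercept_, fitted_lr.classes_)

    def predict(self, X):
        scores = np.asarray(X) @ self.coef_.T + self.intercept_
        if scores.shape[1] == 1:              # binary: one decision column
            indices = (scores.ravel() > 0).astype(int)
        else:                                 # multiclass: highest score wins
            indices = scores.argmax(axis=1)
        return self.classes_[indices]
```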

2 replies
@Pierre-Bartet

Thanks, I understand the maintenance burden issue, but right now a sklearn non-breaking change (such as removing one of the above private attributes) can break sklearn-onnx or any external attempt at serializing models, which seems to be an even larger maintenance burden for everyone (sklearn team included).

Your trimmed inference estimator idea is awesome!

@Pierre-Bartet

Another path would be to "just" make sure that everything necessary for inference (but nothing more) is accessible as public attributes (without creating a new class for each estimator), so that tools such as sklearn-onnx can rely on something stable (and maybe also help them reach a point where sklearn-onnx is less buggy and has more coverage, since it is still a huge task).
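A rough sketch of what that could look like with today's public surface, using LogisticRegression as an example; it relies only on documented fitted attributes, but it is not an officially supported persistence path:

```python
# Rough sketch: rebuild a predict-capable LogisticRegression from public
# attributes only (get_params() for hyperparameters, coef_/intercept_/classes_
# for fitted state). Not an officially supported path today.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
trained = LogisticRegression(max_iter=1000).fit(X, y)

# Everything needed for inference, as numpy arrays / plain Python objects.
state = {
    "params": trained.get_params(),
    "coef_": trained.coef_,
    "intercept_": trained.intercept_,
    "classes_": trained.classes_,
    "n_features_in_": trained.n_features_in_,
}

# Rebuild on the inference side without unpickling the original object.
restored = LogisticRegression(**state["params"])
for name in ("coef_", "intercept_", "classes_", "n_features_in_"):
    setattr(restored, name, state[name])

assert np.array_equal(restored.predict(X), trained.predict(X))
```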


I concur with you Pierre-Bartet, it should be feasible to implement model persistence as a community effort. Issue #31143 is relevant for this discussion. There is no need to decide on a persistence format; the only requirement is that parameters/state can be retrieved from a model as either numpy or Python native data structures, and conversely, that a model can consume the same as input for initialisation.
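A minimal sketch of that requirement, assuming the trailing-underscore convention is enough to find the fitted state (it is a heuristic, not a supported API):

```python
# Dump an estimator's hyperparameters and fitted state to JSON-friendly
# Python structures, then restore them. Attribute discovery relies on the
# trailing-underscore naming convention, which is a heuristic, not an API.
import json
import numpy as np
from sklearn.linear_model import Ridge


def _to_native(value):
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):      # numpy scalar -> plain Python scalar
        return value.item()
    return value


def to_state(est):
    fitted = {
        name: _to_native(value)
        for name, value in vars(est).items()
        if name.endswith("_") and not name.startswith("_")
    }
    return {"params": est.get_params(), "fitted": fitted}


def from_state(cls, state):
    est = cls(**state["params"])
    for name, value in state["fitted"].items():
        setattr(est, name, np.asarray(value) if isinstance(value, list) else value)
    return est


model = Ridge(alpha=0.5).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
payload = json.dumps(to_state(model))              # plain text, no pickle
restored = from_state(Ridge, json.loads(payload))
print(restored.predict([[3.0]]))
```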

0 replies
Category
General
Labels
None yet
3 participants
@Pierre-Bartet @glemaitre @jcbsv
