
my RIF estimator, i need ideassssss #31515

Unanswered
GiulioSurya asked this question in Q&A

Hello everyone,

As part of my Master's thesis, I am developing a new estimator based on Isolation Forest that operates on residuals. Without delving into the theoretical background, which isn't relevant here, I'm currently facing a technical issue.

My repository is available at:
(Rif estimator)

The repository includes two modules:

  • RIF
  • _residual_gen

The estimator is implemented within the scikit-learn ecosystem and therefore inherits its methods. In particular, here is what happens:

When I call the fit method on the RIF estimator, it internally invokes fit_transform from _residual_gen, which is responsible for computing the residuals and using them to fit the Isolation Forest.
These residuals are computed with a Random Forest model. To avoid data leakage, they are calculated either from out-of-bag (OOB) predictions or via k-fold cross-validation. (There’s also a “vanilla” version without leakage control, but that’s not relevant for this issue.)
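For context, here is a minimal sketch of how leakage-free residuals can be obtained with scikit-learn; the function name and parameters are illustrative, not the exact ones used in the repository:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict


def leakage_free_residuals(X, y, method="oob", n_estimators=100, cv=5, random_state=0):
    """Residuals computed without data leakage, via OOB or k-fold predictions."""
    if method == "oob":
        rf = RandomForestRegressor(
            n_estimators=n_estimators, oob_score=True, random_state=random_state
        )
        rf.fit(X, y)
        preds = rf.oob_prediction_  # each sample predicted only by trees that did not see it
    elif method == "kfold":
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        preds = cross_val_predict(rf, X, y, cv=cv)  # each fold predicted by a model not trained on it
        rf.fit(X, y)  # refit on all data so the model can be reused at predict time
    else:
        raise ValueError(f"unknown method: {method!r}")
    return np.asarray(y) - preds, rf
```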

Once computed, the residuals are cached. Why?
Because when RIF.predict(X) is called:

  • If the input X is the same as the one used in RIF.fit(X), the cached residuals are reused.
  • If the input X is different, the previously fitted Random Forest is used to compute new residuals, and anomalies are detected on these (see the sketch after this list).
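To make the caching behaviour concrete, here is a simplified, self-contained sketch. This is not the actual RIF code: the attribute names, the id(X)-based check, and the assumption that predict receives the target explicitly are all illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor


class RIFSketch:
    """Simplified illustration of fit-time caching and predict-time branching."""

    def fit(self, X, y):
        self.rf_ = RandomForestRegressor(oob_score=True, random_state=0).fit(X, y)
        # Leakage-free residuals on the training data (OOB variant).
        self.train_residuals_ = (np.asarray(y) - self.rf_.oob_prediction_).reshape(-1, 1)
        self.iforest_ = IsolationForest(random_state=0).fit(self.train_residuals_)
        self._train_key = id(X)  # the current identity-based check
        return self

    def predict(self, X, y=None):
        if id(X) == self._train_key:
            # Same object as passed to fit: reuse the cached residuals.
            residuals = self.train_residuals_
        else:
            # New data: residuals from the already fitted Random Forest.
            residuals = (np.asarray(y) - self.rf_.predict(X)).reshape(-1, 1)
        return self.iforest_.predict(residuals)  # +1 = inlier, -1 = anomaly
```

In the real estimator the residual computation is delegated to _residual_gen; the only point of the sketch is the branch inside predict.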

Currently, this distinction between training and prediction data is handled using id(X), which only checks whether the two calls received the very same Python object in memory. I also tried hashing the dataset content, but both approaches feel fragile and not robust in practice.
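For reference, the content-hash variant I experimented with looks roughly like this (the function name is made up; the real code may differ):

```python
import hashlib
import numpy as np


def dataset_fingerprint(X):
    """Content-based fingerprint of a dataset, as an alternative to id(X)."""
    arr = np.ascontiguousarray(np.asarray(X))
    # Include dtype and shape so arrays with identical bytes but different
    # layouts do not collide.
    meta = f"{arr.dtype}|{arr.shape}".encode()
    return hashlib.sha256(meta + arr.tobytes()).hexdigest()
```

The idea is to store the fingerprint of the training data at fit time and have predict recompute it on its input and compare the two.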

I’m looking for a better solution: either one that improves the logic for comparing the two datasets, or a different approach that achieves the same goal more reliably.

Any help or suggestions would be greatly appreciated.

Best regards,
Giulio
