ENH: Add incremental algorithms support #160
base: main
Conversation
md-shafiul-alam commented Sep 23, 2024

/azp run CI

Azure Pipelines failed to run 1 pipeline(s).
md-shafiul-alam commented Sep 23, 2024

/azp run ml-benchmarks

No pipelines are associated with this pull request.
md-shafiul-alam commented Sep 23, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
    bench_cases = pd.DataFrame(
        [flatten_dict(bench_case) for bench_case in results["bench_cases"]]
    )
    bench_cases = bench_cases.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
What is the non-hashable object you are trying to convert?
The basic statistics `result_options` parameter is a list.
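To illustrate why that matters, here is a minimal standalone sketch (the sample bench-case dicts are invented for illustration): cells holding lists are unhashable, which breaks hashing-based `DataFrame` operations such as `drop_duplicates`, and mapping them to strings works around it.

```python
import pandas as pd
from collections.abc import Hashable

# Illustrative bench cases: "result_options" holds a list, which is
# unhashable and breaks hashing-based DataFrame operations such as
# drop_duplicates or groupby.
bench_cases = pd.DataFrame(
    [
        {"estimator": "BasicStatistics", "result_options": ["mean", "variance"]},
        {"estimator": "BasicStatistics", "result_options": ["mean", "variance"]},
    ]
)

# Converting unhashable cells to their string representation makes the
# frame safe for deduplication while leaving hashable cells untouched.
convert = lambda x: str(x) if not isinstance(x, Hashable) else x
try:
    bench_cases = bench_cases.map(convert)  # pandas >= 2.1
except AttributeError:
    bench_cases = bench_cases.applymap(convert)  # older pandas

deduped = bench_cases.drop_duplicates()
print(len(deduped))  # the two identical rows collapse into one
```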
    def create_online_function(
        estimator_instance, method_instance, data_args, num_batches, batch_size
    ):
        if "y" in list(inspect.signature(method_instance).parameters):
            def ndarray_function(x, y):
-               for i in range(n_batches):
+               for i in range(num_batches):
Leave the old simple logic with `batch_size` only.
Why?
Why change? It overcomplicates data slicing with extra parameter checks and calculations. Also, it is more common to know the batch size before the `partial_fit` call in real-world cases.
> Why change?

Adding a new feature which can be useful.

> It overcomplicates data slicing with extra parameter checks and calculations

It costs nothing. And doing calculations in the code is better than doing them in a calculator before running benchmarks.

> it is more common to know batch size before `partial_fit` call in real world cases.

But while doing benchmarking, it is not less common (I'd say even more common) that the user wants to specify the exact number of `partial_fit` calls.
I don't think it makes sense to have both, since one would depend on the other
Upon further investigation and using this branch, I think the setup here makes sense
It just allows the user to specify either `num_batches` or `batch_size` in the config. Original usage is unimpacted. I have no objections to this.
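A minimal sketch of how the two parameters could interoperate (the helper names are hypothetical, not the benchmark's actual API): given the sample count, whichever of `num_batches` or `batch_size` is specified determines the other.

```python
# Hypothetical helpers illustrating how either num_batches or batch_size
# can drive the same batch-slicing logic.
def resolve_batches(n_samples, num_batches=None, batch_size=None):
    """Derive (num_batches, batch_size) from whichever one is given."""
    if num_batches is None and batch_size is None:
        raise ValueError("specify num_batches or batch_size")
    if batch_size is None:
        batch_size = n_samples // num_batches
    if num_batches is None:
        num_batches = n_samples // batch_size
    return num_batches, batch_size

def iter_batches(x, num_batches=None, batch_size=None):
    """Yield consecutive slices of x, one per partial_fit call."""
    n, size = resolve_batches(len(x), num_batches, batch_size)
    for i in range(n):
        yield x[i * size : (i + 1) * size]

data = list(range(100))
batches = list(iter_batches(data, num_batches=4))
print(len(batches), len(batches[0]))  # 4 batches of 25 samples each
```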
    if method == "partial_fit":
        num_batches = get_bench_case_value(bench_case, "data:num_batches")
        batch_size = get_bench_case_value(bench_case, "data:batch_size")
Instead of a separate branch for `partial_fit`, extend the mechanism of `online_inference_mode` to partial fitting too.
Could you provide the exact link to the implementation of this mechanism? I was not able to find any usage of this parameter; I only see it being set in the config.
Alexsandruss commented Oct 2, 2024 (edited)
Actually, `online_inference_mode` was removed as unnecessary before the merge of the refactor branch. This mode is enabled by `batch_size != None` only.
You can split batch size into two for training and inference.
olegkkruglov commented Oct 4, 2024 (edited)
> Actually, `online_inference_mode` was removed as unnecessary before the merge of the refactor branch.

What should I extend then?
configs/incremental.json (Outdated)
    "library": "sklearnex",
    "num_batches": {"training": 2}
Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"
    "library": "sklearnex",
    "num_batches": {"training": 2}
Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"
    "library": "sklearnex.preview",
    "num_batches": {"training": 2}
Suggested change:
-    "library": "sklearnex.preview",
-    "num_batches": {"training": 2}
+    "library": "sklearnex.preview"
    if hasattr(estimator_instance, "_onedal_finalize_fit"):
        estimator_instance._onedal_finalize_fit()
Is it necessary to call `finalize_fit`? Wouldn't this happen automatically? We specifically have flexible logic here (i.e. the use of the `method_instance` variable), so let's avoid estimator-specific calls if possible.
`finalize_fit` is only triggered automatically when a result attribute is accessed. Without this explicit call, only `partial_fit` would be measured here.
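A toy sketch of why the explicit call matters (the estimator class below is invented for illustration; only the `hasattr` guard mirrors the reviewed code): finalization is deferred until results are requested, so calling it inside the timed region makes its cost part of the measurement.

```python
import time

class FakeIncrementalEstimator:
    """Stand-in for an incremental estimator with deferred finalization."""
    def __init__(self):
        self._finalized = False
    def partial_fit(self, batch):
        self._finalized = False  # new data invalidates finalized results
        return self
    def _onedal_finalize_fit(self):
        self._finalized = True  # compute final results from partial sums

est = FakeIncrementalEstimator()
start = time.perf_counter()
for batch in ([1, 2], [3, 4]):
    est.partial_fit(batch)
# Include finalization in the timed region, mirroring the reviewed guard;
# without it, only the partial_fit steps would be measured.
if hasattr(est, "_onedal_finalize_fit"):
    est._onedal_finalize_fit()
elapsed = time.perf_counter() - start
print(est._finalized)  # True: finalization cost is part of the measurement
```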
    {
        "estimator": "IncrementalEmpiricalCovariance",
        "library": "sklearnex.covariance",
        "estimator_methods": {"training": "partial_fit"},
Is there a reason `estimator_methods` is only specified for one algorithm?
| :--- | :--- | :--- | :--- |
| `algorithm`:`estimator` | None | | Name of measured estimator. |
| `algorithm`:`estimator_params` | Empty `dict` | | Parameters for estimator constructor. |
| `algorithm`:`training`:`num_batches` | 5 | | Number of batches to benchmark the `partial_fit` function, using batches of the specified number of samples (not samples divided by `num_batches`). For incremental estimators only. |
- Why is the same not applied to inference?
- Wrong order of keys:

Suggested change:
- | `algorithm`:`training`:`num_batches` | 5 | | Number of batches to benchmark the `partial_fit` function, using batches of the specified number of samples (not samples divided by `num_batches`). For incremental estimators only. |
+ | `algorithm`:`num_batches`:`training` | 5 | | Number of batches to benchmark the `partial_fit` function, using batches of the specified number of samples (not samples divided by `num_batches`). For incremental estimators only. |
    # default estimator methods
    estimator_methods = {
-       "training": ["fit"],
+       "training": ["partial_fit", "fit"],
I think `partial_fit` should be explicitly requested in the config, since most incremental estimators can work in both modes:

Suggested change:
-       "training": ["partial_fit", "fit"],
+       "training": ["fit"],
    batch_size = get_bench_case_value(
        bench_case, f"algorithm:batch_size:{stage}"
    )
    if batch_size is not None:
Batch size setting is required by inference measurements.
Description

Adds a `num_batches` parameter that determines the number of `partial_fit` calls on the dataset.
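A toy sketch of the behavior this parameter controls (the `RunningMean` estimator is invented for illustration): the dataset is split into `num_batches` chunks, and `partial_fit` is called once per chunk, accumulating the same result as a single full fit.

```python
class RunningMean:
    """Toy incremental estimator: accumulates a mean over batches."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
    def partial_fit(self, batch):
        self.count += len(batch)
        self.total += sum(batch)
        return self
    @property
    def mean_(self):
        return self.total / self.count

data = list(range(10))  # 10 samples
num_batches = 5
batch_size = len(data) // num_batches  # 2 samples per partial_fit call

est = RunningMean()
for i in range(num_batches):
    est.partial_fit(data[i * batch_size : (i + 1) * batch_size])

print(est.mean_)  # 4.5, same as a single fit over the whole dataset
```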