ENH: Add incremental algorithms support #160


Open

olegkkruglov wants to merge 14 commits into IntelPython:main from olegkkruglov:inc-support

Conversation

@olegkkruglov commented on Sep 20, 2024 (edited by ethanglaser):

Description

  • Added support for incremental algorithms
  • Added a config example for the introduced functionality
  • Fixed a bug in the report generator that caused a failure when an estimator attribute is not hashable
  • Fixed a warning in the report generator that appeared during geomean calculation when the DataFrame is empty
  • Introduced a `num_batches` parameter that determines the number of `partial_fit` calls on the dataset

@samir-nasibli changed the title from "Add incremental algorithms support" to "ENH: Add incremental algorithms support" on Sep 20, 2024
@md-shafiul-alam (Contributor):

/azp run CI

@azure-pipelines:

Azure Pipelines failed to run 1 pipeline(s).

@md-shafiul-alam (Contributor):

/azp run ml-benchmarks

@azure-pipelines:

No pipelines are associated with this pull request.

@md-shafiul-alam (Contributor):

/azp run

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

```python
bench_cases = pd.DataFrame(
    [flatten_dict(bench_case) for bench_case in results["bench_cases"]]
)
bench_cases = bench_cases.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
```
Contributor:

What is the non-hashable object you are trying to convert?

Author:

The basic statistics `result_options` parameter is a list.

Comment on lines 337 to 344

```diff
 def create_online_function(
     estimator_instance, method_instance, data_args, num_batches, batch_size
 ):

     if "y" in list(inspect.signature(method_instance).parameters):

         def ndarray_function(x, y):
-            for i in range(n_batches):
+            for i in range(num_batches):
```
Contributor:

Leave the old simple logic with `batch_size` only.

Author:

Why?

Contributor:

Why change? It overcomplicates data slicing with extra parameter checks and calculations; also, it is more common to know the batch size before the `partial_fit` call in real-world cases.

Author:

> Why change?

It adds a new feature which can be useful.

> It overcomplicates data slicing with extra parameter checks and calculations

It costs nothing, and doing the calculation in the code is better than doing it with a calculator before running the benchmarks.

> it is more common to know batch size before partial_fit call in real world cases.

But when benchmarking it is no less common (I'd say even more so) for the user to want to specify the exact number of `partial_fit` calls.

Contributor:

I don't think it makes sense to have both, since one would depend on the other.

Contributor:

Upon further investigation and using this branch, I think the setup here makes sense.

Contributor:

It just allows the user to specify either `num_batches` or `batch_size` in the config. Original usage is unimpacted. I have no objections to this.
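The either-or relationship discussed above (one parameter is derivable from the other given the dataset size) can be sketched as follows. `resolve_batching` is a hypothetical helper for illustration, not code from this PR:

```python
import math

def resolve_batching(n_samples, num_batches=None, batch_size=None):
    """Derive the missing batching parameter from the one given in the config.

    Hypothetical helper: the user supplies either `num_batches` or
    `batch_size`, and the other is computed from the dataset size.
    """
    if num_batches is not None and batch_size is None:
        batch_size = math.ceil(n_samples / num_batches)
    elif batch_size is not None and num_batches is None:
        num_batches = math.ceil(n_samples / batch_size)
    elif num_batches is None and batch_size is None:
        raise ValueError("specify either num_batches or batch_size")
    return num_batches, batch_size

print(resolve_batching(1000, num_batches=5))   # (5, 200)
print(resolve_batching(1000, batch_size=64))   # (16, 64)
```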

Comment on lines 427 to 429

```python
if method == "partial_fit":
    num_batches = get_bench_case_value(bench_case, "data:num_batches")
    batch_size = get_bench_case_value(bench_case, "data:batch_size")
```
Contributor:

Instead of a separate branch for `partial_fit`, extend the `online_inference_mode` mechanism to partial fitting too.

Author:

Could you provide an exact link to the implementation of this mechanism? I was not able to find any usage of this parameter; I only see it being set in the config.

@Alexsandruss (Contributor) commented on Oct 2, 2024 (edited):

Actually, `online_inference_mode` was removed as unnecessary before the merge of the refactor branch. This mode is enabled by `batch_size != None` only.

Contributor:

You can split the batch size into two: one for training and one for inference.

@olegkkruglov (Author) commented on Oct 4, 2024 (edited):

> Actually, online_inference_mode was removed as unnecessary before merge of refactor branch.

What should I extend then?

Comment on lines 43 to 44

Contributor, suggested change:

```diff
-"library": "sklearnex",
-"num_batches": {"training": 2}
+"library": "sklearnex"
```

Comment on lines 52 to 53

Contributor, suggested change:

```diff
-"library": "sklearnex",
-"num_batches": {"training": 2}
+"library": "sklearnex"
```

Comment on lines 61 to 62

Contributor, suggested change:

```diff
-"library": "sklearnex.preview",
-"num_batches": {"training": 2}
+"library": "sklearnex.preview"
```

Comment on lines +372 to +373

```python
if hasattr(estimator_instance, "_onedal_finalize_fit"):
    estimator_instance._onedal_finalize_fit()
```

Contributor:

Is it necessary to call finalize_fit? Wouldn't this happen automatically? We specifically have flexible logic here (i.e., the use of the `method_instance` variable), so let's avoid specific calls if possible.

Author:

`finalize_fit` is only called when a result attribute is accessed. Without this explicit call, only `partial_fit` itself would be measured here.
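The lazy-finalization behavior described above can be illustrated with a toy incremental estimator (hypothetical; real sklearnex estimators finalize inside oneDAL, this sketch only mirrors the pattern):

```python
# Toy estimator: partial_fit only accumulates partial sums; the final
# result is computed by a finalize hook. A benchmark loop that never
# reads a result attribute must call the hook explicitly, otherwise the
# finalization cost is never included in the measured time.
class IncrementalMean:
    def __init__(self):
        self._sum = 0.0
        self._count = 0
        self.mean_ = None

    def partial_fit(self, batch):
        self._sum += sum(batch)
        self._count += len(batch)
        return self

    def _onedal_finalize_fit(self):
        self.mean_ = self._sum / self._count

est = IncrementalMean()
for batch in ([1.0, 2.0], [3.0, 4.0]):
    est.partial_fit(batch)

# Without this call, mean_ stays None: nothing has accessed a result.
if hasattr(est, "_onedal_finalize_fit"):
    est._onedal_finalize_fit()
print(est.mean_)  # 2.5
```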


```json
{
    "estimator": "IncrementalEmpiricalCovariance",
    "library": "sklearnex.covariance",
    "estimator_methods": {"training": "partial_fit"},
```

Contributor:

Is there a reason `estimator_methods` is only specified for one algo?

|:---------------|:--------------|:--------|:------------|
| `algorithm`:`estimator` | None | | Name of measured estimator. |
| `algorithm`:`estimator_params` | Empty `dict` | | Parameters for estimator constructor. |
| `algorithm`:`training`:`num_batches` | 5 | | Number of batches to benchmark `partial_fit` function, using batches the size of the number of samples specified (not samples divided by `num_batches`). For incremental estimators only. |

Contributor:

  1. Why is the same not applied to inference?
  2. Wrong order of keys:

Suggested change:

```diff
-| `algorithm`:`training`:`num_batches` | 5 | | Number of batches to benchmark `partial_fit` function, using batches the size of the number of samples specified (not samples divided by `num_batches`). For incremental estimators only. |
+| `algorithm`:`num_batches`:`training` | 5 | | Number of batches to benchmark `partial_fit` function, using batches the size of the number of samples specified (not samples divided by `num_batches`). For incremental estimators only. |
```

```diff
 # default estimator methods
 estimator_methods = {
-    "training": ["fit"],
+    "training": ["partial_fit", "fit"],
```

Contributor:

I think `partial_fit` should be explicitly requested in the config, since most incremental estimators can work in both modes:

Suggested change:

```diff
-    "training": ["partial_fit", "fit"],
+    "training": ["fit"],
```

```python
batch_size = get_bench_case_value(
    bench_case, f"algorithm:batch_size:{stage}"
)
if batch_size is not None:
```

Contributor:

The batch size setting is required by inference measurements.
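For context on the snippet above, the benchmark config is addressed by colon-separated paths such as `algorithm:batch_size:{stage}`. A hypothetical stand-in for `get_bench_case_value` (the real helper lives in the benchmarks repo; this sketch only mirrors the lookup behavior):

```python
# Hypothetical stand-in: walk a nested dict using a colon-separated path,
# returning a default when any key along the path is missing.
def get_bench_case_value(bench_case, path, default=None):
    node = bench_case
    for key in path.split(":"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

bench_case = {"algorithm": {"batch_size": {"inference": 1000}}}

# Training stage: key missing, falls back to the default None.
print(get_bench_case_value(bench_case, "algorithm:batch_size:training"))   # None
# Inference stage: present, as required per the comment above.
print(get_bench_case_value(bench_case, "algorithm:batch_size:inference"))  # 1000
```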


Reviewers

@ethanglaser left review comments
@Alexsandruss requested changes
Awaiting requested review from @md-shafiul-alam

Requested changes must be addressed to merge this pull request.


4 participants: @olegkkruglov, @md-shafiul-alam, @Alexsandruss, @ethanglaser
