ENH: Add incremental algorithms support #160
base: main
Conversation
md-shafiul-alam commented Sep 23, 2024

/azp run CI

Azure Pipelines failed to run 1 pipeline(s).
md-shafiul-alam commented Sep 23, 2024

/azp run ml-benchmarks

No pipelines are associated with this pull request.
md-shafiul-alam commented Sep 23, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
    bench_cases = pd.DataFrame(
        [flatten_dict(bench_case) for bench_case in results["bench_cases"]]
    )
    bench_cases = bench_cases.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
What is the non-hashable object you are trying to convert?
The basic statistics `result_options` parameter is a list.
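To illustrate why that matters, here is a minimal standalone sketch (the sample bench-case dicts are invented for illustration): cells holding lists are unhashable, which breaks hashing-based `DataFrame` operations such as `drop_duplicates`, and mapping them to strings works around it.

```python
import pandas as pd
from collections.abc import Hashable

# Illustrative bench cases: "result_options" holds a list, which is
# unhashable and breaks hashing-based DataFrame operations such as
# drop_duplicates or groupby.
bench_cases = pd.DataFrame(
    [
        {"estimator": "BasicStatistics", "result_options": ["mean", "variance"]},
        {"estimator": "BasicStatistics", "result_options": ["mean", "variance"]},
    ]
)

# Converting unhashable cells to their string representation makes the
# frame safe for deduplication while leaving hashable cells untouched.
convert = lambda x: str(x) if not isinstance(x, Hashable) else x
try:
    bench_cases = bench_cases.map(convert)  # pandas >= 2.1
except AttributeError:
    bench_cases = bench_cases.applymap(convert)  # older pandas

deduped = bench_cases.drop_duplicates()
print(len(deduped))  # the two identical rows collapse into one
```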
    def create_online_function(
        estimator_instance, method_instance, data_args, num_batches, batch_size
    ):
        if "y" in list(inspect.signature(method_instance).parameters):
            def ndarray_function(x, y):
-               for i in range(n_batches):
+               for i in range(num_batches):
Leave the old simple logic with `batch_size` only.
Why?
Why change? It overcomplicates data slicing with extra parameter checks and calculations. Also, it is more common to know the batch size before the `partial_fit` call in real-world cases.
> Why change?

Adding a new feature which can be useful.

> It overcomplicates data slicing with extra parameter checks and calculations

It costs nothing. And doing calculations in the code is better than doing them in a calculator before running benchmarks.

> it is more common to know batch size before `partial_fit` call in real world cases.

But while doing benchmarking, it is not less common (I'd say even more common) that the user wants to specify the exact number of `partial_fit` calls.
I don't think it makes sense to have both, since one would depend on the other
Upon further investigation and using this branch, I think the setup here makes sense
It just allows the user to specify either `num_batches` or `batch_size` in the config. Original usage is unimpacted. I have no objections to this.
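A minimal sketch of how the two parameters could interoperate (the helper names are hypothetical, not the benchmark's actual API): given the sample count, whichever of `num_batches` or `batch_size` is specified determines the other.

```python
# Hypothetical helpers illustrating how either num_batches or batch_size
# can drive the same batch-slicing logic.
def resolve_batches(n_samples, num_batches=None, batch_size=None):
    """Derive (num_batches, batch_size) from whichever one is given."""
    if num_batches is None and batch_size is None:
        raise ValueError("specify num_batches or batch_size")
    if batch_size is None:
        batch_size = n_samples // num_batches
    if num_batches is None:
        num_batches = n_samples // batch_size
    return num_batches, batch_size

def iter_batches(x, num_batches=None, batch_size=None):
    """Yield consecutive slices of x, one per partial_fit call."""
    n, size = resolve_batches(len(x), num_batches, batch_size)
    for i in range(n):
        yield x[i * size : (i + 1) * size]

data = list(range(100))
batches = list(iter_batches(data, num_batches=4))
print(len(batches), len(batches[0]))  # 4 batches of 25 samples each
```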
    if method == "partial_fit":
        num_batches = get_bench_case_value(bench_case, "data:num_batches")
        batch_size = get_bench_case_value(bench_case, "data:batch_size")
Instead of a separate branch for `partial_fit`, extend the mechanism of `online_inference_mode` to partial fitting too.
Could you provide the exact link to the implementation of this mechanism? I was not able to find any usage of this parameter; I only see it being set in the config.
Alexsandruss commented Oct 2, 2024 (edited)
Actually, `online_inference_mode` was removed as unnecessary before the merge of the refactor branch. This mode is enabled by `batch_size != None` only.
You can split batch size into two for training and inference.
olegkkruglov commented Oct 4, 2024 (edited)
> Actually, `online_inference_mode` was removed as unnecessary before the merge of the refactor branch.

What should I extend then?
configs/incremental.json (Outdated)
    "library": "sklearnex",
    "num_batches": {"training": 2}
Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"
    "library": "sklearnex",
    "num_batches": {"training": 2}
Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"
    "library": "sklearnex.preview",
    "num_batches": {"training": 2}
Suggested change:
-    "library": "sklearnex.preview",
-    "num_batches": {"training": 2}
+    "library": "sklearnex.preview"
    if hasattr(estimator_instance, "_onedal_finalize_fit"):
        estimator_instance._onedal_finalize_fit()
Is it necessary to call `finalize_fit`? Wouldn't this happen automatically? We specifically have flexible logic here (i.e. the use of the `method_instance` variable), so let's avoid estimator-specific calls if possible.
`finalize_fit` is only triggered automatically when a result attribute is accessed. Without this explicit call, only `partial_fit` would be measured here.
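A toy sketch of why the explicit call matters (the estimator class below is invented for illustration; only the `hasattr` guard mirrors the reviewed code): finalization is deferred until results are requested, so calling it inside the timed region makes its cost part of the measurement.

```python
import time

class FakeIncrementalEstimator:
    """Stand-in for an incremental estimator with deferred finalization."""
    def __init__(self):
        self._finalized = False
    def partial_fit(self, batch):
        self._finalized = False  # new data invalidates finalized results
        return self
    def _onedal_finalize_fit(self):
        self._finalized = True  # compute final results from partial sums

est = FakeIncrementalEstimator()
start = time.perf_counter()
for batch in ([1, 2], [3, 4]):
    est.partial_fit(batch)
# Include finalization in the timed region, mirroring the reviewed guard;
# without it, only the partial_fit steps would be measured.
if hasattr(est, "_onedal_finalize_fit"):
    est._onedal_finalize_fit()
elapsed = time.perf_counter() - start
print(est._finalized)  # True: finalization cost is part of the measurement
```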
    {
        "estimator": "IncrementalEmpiricalCovariance",
        "library": "sklearnex.covariance",
        "estimator_methods": {"training": "partial_fit"},
Is there a reason `estimator_methods` is only specified for one algorithm?
| :--- | :--- | :--- | :--- |
| `algorithm`:`estimator` | None | | Name of measured estimator. |
| `algorithm`:`estimator_params` | Empty `dict` | | Parameters for estimator constructor. |
| `algorithm`:`training`:`num_batches` | 5 | | Number of batches to benchmark the `partial_fit` function, using batches of the specified number of samples (not samples divided by `num_batches`). For incremental estimators only. |
- Why is the same not applied to inference?
- Wrong order of keys:

Suggested change:
- | `algorithm`:`training`:`num_batches` | 5 | | Number of batches to benchmark the `partial_fit` function, using batches of the specified number of samples (not samples divided by `num_batches`). For incremental estimators only. |
+ | `algorithm`:`num_batches`:`training` | 5 | | Number of batches to benchmark the `partial_fit` function, using batches of the specified number of samples (not samples divided by `num_batches`). For incremental estimators only. |
    # default estimator methods
    estimator_methods = {
-       "training": ["fit"],
+       "training": ["partial_fit", "fit"],
I think `partial_fit` should be explicitly requested in the config, since most incremental estimators can work in both modes:

Suggested change:
-       "training": ["partial_fit", "fit"],
+       "training": ["fit"],
    batch_size = get_bench_case_value(
        bench_case, f"algorithm:batch_size:{stage}"
    )
    if batch_size is not None:
Batch size setting is required by inference measurements.
Description

Adds a `num_batches` parameter that determines the number of `partial_fit` calls on the dataset.
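A toy sketch of the behavior this parameter controls (the `RunningMean` estimator is invented for illustration): the dataset is split into `num_batches` chunks, and `partial_fit` is called once per chunk, accumulating the same result as a single full fit.

```python
class RunningMean:
    """Toy incremental estimator: accumulates a mean over batches."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
    def partial_fit(self, batch):
        self.count += len(batch)
        self.total += sum(batch)
        return self
    @property
    def mean_(self):
        return self.total / self.count

data = list(range(10))  # 10 samples
num_batches = 5
batch_size = len(data) // num_batches  # 2 samples per partial_fit call

est = RunningMean()
for i in range(num_batches):
    est.partial_fit(data[i * batch_size : (i + 1) * batch_size])

print(est.mean_)  # 4.5, same as a single fit over the whole dataset
```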