Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Introduce client-side cache in GrpcStorageProxy#5872

Merged
nabenabe0928 merged 8 commits intooptuna:masterfromc-bata:grpc-proxy-trials-cache
Jan 7, 2025

Conversation

c-bata
Copy link
Member

@c-batac-bata commentedDec 20, 2024
edited
Loading

Motivation

Follow-up#5852
To makeGrpcStorageProxy faster by introducing a client-side cache on GrpcStorageProxy.

Description of the changes

This PR makesGrpcStorageProxy faster by cachestorage.get_all_trials calls. Here is the performance comparison results and the benchmark script I used.

master branchThis PR
87.392s (+/- 0.6052104)48.293s (+/- 0.223478)
# Start a MySQL server$ docker run -d --rm -p 3306:3306 -v ./mysql-conf:/etc/mysql/conf.d -e MYSQL_USER=optuna -e MYSQL_DATABASE=optuna -e MYSQL_PASSWORD=password -e MYSQL_ALLOW_EMPTY_PASSWORD=yes --name optuna-mysql mysql:8.0# Start a gRPC storage proxy server$ python3 -c"import optuna; from concurrent.futures import ThreadPoolExecutor; from optuna.storages import run_grpc_proxy_server; run_grpc_proxy_server(storage=optuna.storages.get_storage('mysql+pymysql://optuna:password@127.0.0.1:3306/optuna'), host='localhost', port=13000, thread_pool=ThreadPoolExecutor(10))"
from __future__importannotationsfromconcurrent.futuresimportas_completedfromconcurrent.futuresimportProcessPoolExecutorimportthreadingimporttimefromtypingimportLiteralimportnumpyasnpimportoptunafromoptuna.storagesimportBaseStoragefromoptuna.storagesimportGrpcStorageProxystudy_name="grpc-example"n_workers=10n_repeat=5n_trials=1000STORAGE_KIND=Literal["grpc","rdb"]storage_kinds:list[STORAGE_KIND]= ["grpc"]defget_storage(storage_kind:STORAGE_KIND)->BaseStorage:ifstorage_kind=="grpc":returnGrpcStorageProxy(host="localhost",port=13000)elifstorage_kind=="rdb":returnoptuna.storages.get_storage("mysql+pymysql://optuna:password@127.0.0.1:3306/optuna")defobjective(trial:optuna.Trial)->float:s=0.0foriinrange(10):s+=trial.suggest_float(f"x{i}",-10,10)**2returnsdefstudy_optimize(storage_kind:STORAGE_KIND,n_trials:int)->int:start=time.time()storage=get_storage(storage_kind)study=optuna.load_study(storage=storage,study_name=study_name,sampler=optuna.samplers.RandomSampler()    )study.optimize(objective,n_trials=n_trials)elapsed=time.time()-startreturnelapseddefbench_study_optimize(storage_kind:STORAGE_KIND)->list[int]:elapsed= []study=optuna.create_study(storage=get_storage("rdb"),study_name=study_name)withProcessPoolExecutor(n_workers)aspool:futures= [pool.submit(study_optimize,storage_kind,n_trials/n_workers)for_inrange(n_workers)        ]forfutinas_completed(futures):elapsed.append(fut.result())returnelapseddefmain():try:optuna.delete_study(storage=get_storage("rdb"),study_name=study_name)exceptException:passelapsed_optimize= {}elapsed_get_all_trials= {}forstorage_kindinstorage_kinds:elapsed_optimize[storage_kind]=bench_study_optimize(storage_kind)optuna.delete_study(storage=get_storage("rdb"),study_name=study_name)forkeyinelapsed_optimize:print(f"{key=}{n_trials=}{n_repeat=}{n_workers=}")elapsed=elapsed_optimize[key]print(f"  study.optimize:{np.mean(elapsed)=}{np.std(elapsed)=}")if__name__=="__main__":main()

@c-batac-bata added the enhancementChange that does not break compatibility and not affect public interfaces, but improves performance. labelDec 20, 2024
HideakiImamura
HideakiImamura previously approved these changesDec 20, 2024
Copy link
Member

@HideakiImamuraHideakiImamura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the follow-up. Looks very good. i confirmed that all logics are consistent with the cache logic of_CachedStorage and match with the current implementation.

@HideakiImamuraHideakiImamura removed their assignmentDec 20, 2024
@HideakiImamuraHideakiImamura dismissed theirstale reviewDecember 20, 2024 08:14

Sorry for the confusion. It seems that we need to change not onlyget_all_trials but also other methods of storages, such thatdelete_study.

@HideakiImamuraHideakiImamura self-assigned thisDec 20, 2024
@@ -102,6 +105,7 @@ def delete_study(self, study_id: int) -> None:
if e.code() == grpc.StatusCode.NOT_FOUND:
raise KeyError from e
raise
self._cache.delete_study_cache(study_id)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've found an issue regarding the cache invalidation.

With the current schema of Optuna's SQLite3 database, SQLite3 reusesstudy_id, as shown below.

sqlite> .schema studiesCREATETABLEstudies (        study_idINTEGERNOT NULL,         study_nameVARCHAR(512)NOT NULL,PRIMARY KEY (study_id));
>>>storage=optuna.storages.RDBStorage("sqlite:///foo.db")>>>storage.create_new_study(directions=[optuna.study.StudyDirection.MINIMIZE])[I2024-12-2017:10:54,500]AnewstudycreatedinRDBwithname:no-name-2a8665b6-fb78-48d5-937b-cd2c5a71f9ac1>>>storage.delete_study(study_id=1)>>>storage.create_new_study(directions=[optuna.study.StudyDirection.MINIMIZE])[I2024-12-2017:11:04,641]AnewstudycreatedinRDBwithname:no-name-0b71c4ee-123f-4894-9903-4014c6581942

Consequently, when multiple gRPC storage proxy servers are running andstorage.delete_study() is called, the cache may not be invalidated. This issue does not occur with MySQL, as shown below.

>>>importoptuna>>>storage=optuna.storages.get_storage("mysql+pymysql://optuna:password@127.0.0.1:3306/optuna")>>>storage.create_new_study(directions=[optuna.study.StudyDirection.MINIMIZE])[I2024-12-2018:37:24,719]AnewstudycreatedinRDBwithname:no-name-95110374-00cc-4136-a1a3-58d8e5e88cba132>>>storage.delete_study(study_id=132)>>>storage.create_new_study(directions=[optuna.study.StudyDirection.MINIMIZE])[I2024-12-2018:37:30,603]AnewstudycreatedinRDBwithname:no-name-feac51b7-be94-407c-8b83-2e0e8e2d0b9d133

This issue is relatively limited, as it only arises when multiple gRPC storage proxies are running with a single SQLite3 database, and optuna.delete_study() are executed. So, my suggestion is to proceed with this PR and address this issue in the future by addingAUTOINCREMENT (please refer tohttps://www.sqlite.org/autoinc.html for more details).

@c-batac-bataforce-pushed thegrpc-proxy-trials-cache branch from866b980 toe9e24c5CompareDecember 20, 2024 09:52
@c-batac-bata assignedy0z and unassignedHideakiImamuraDec 20, 2024
@c-bata
Copy link
MemberAuthor

@y0z@nabenabe0928 Could you review this PR?

Copy link
Contributor

@nabenabe0928nabenabe0928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I feel like the following methods can also use cache, but do you wanna include them in this PR, make another PR, or do not wanna do it at all?

  • get_study_directions
  • get_trial_id_from_study_id_trial_number
  • get_trial
  • get_study_name_from_id
  • get_study_id_from_name

@@ -53,6 +54,9 @@ class GrpcStorageProxy(BaseStorage):
host: The hostname of the gRPC server.
port: The port of the gRPC server.

.. warning::

Currently, gRPC storage proxy does not work as expected when using a SQLite3 database as the backend and calling :func:`optuna.delete_study`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Did you mean this?
I understood the SQLite3 part, but I am not sure about thedelete_study part.
Could you please confirm?

Suggested change
Currently,gRPCstorageproxydoesnotworkasexpectedwhenusingaSQLite3databaseasthebackendandcalling :func:`optuna.delete_study`.
gRPCstorageproxydoesnotaccept :func:`optuna.delete_study`whenthebackendisSQLite3database.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The gRPC storage proxy does acceptoptuna.delete_study() without raising exceptions. However, it may result in an inconsistent state when using an SQLite3 database as the backend and multiple storage proxies are running.

int64 study_id = 1;
repeated TrialState states = 2;
repeated int64 included_trial_ids = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Probably,including is better if I understand the context of this request.

Suggested change
repeated int64included_trial_ids = 2;
repeated int64including_trial_ids = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think the current naming and implementation are consistent with theRDBStorage (ref).

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for your review! I'll address this with RDBStorage's_get_trials() method in the following pull request, as this is a private api.

Copy link
Contributor

@nabenabe0928nabenabe0928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I will approve this PR once I confirm the performance.

@y0z
Copy link
Member

y0z commentedDec 24, 2024
edited
Loading

The changes look good to me.

@y0zy0z self-requested a reviewDecember 24, 2024 07:12
Copy link
Member

@y0zy0z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@nabenabe0928
Copy link
Contributor

nabenabe0928 commentedDec 24, 2024
edited
Loading

Hmmm, I tried withn_trials=300, but I could only observe degradation...

Plus, it took me so long to finishn_trials=1000 (300+ seconds) in my environment. (I will try n_jobs=10 with the pureRDBStorage later)

I used MySQL as a database and Python 3.11 on Ubuntu 20.04.

$ pip freeze# Outputalembic==1.14.0colorlog==6.9.0greenlet==3.1.1grpcio==1.68.1Mako==1.3.8MarkupSafe==3.0.2mysqlclient==2.2.6numpy==2.2.1packaging==24.2protobuf==5.29.2PyYAML==6.0.2SQLAlchemy==2.0.36tqdm==4.67.1typing_extensions==4.12.2

@nabenabe0928
Copy link
Contributor

nabenabe0928 commentedDec 25, 2024
edited
Loading

I measured the runtime of pure RDBStorage in my env and it took me about a minute for the code below.

Benchmark Code
importosimporttimeimportoptunadefobjective(trial:optuna.Trial)->float:s=0.0foriinrange(10):s+=trial.suggest_float(f"x{i}",-10,10)**2returnsmysql_url=os.environ["OPTUNA_STORAGE"]study=optuna.create_study(sampler=optuna.samplers.RandomSampler(),storage=mysql_url)start=time.time()study.optimize(objective,n_trials=1000,n_jobs=10)print(time.time()-start)

@HideakiImamura
Copy link
Member

I confirmed that this PR is faster than the current master by using the following benchmark code. It looks like theget_all_trials is completely speeded up.

  • Server side code
from __future__importannotationsimportosfromoptuna.storagesimportrun_grpc_proxy_serverimportoptunaimportsignalSTORAGE_URL="YOUR_MYSQL_URL"defhandler(signum,frame):print("Received signal",signum)exit(0)if__name__=="__main__":signal.signal(signal.SIGINT,handler)signal.signal(signal.SIGTERM,handler)run_grpc_proxy_server(optuna.storages.get_storage(STORAGE_URL),host="localhost",port=13000,    )
  • Client side code
from __future__importannotationsimportargparsefromconcurrent.futuresimportProcessPoolExecutor,as_completedfromcollections.abcimportSequencefromcollections.abcimportContainerimportcProfileimporttimeimportnumpyasnpimportoptunaimportthreadingfromtypingimportLiteralfromoptuna.storagesimportGrpcStorageProxy,BaseStorageimportosimportuuidfromtimeimportsleepstudy_name="grpc-example"n_workers=10n_repeat=5n_trials=1000STORAGE_KIND=Literal["grpc","rdb","middleware"]STORAGE_URL="YOUR_MYSQL_URL"defget_storage(storage_kind:STORAGE_KIND)->BaseStorage:ifstorage_kind=="grpc":returnGrpcStorageProxy(host="localhost",port=13000)elifstorage_kind=="rdb":returnoptuna.storages.get_storage(STORAGE_URL)defobjective(trial:optuna.Trial)->float:s=0.0foriinrange(10):trial.set_user_attr(f"attr{i}","attr value")foriinrange(10):s+=trial.suggest_float(f"x{i}",-10,10)**2returnsdefstudy_optimize(storage_kind:STORAGE_KIND,n_trials:int)->int:start=time.time()storage=get_storage(storage_kind)whileTrue:try:study=optuna.load_study(storage=storage,study_name=study_name,sampler=optuna.samplers.RandomSampler())breakexcept:sleep(10)study.optimize(objective,callbacks=[optuna.study.MaxTrialsCallback(n_trials)])elapsed=time.time()-startreturnelapseddefbench_study_optimize(storage_kind:STORAGE_KIND)->list[int]:elapsed= []study=optuna.create_study(storage=get_storage("rdb"),study_name=study_name,load_if_exists=True,sampler=optuna.samplers.RandomSampler(),    )study_optimize(storage_kind,n_trials)returnelapseddefmain(storage:STORAGE_KIND,benchmark_file:str):try:optuna.delete_study(storage=get_storage("rdb"),study_name=study_name)except:passcProfile.runctx(f"bench_study_optimize('{storage}')",globals(),locals(),filename=benchmark_file,    )optuna.delete_study(storage=get_storage(storage),study_name=study_name)if__name__=="__main__":parser=argparse.ArgumentParser()parser.add_argument("--storage",type=str,required=True)parser.add_argument("--benchmark_file",type=str,required=True)args=parser.parse_args()main(args.storage,args.benchmark_file)
  • Results
Current masterThis PR
604 sec203 sec
Screenshot 2024-12-26 at 10 35 09Screenshot 2024-12-26 at 10 35 29

@github-actionsGitHub Actions
Copy link
Contributor

This pull request has not seen any recent activity.

@github-actionsgithub-actionsbot added the staleExempt from stale bot labeling. labelJan 2, 2025
@nabenabe0928nabenabe0928 removed the staleExempt from stale bot labeling. labelJan 3, 2025
@c-bata
Copy link
MemberAuthor

I feel like the following methods can also use cache, but do you wanna include them in this PR, make another PR, or do not wanna do it at all?

  • get_study_directions
  • get_trial_id_from_study_id_trial_number
  • get_trial
  • get_study_name_from_id
  • get_study_id_from_name

Thank you for your review. I think we don't need to use cache for these methods unless they become a bottleneck. What do you think?

nabenabe0928 reacted with thumbs up emoji

@c-batac-bata assignedHideakiImamura and unassignedy0zJan 6, 2025
@c-bata
Copy link
MemberAuthor

Let me unassign@y0z and assign@HideakiImamura instead.

@nabenabe0928
Copy link
Contributor

Ok, I figured out why I did not get the speedup.

I originally used the following code, leading to the use of RDBStorage without cache in local process:

importosfromoptuna.storagesimportrun_grpc_proxy_serverfromoptuna.storagesimportRDBStoragestorage=RDBStorage(os.environ["OPTUNA_STORAGE"])run_grpc_proxy_server(storage,host="localhost",port=13000)

However, when I usedget_storage, which returns_CachedStorage wrappingRDBStorage, I observed the speedup.

importosfromoptuna.storagesimportrun_grpc_proxy_serverfromoptuna.storagesimportget_storagestorage=get_storage(os.environ["OPTUNA_STORAGE"])run_grpc_proxy_server(storage,host="localhost",port=13000)

Master: 112 seconds, This PR: 61 seconds.

Probably, the doc-string should be changedhere.

@nabenabe0928
Copy link
Contributor

Followups:

  • Replace doc-stringhere
  • Replacetry_import with_LazyImport

Copy link
Member

@HideakiImamuraHideakiImamura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM.

@HideakiImamuraHideakiImamura removed their assignmentJan 7, 2025
Copy link
Contributor

@nabenabe0928nabenabe0928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for the update, LGTM!

@nabenabe0928nabenabe0928 merged commit0a649e3 intooptuna:masterJan 7, 2025
14 checks passed
@nabenabe0928nabenabe0928 added this to thev4.2.0 milestoneJan 7, 2025
@nabenabe0928nabenabe0928 removed their assignmentJan 7, 2025
nabenabe0928 added a commit that referenced this pull requestJan 7, 2025
Follow-up#5872: Update the docstring of `run_grpc_proxy_server`
@c-batac-bata deleted the grpc-proxy-trials-cache branchJanuary 7, 2025 08:18
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@y0zy0zy0z approved these changes

@HideakiImamuraHideakiImamuraHideakiImamura approved these changes

@nabenabe0928nabenabe0928nabenabe0928 approved these changes

Assignees
No one assigned
Labels
enhancementChange that does not break compatibility and not affect public interfaces, but improves performance.
Projects
None yet
Milestone
v4.2.0
Development

Successfully merging this pull request may close these issues.

4 participants
@c-bata@y0z@nabenabe0928@HideakiImamura

[8]ページ先頭

©2009-2025 Movatter.jp