onnx-array-api 0.3.4 documentation

Profiling with onnxruntime

onnxruntime optimizes the ONNX graph by default before running inference: it modifies or fuses operators and may add new ones. Some of them are standard ONNX operators, others are implemented only in onnxruntime (see Supported Operators). This example profiles both models, the original one and the one optimized by onnxruntime.
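To make "fuses" concrete, here is a toy graph-rewriting pass. It is a minimal sketch, not onnxruntime's implementation: `Node` and `fuse_conv_add` are hypothetical names, nodes are plain dataclasses rather than ONNX protobufs, and the pass merges a `Conv` whose output feeds a single `Add` into one `Conv` node that takes the Add's other operand as a fourth input. The detailed comparison further down shows a similar pattern in the optimized model: a `Conv` with four inputs and no standalone `Add`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical simplified graph node (not the ONNX NodeProto)."""
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)


def fuse_conv_add(nodes):
    """Fuse Conv -> Add pairs in a topologically sorted node list.

    A Conv is fused only if its output is consumed by exactly one node, an Add.
    This toy ignores graph outputs, which a real pass must also check.
    """
    consumers = {}
    for node in nodes:
        for name in node.inputs:
            consumers[name] = consumers.get(name, 0) + 1
    producer = {out: node for node in nodes for out in node.outputs}

    pairs = {}  # id(add) -> (conv, other Add operand)
    for add in nodes:
        if add.op_type != "Add" or len(add.inputs) != 2:
            continue
        for pos, name in enumerate(add.inputs):
            conv = producer.get(name)
            if conv is not None and conv.op_type == "Conv" and consumers[name] == 1:
                pairs[id(add)] = (conv, add.inputs[1 - pos])
                break

    fused_convs = {id(conv) for conv, _ in pairs.values()}
    result = []
    for node in nodes:
        if id(node) in fused_convs:
            continue  # emitted at the Add's position, where both operands exist
        if id(node) in pairs:
            conv, other = pairs[id(node)]
            # Pad to (X, W, B) with "" (ONNX's empty name for an omitted
            # optional input) so the summand always lands in a 4th slot.
            inputs = conv.inputs + [""] * (3 - len(conv.inputs)) + [other]
            result.append(Node("Conv", inputs, list(node.outputs), dict(conv.attrs)))
        else:
            result.append(node)
    return result


graph = [
    Node("Conv", ["x", "w", "b"], ["c"]),
    Node("Add", ["c", "skip"], ["y"]),
    Node("Relu", ["y"], ["z"]),
]
fused = fuse_conv_add(graph)
for node in fused:
    print(node.op_type, node.inputs, "->", node.outputs)
```

The fused node is emitted where the `Add` was, not where the `Conv` was, because the Add's other operand may be produced after the Conv; keeping the later position preserves topological order.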

Optimize a model with onnxruntime

```python
import os
import numpy
import matplotlib.pyplot as plt
from onnxruntime import get_available_providers
from onnx_array_api.ext_test_case import example_path
from onnx_array_api.ort.ort_optimizers import ort_optimized_model
from onnx_array_api.ort.ort_profile import ort_profile, merge_ort_profile
from onnx_array_api.plotting.stat_plot import plot_ort_profile

suffix = ""
filename = example_path(f"data/small{suffix}.onnx")
optimized = filename + ".optimized.onnx"
print(f"model={filename!r}")
if not os.path.exists(optimized):
    ort_optimized_model(filename, output=optimized)
print(f"optimized={optimized!r}")
```
```
model='data/small.onnx'
optimized='data/small.onnx.optimized.onnx'
```

Profiling

```python
feeds = {"input": numpy.random.random((1, 3, 112, 112)).astype(numpy.float32)}
prof_base = ort_profile(
    filename,
    feeds,
    repeat=6,
    disable_optimization=True,
    providers=["CPUExecutionProvider"],
)
prof_base.to_excel(f"prof_base{suffix}.xlsx", index=False)
prof_base
```
|  | cat | pid | tid | dur | ts | ph | name | args_thread_scheduling_stats | args_output_type_shape | args_output_size | args_parameter_size | args_activation_size | args_node_index | args_input_type_shape | args_provider | args_op_name | op_name | event_name | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Session | 120046 | 120046 | 955 | 10 | X | model_loading_uri | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_loading_uri | -1 |
| 1 | Session | 120046 | 120046 | 876 | 1020 | X | session_initialization | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | session_initialization | -1 |
| 2 | Node | 120046 | 120046 | 1058 | 2106 | X | n0_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 7168 | 150528 | 0 | [{'float': [1, 3, 112, 112]}, {'float': [64, 3... | CPUExecutionProvider | Conv | n0 | kernel_time | -1 |
| 3 | Node | 120046 | 120046 | 1048 | 3192 | X | n1_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 256 | 3211264 | 1 | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | PRelu | n1 | kernel_time | -1 |
| 4 | Node | 120046 | 120046 | 428 | 4263 | X | n3_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 1, 112, 112]}] | 50176 | 0 | 3211264 | 3 | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | ReduceMax | n3 | kernel_time | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 93 | Node | 120046 | 120046 | 122 | 74962 | X | n10_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 256 | 3211264 | 10 | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | PRelu | n10 | kernel_time | 4 |
| 94 | Node | 120046 | 120046 | 1445 | 75093 | X | n11_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 56, 56]}] | 802816 | 147712 | 3211264 | 11 | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | Conv | n11 | kernel_time | 4 |
| 95 | Node | 120046 | 120046 | 44 | 76547 | X | n13_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 56, 56]}] | 802816 | 0 | 1605632 | 13 | [{'float': [1, 64, 56, 56]}, {'float': [1, 64,... | CPUExecutionProvider | Add | n13 | kernel_time | 4 |
| 96 | Session | 120046 | 120046 | 8501 | 68097 | X | SequentialExecutor::Execute | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | SequentialExecutor::Execute | 5 |
| 97 | Session | 120046 | 120046 | 8529 | 68080 | X | model_run | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_run | 5 |

98 rows × 19 columns
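The table above is a flattened view of the JSON trace onnxruntime writes when profiling is enabled (`SessionOptions.enable_profiling = True`, after which `InferenceSession.end_profiling()` returns the trace file name). As a rough sketch of the aggregation step, and not the actual code of `ort_profile`, the stdlib-only function below sums `kernel_time` durations per provider and operator type. `kernel_time_by_op` is a hypothetical name, and the sample events are made up; they only mimic the fields visible in the table (`cat`, `name`, `dur`, `args`).

```python
import json
from collections import defaultdict


def kernel_time_by_op(events):
    """Return {(provider, op_type): (total_dur, n_calls)} for kernel_time events."""
    totals = defaultdict(lambda: [0, 0])
    for event in events:
        # Keep only per-node kernel timings; Session events (model loading,
        # model_run, ...) and any other Node events are skipped.
        if event.get("cat") != "Node" or not event.get("name", "").endswith("_kernel_time"):
            continue
        args = event.get("args", {})
        key = (args.get("provider", "?"), args.get("op_name", "?"))
        totals[key][0] += event["dur"]  # onnxruntime traces report microseconds
        totals[key][1] += 1
    return {key: tuple(value) for key, value in totals.items()}


# Made-up events shaped like the trace; with a real file use:
#     with open(path) as f:
#         events = json.load(f)
sample = json.loads("""[
  {"cat": "Session", "name": "model_run", "dur": 8529, "args": {}},
  {"cat": "Node", "name": "n0_kernel_time", "dur": 1058,
   "args": {"op_name": "Conv", "provider": "CPUExecutionProvider"}},
  {"cat": "Node", "name": "n1_kernel_time", "dur": 1048,
   "args": {"op_name": "PRelu", "provider": "CPUExecutionProvider"}},
  {"cat": "Node", "name": "n0_kernel_time", "dur": 700,
   "args": {"op_name": "Conv", "provider": "CPUExecutionProvider"}}
]""")
print(kernel_time_by_op(sample))
```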



And the optimized model.

```python
prof_opti = ort_profile(
    optimized,
    feeds,
    repeat=6,
    disable_optimization=True,
    providers=["CPUExecutionProvider"],
)
prof_opti.to_excel(f"prof_opti{suffix}.xlsx", index=False)
prof_opti
```
|  | cat | pid | tid | dur | ts | ph | name | args_thread_scheduling_stats | args_output_type_shape | args_output_size | args_parameter_size | args_activation_size | args_node_index | args_input_type_shape | args_provider | args_op_name | op_name | event_name | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Session | 120046 | 120046 | 690 | 3 | X | model_loading_uri | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_loading_uri | -1 |
| 1 | Session | 120046 | 120046 | 643 | 726 | X | session_initialization | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | session_initialization | -1 |
| 2 | Node | 120046 | 120046 | 413 | 1568 | X | r0_nchwc_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 7168 | 150528 | 0 | [{'float': [1, 3, 112, 112]}, {'float': [64, 3... | CPUExecutionProvider | Conv | r0_nchwc | kernel_time | -1 |
| 3 | Node | 120046 | 120046 | 234 | 2002 | X | ReorderOutput_token_14_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 0 | 3211264 | 1 | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | ReorderOutput | ReorderOutput_token_14 | kernel_time | -1 |
| 4 | Node | 120046 | 120046 | 151 | 2252 | X | n1_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 256 | 3211264 | 2 | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | PRelu | n1 | kernel_time | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 123 | Node | 120046 | 120046 | 76 | 45040 | X | ReorderInput_token_12_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 0 | 3211264 | 14 | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | ReorderInput | ReorderInput_token_12 | kernel_time | 4 |
| 124 | Node | 120046 | 120046 | 673 | 45123 | X | r11_nchwc_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 56, 56]}] | 802816 | 147712 | 4014080 | 17 | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | Conv | r11_nchwc | kernel_time | 4 |
| 125 | Node | 120046 | 120046 | 40 | 45805 | X | ReorderOutput_kernel_time | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 56, 56]}] | 802816 | 0 | 802816 | 18 | [{'float': [1, 64, 56, 56]}] | CPUExecutionProvider | ReorderOutput | ReorderOutput | kernel_time | 4 |
| 126 | Session | 120046 | 120046 | 5359 | 40491 | X | SequentialExecutor::Execute | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | SequentialExecutor::Execute | 5 |
| 127 | Session | 120046 | 120046 | 5445 | 40413 | X | model_run | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_run | 5 |

128 rows × 19 columns
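The optimized profile contains `r*_nchwc` convolutions surrounded by `ReorderInput`/`ReorderOutput` nodes. These suggest onnxruntime's CPU layout transformation, which stores some tensors in a channel-blocked NCHWc layout, `[N, C, H, W] -> [N, ceil(C / c), H, W, c]`, where the block size `c` depends on the CPU's SIMD width. The pure-Python sketch below shows what such a reorder does, assuming a block size of 8 and zero-padding the unused lanes; this is consistent with the detailed table further down, where a 1-channel `Conv` output becomes 8 channels before a `ReorderOutput`. The function names are hypothetical and the code illustrates the layout only, not the optimized kernel.

```python
def nchw_to_nchwc(x, block=8):
    """Reorder a nested-list tensor x[n][c][h][w] into y[n][cb][h][w][ci].

    Channel c goes to block cb = c // block and lane ci = c % block;
    lanes past the last real channel stay at 0.0 (padding).
    """
    n_dim, c_dim = len(x), len(x[0])
    h_dim, w_dim = len(x[0][0]), len(x[0][0][0])
    n_blocks = -(-c_dim // block)  # ceiling division
    y = [[[[[0.0] * block for _ in range(w_dim)] for _ in range(h_dim)]
          for _ in range(n_blocks)] for _ in range(n_dim)]
    for n in range(n_dim):
        for c in range(c_dim):
            cb, ci = divmod(c, block)
            for h in range(h_dim):
                for w in range(w_dim):
                    y[n][cb][h][w][ci] = x[n][c][h][w]
    return y


def nchwc_to_nchw(y, channels):
    """Inverse reorder, keeping the first `channels` channels (drops padding)."""
    block = len(y[0][0][0][0])
    h_dim, w_dim = len(y[0][0]), len(y[0][0][0])
    return [[[[y[n][c // block][h][w][c % block] for w in range(w_dim)]
              for h in range(h_dim)] for c in range(channels)]
            for n in range(len(y))]


# A 1x3x2x2 tensor whose value encodes (channel, row, column).
x = [[[[c * 100 + h * 10 + w for w in range(2)] for h in range(2)] for c in range(3)]]
y = nchw_to_nchwc(x, block=8)
print(len(y[0]), len(y[0][0][0][0]))  # -> 1 8  (one block of 8 lanes)
print(y[0][0][1][0])  # -> [10, 110, 210, 0.0, 0.0, 0.0, 0.0, 0.0]
print("round trip ok:", nchwc_to_nchw(y, channels=3) == x)
```

The extra memory traffic of these reorders is the price paid for vector-friendly convolutions, which is why the reorder kernels appear as new rows in the optimized profile.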



And the graph is:

```python
unique_op = set(prof_base["args_op_name"])
fig, ax = plt.subplots(2, 2, figsize=(10, len(unique_op)), sharex="col")
plot_ort_profile(prof_base, ax[0, 0], ax[0, 1], title="baseline")
plot_ort_profile(prof_opti, ax[1, 0], ax[1, 1], title="optimized")
fig.tight_layout()
fig.savefig(f"plot_profiling{suffix}.png")
```
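[Figure: baseline profile (top row, panels "baseline" and "n occurences") and optimized profile (bottom row, panels "optimized" and "n occurences").]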

Merging profiles

Let’s compare both profiles, assuming every iteration processes the same image and that the input and output sizes are the same at every iteration.

```python
merge, gr = merge_ort_profile(prof_base, prof_opti)
merge.to_excel(f"plot_profiling_merged{suffix}.xlsx", index=False)
merge
```
```
~/github/onnx-array-api/onnx_array_api/ort/ort_profile.py:260: FutureWarning: The provided callable <function sum at 0x7ce342d63880> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  .agg(
```
|  | args_op_name | args_output_type_shape | args_input_type_shape | args_provider | idx | durbase | countbase | duropti | countopti |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Add | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 56, 56]}, {'float': [1, 64,... | CPUExecutionProvider | 0 | 603.0 | 6.0 | NaN | NaN |
| 1 | BatchNormalization | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64]}... | CPUExecutionProvider | 0 | 3786.0 | 6.0 | 2795.0 | 6.0 |
| 2 | Concat | [{'float': [1, 2, 112, 112]}] | [{'float': [1, 1, 112, 112]}, {'float': [1, 1,... | CPUExecutionProvider | 0 | 512.0 | 6.0 | 198.0 | 6.0 |
| 3 | Conv | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 2, 112, 112]}, {'float': [1, 2,... | CPUExecutionProvider | 0 | 1765.0 | 6.0 | NaN | NaN |
| 4 | Conv | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 3, 112, 112]}, {'float': [64, 3... | CPUExecutionProvider | 0 | 4170.0 | 6.0 | 1436.0 | 6.0 |
| 5 | Conv | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 38055.0 | 6.0 | 20675.0 | 6.0 |
| 6 | Conv | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 2732.0 | 6.0 | 901.0 | 6.0 |
| 7 | Conv | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | NaN | NaN | 5302.0 | 6.0 |
| 8 | Conv | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 11862.0 | 6.0 | NaN | NaN |
| 9 | Conv | [{'float': [1, 8, 112, 112]}] | [{'float': [1, 2, 112, 112]}, {'float': [8, 2,... | CPUExecutionProvider | 0 | NaN | NaN | 1097.0 | 6.0 |
| 10 | Mul | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 1, 112, 112]}, {'float': [1, 64... | CPUExecutionProvider | 0 | 788.0 | 6.0 | 608.0 | 6.0 |
| 11 | PRelu | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 2733.0 | 6.0 | 805.0 | 6.0 |
| 12 | PRelu | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 1 | 1165.0 | 6.0 | 757.0 | 6.0 |
| 13 | ReduceMax | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | 1895.0 | 6.0 | 1258.0 | 6.0 |
| 14 | ReduceMean | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | 1906.0 | 6.0 | 1514.0 | 6.0 |
| 15 | ReorderInput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | NaN | NaN | 844.0 | 6.0 |
| 16 | ReorderInput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 1 | NaN | NaN | 784.0 | 6.0 |
| 17 | ReorderInput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 2 | NaN | NaN | 579.0 | 6.0 |
| 18 | ReorderOutput | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 8, 112, 112]}] | CPUExecutionProvider | 0 | NaN | NaN | 309.0 | 6.0 |
| 19 | ReorderOutput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | NaN | NaN | 1247.0 | 6.0 |
| 20 | ReorderOutput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 1 | NaN | NaN | 810.0 | 6.0 |
| 21 | ReorderOutput | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 56, 56]}] | CPUExecutionProvider | 0 | NaN | NaN | 304.0 | 6.0 |
| 22 | Sigmoid | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 1, 112, 112]}] | CPUExecutionProvider | 0 | 160.0 | 6.0 | NaN | NaN |
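Because operators are fused, replaced or added, a row-by-row ratio only makes sense for rows present in both profiles; the total of each duration column is a fairer overall measure. The sketch below recomputes both totals from the 23 rows of the merged table above, treating NaN (operator absent from that model) as zero. It assumes the merged table covers every kernel and, since it sums kernel time only, it ignores executor overhead and is not an end-to-end latency. onnxruntime's trace reports `dur` in microseconds; here the values are accumulated over the 6 runs. `column_total` is a hypothetical helper.

```python
import math

nan = math.nan
# (args_op_name, durbase, duropti) copied from the merged table above;
# nan marks an operator absent from that model.
merged = [
    ("Add", 603, nan), ("BatchNormalization", 3786, 2795), ("Concat", 512, 198),
    ("Conv", 1765, nan), ("Conv", 4170, 1436), ("Conv", 38055, 20675),
    ("Conv", 2732, 901), ("Conv", nan, 5302), ("Conv", 11862, nan),
    ("Conv", nan, 1097), ("Mul", 788, 608), ("PRelu", 2733, 805),
    ("PRelu", 1165, 757), ("ReduceMax", 1895, 1258), ("ReduceMean", 1906, 1514),
    ("ReorderInput", nan, 844), ("ReorderInput", nan, 784), ("ReorderInput", nan, 579),
    ("ReorderOutput", nan, 309), ("ReorderOutput", nan, 1247),
    ("ReorderOutput", nan, 810), ("ReorderOutput", nan, 304), ("Sigmoid", 160, nan),
]


def column_total(rows, col):
    """Sum one duration column, counting nan (absent operator) as 0."""
    return sum(0 if math.isnan(row[col]) else row[col] for row in rows)


total_base = column_total(merged, 1)
total_opti = column_total(merged, 2)
print(f"baseline {total_base:.0f} us, optimized {total_opti:.0f} us, "
      f"speedup {total_base / total_opti:.2f}x")
# -> baseline 72132 us, optimized 42223 us, speedup 1.71x
```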


More detailed

```python
# Keep the index when saving: it holds the operator labels shown below.
gr.to_excel(f"plot_profiling_merged_details{suffix}.xlsx")
gr
```
| label | durbase | duropti | countbase | countopti |
|---|---|---|---|---|
| [+CPU]Conv(f-1x2x112x112,f-8x2x7x7)->f-1x8x112x112 | 0.0 | 1097.0 | 0.0 | 6.0 |
| [+CPU]Conv(f-1x64x112x112,f-64x64x3x3,f-64,f-1x64x56x56)->f-1x64x56x56 | 0.0 | 5302.0 | 0.0 | 6.0 |
| [+CPU]ReorderInput(f-1x64x112x112)->f-1x64x112x112 | 0.0 | 2207.0 | 0.0 | 18.0 |
| [+CPU]ReorderOutput(f-1x64x112x112)->f-1x64x112x112 | 0.0 | 2057.0 | 0.0 | 12.0 |
| [+CPU]ReorderOutput(f-1x64x56x56)->f-1x64x56x56 | 0.0 | 304.0 | 0.0 | 6.0 |
| [+CPU]ReorderOutput(f-1x8x112x112)->f-1x1x112x112 | 0.0 | 309.0 | 0.0 | 6.0 |
| [-CPU]Add(f-1x64x56x56,f-1x64x56x56)->f-1x64x56x56 | 603.0 | 0.0 | 6.0 | 0.0 |
| [-CPU]Conv(f-1x2x112x112,f-1x2x7x7)->f-1x1x112x112 | 1765.0 | 0.0 | 6.0 | 0.0 |
| [-CPU]Conv(f-1x64x112x112,f-64x64x3x3,f-64)->f-1x64x56x56 | 11862.0 | 0.0 | 6.0 | 0.0 |
| [-CPU]Sigmoid(f-1x1x112x112)->f-1x1x112x112 | 160.0 | 0.0 | 6.0 | 0.0 |
| [=CPU]BatchNormalization(f-1x64x112x112,f-64,f-64,f-64,f-64)->f-1x64x112x112 | 3786.0 | 2795.0 | 6.0 | 6.0 |
| [=CPU]Concat(f-1x1x112x112,f-1x1x112x112)->f-1x2x112x112 | 512.0 | 198.0 | 6.0 | 6.0 |
| [=CPU]Conv(f-1x3x112x112,f-64x3x3x3,f-64)->f-1x64x112x112 | 4170.0 | 1436.0 | 6.0 | 6.0 |
| [=CPU]Conv(f-1x64x112x112,f-64x64x1x1,f-64)->f-1x64x56x56 | 2732.0 | 901.0 | 6.0 | 6.0 |
| [=CPU]Conv(f-1x64x112x112,f-64x64x3x3,f-64)->f-1x64x112x112 | 38055.0 | 20675.0 | 6.0 | 6.0 |
| [=CPU]Mul(f-1x1x112x112,f-1x64x112x112)->f-1x64x112x112 | 788.0 | 608.0 | 6.0 | 6.0 |
| [=CPU]PRelu(f-1x64x112x112,f-64x1x1)->f-1x64x112x112 | 3898.0 | 1562.0 | 12.0 | 12.0 |
| [=CPU]ReduceMax(f-1x64x112x112)->f-1x1x112x112 | 1895.0 | 1258.0 | 6.0 | 6.0 |
| [=CPU]ReduceMean(f-1x64x112x112)->f-1x1x112x112 | 1906.0 | 1514.0 | 6.0 | 6.0 |
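The labels in this table appear to encode each kernel as `[tag provider]op(input shapes)->output shapes`, where `+` marks kernels found only in the optimized profile, `-` only in the baseline, and `=` in both; rows sharing a label are summed (the two `PRelu` rows of the merged table become one row with count 12). Below is a stdlib sketch of such an outer merge. The label format is inferred from the table, and `make_label`/`merge_profiles` are hypothetical names, not the library's implementation; the sample values are copied from three rows above.

```python
def make_label(op, inputs, outputs, dtype="f"):
    """Build a signature such as 'Mul(f-1x1x112x112,f-1x64x112x112)->f-1x64x112x112'."""
    def fmt(shape):
        return f"{dtype}-" + "x".join(str(dim) for dim in shape)
    return f"{op}({','.join(fmt(s) for s in inputs)})->{','.join(fmt(s) for s in outputs)}"


def merge_profiles(base, opti, provider="CPU"):
    """Outer-merge two {label: (duration, count)} dicts, tagging each label.

    '+' = only in the optimized profile, '-' = only in the baseline, '=' = both.
    Values are (durbase, duropti, countbase, countopti), 0 when absent.
    """
    merged = {}
    for label in set(base) | set(opti):
        tag = "=" if label in base and label in opti else ("-" if label in base else "+")
        dur_b, cnt_b = base.get(label, (0.0, 0))
        dur_o, cnt_o = opti.get(label, (0.0, 0))
        merged[f"[{tag}{provider}]{label}"] = (dur_b, dur_o, cnt_b, cnt_o)
    return dict(sorted(merged.items()))  # '+' < '-' < '=' in ASCII order


mul = make_label("Mul", [[1, 1, 112, 112], [1, 64, 112, 112]], [[1, 64, 112, 112]])
sigmoid = make_label("Sigmoid", [[1, 1, 112, 112]], [[1, 1, 112, 112]])
reorder = make_label("ReorderOutput", [[1, 64, 56, 56]], [[1, 64, 56, 56]])

base = {mul: (788.0, 6), sigmoid: (160.0, 6)}
opti = {mul: (608.0, 6), reorder: (304.0, 6)}
for label, values in merge_profiles(base, opti).items():
    print(label, values)
```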


Final plot

```python
# Filter out insignificant operators: keep those accounting for at least 1%
# of the combined (baseline + optimized) duration.
grmax = gr["durbase"] + gr["duropti"]
total = grmax.sum()
grmax /= total
gr = gr[grmax >= 0.01]

fig, ax = plt.subplots(1, 2, figsize=(14, min(gr.shape[0], 500)), sharey=True)
gr[["durbase", "duropti"]].plot.barh(ax=ax[0])
ax[0].set_title("Side by side duration")
gr = gr.copy()
gr[["countbase", "countopti"]].plot.barh(ax=ax[1])
ax[1].set_title("Side by side count")
fig.tight_layout()
fig.savefig(f"plot_profiling_side_by_side{suffix}.png")
```
[Figure: side-by-side bar charts, panels "Side by side duration" and "Side by side count".]

On CUDA

```python
if "CUDAExecutionProvider" in get_available_providers():
    print("Profiling on CUDA")
    prof_base = ort_profile(
        filename,
        feeds,
        repeat=6,
        disable_optimization=True,
        providers=["CUDAExecutionProvider"],
    )
    prof_base.to_excel(f"prof_cuda_base{suffix}.xlsx", index=False)

    prof_opti = ort_profile(
        optimized,
        feeds,
        repeat=6,
        disable_optimization=True,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    prof_opti.to_excel(f"prof_cuda_opti{suffix}.xlsx", index=False)

    unique_op = set(prof_base["args_op_name"])
    fig, ax = plt.subplots(2, 2, figsize=(10, len(unique_op)), sharex="col")
    plot_ort_profile(prof_base, ax[0, 0], ax[0, 1], title="baseline")
    plot_ort_profile(prof_opti, ax[1, 0], ax[1, 1], title="optimized")
    fig.tight_layout()
    fig.savefig(f"plot_profiling_cuda{suffix}.png")

    # CUDA-specific file names so the CPU results saved above are not overwritten;
    # the details table keeps its index, which holds the operator labels.
    merge, gr = merge_ort_profile(prof_base, prof_opti)
    merge.to_excel(f"plot_profiling_merged_cuda{suffix}.xlsx", index=False)
    gr.to_excel(f"plot_profiling_merged_details_cuda{suffix}.xlsx")

    grmax = gr["durbase"] + gr["duropti"]
    total = grmax.sum()
    grmax /= total
    gr = gr[grmax >= 0.01]

    fig, ax = plt.subplots(1, 2, figsize=(14, min(gr.shape[0], 500)), sharey=True)
    gr[["durbase", "duropti"]].plot.barh(ax=ax[0])
    ax[0].set_title("Side by side duration")
    gr = gr.copy()
    gr[["countbase", "countopti"]].plot.barh(ax=ax[1])
    ax[1].set_title("Side by side count")
    fig.tight_layout()
    fig.savefig(f"plot_profiling_side_by_side_cuda{suffix}.png")
else:
    print(f"CUDA not available in {get_available_providers()}.")
    fig, ax = None, None

ax
```
[Figures: CUDA profiles (panels "baseline", "n occurences", "optimized", "n occurences") and side-by-side comparison (panels "Side by side duration", "Side by side count").]

```
Profiling on CUDA
~/github/onnx-array-api/onnx_array_api/ort/ort_profile.py:260: FutureWarning: The provided callable <function sum at 0x7ce342d63880> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  .agg(
array([<Axes: title={'center': 'Side by side duration'}, ylabel='label'>,
       <Axes: title={'center': 'Side by side count'}, ylabel='label'>],
      dtype=object)
```
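In the CUDA branch the optimized model is run with `["CUDAExecutionProvider", "CPUExecutionProvider"]`. onnxruntime treats the `providers` list as a priority order: each node goes to the first provider able to run it, so keeping the CPU provider last gives a fallback for kernels without a CUDA implementation. Our guess, not confirmed by the source, is that this matters here because the CPU-oriented optimization inserted NCHWc kernels such as `ReorderInput`/`ReorderOutput`. The hypothetical helper below sketches that priority-plus-fallback pattern without depending on onnxruntime; in the script, `available` would come from `get_available_providers()`.

```python
def choose_providers(available, preferred=("CUDAExecutionProvider",),
                     fallback="CPUExecutionProvider"):
    """Return the available preferred providers, in priority order, plus a fallback."""
    chosen = [name for name in preferred if name in available]
    if fallback not in chosen:
        chosen.append(fallback)  # nodes unsupported by the providers above land here
    return chosen


print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(choose_providers(["CPUExecutionProvider"]))
```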

Total running time of the script: (0 minutes 4.155 seconds)

Gallery generated by Sphinx-Gallery
