onnx-array-api 0.3.1 documentation


Profiling with onnxruntime

onnxruntime optimizes the ONNX graph by default before running inference. It modifies, fuses, or adds operators. Some of them are standard ONNX operators; others are implemented only in onnxruntime (see Supported Operators). This example profiles both models: the original one and the one optimized by onnxruntime.

Optimize a model with onnxruntime

    import os
    import numpy
    import matplotlib.pyplot as plt
    from onnxruntime import get_available_providers
    from onnx_array_api.ext_test_case import example_path
    from onnx_array_api.ort.ort_optimizers import ort_optimized_model
    from onnx_array_api.ort.ort_profile import ort_profile, merge_ort_profile
    from onnx_array_api.plotting.stat_plot import plot_ort_profile

    suffix = ""
    filename = example_path(f"data/small{suffix}.onnx")
    optimized = filename + ".optimized.onnx"
    print(f"model={filename!r}")

    if not os.path.exists(optimized):
        ort_optimized_model(filename, output=optimized)
    print(f"optimized={optimized!r}")

model='data/small.onnx'
optimized='data/small.onnx.optimized.onnx'
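
For readers who want to produce the optimized graph without the helper, here is a minimal sketch based on the plain onnxruntime API. It assumes that asking a session to save the graph it runs (`SessionOptions.optimized_model_filepath`) is an acceptable stand-in for `ort_optimized_model`; whether the helper works this way internally is not stated on this page. The names `optimized_path` and `save_optimized_model` are hypothetical.

```python
def optimized_path(filename: str) -> str:
    """Mirror the naming used above: append '.optimized.onnx'."""
    return filename + ".optimized.onnx"


def save_optimized_model(filename: str, providers=("CPUExecutionProvider",)) -> str:
    """Ask onnxruntime to write the graph it would actually run.

    Hypothetical helper, not part of onnx-array-api. onnxruntime is
    imported lazily so the naming helper stays usable without it.
    """
    import onnxruntime as ort

    output = optimized_path(filename)
    options = ort.SessionOptions()
    # Enable every graph optimization available for the chosen providers.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Creating the session runs the optimizers and saves the result here.
    options.optimized_model_filepath = output
    ort.InferenceSession(filename, options, providers=list(providers))
    return output
```

The saved graph depends on the providers given at session creation, so a model optimized for CPU may contain operators (such as the `nchwc` convolutions seen later) that other providers do not implement.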

Profiling

    feeds = {"input": numpy.random.random((1, 3, 112, 112)).astype(numpy.float32)}

    prof_base = ort_profile(
        filename,
        feeds,
        repeat=6,
        disable_optimization=True,
        providers=["CPUExecutionProvider"],
    )
    prof_base.to_excel(f"prof_base{suffix}.xlsx", index=False)
    prof_base
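| | cat | pid | tid | dur | ts | ph | name | args_op_name | op_name | args_thread_scheduling_stats | args_output_type_shape | args_output_size | args_parameter_size | args_activation_size | args_node_index | args_input_type_shape | args_provider | event_name | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Session | 85022 | 85022 | 483 | 3 | X | model_loading_uri | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_loading_uri | -1 |
| 1 | Session | 85022 | 85022 | 466 | 518 | X | session_initialization | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | session_initialization | -1 |
| 2 | Node | 85022 | 85022 | 0 | 1139 | X | n0_fence_before | Conv | n0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_before | -1 |
| 3 | Node | 85022 | 85022 | 965 | 1141 | X | n0_kernel_time | Conv | n0 | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 7168 | 150528 | 0 | [{'float': [1, 3, 112, 112]}, {'float': [64, 3... | CPUExecutionProvider | kernel_time | -1 |
| 4 | Node | 85022 | 85022 | 0 | 2114 | X | n0_fence_after | Conv | n0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_after | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 261 | Node | 85022 | 85022 | 0 | 238872 | X | n13_fence_before | Add | n13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_before | 4 |
| 262 | Node | 85022 | 85022 | 70 | 238873 | X | n13_kernel_time | Add | n13 | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 56, 56]}] | 802816 | 0 | 1605632 | 13 | [{'float': [1, 64, 56, 56]}, {'float': [1, 64,... | CPUExecutionProvider | kernel_time | 4 |
| 263 | Node | 85022 | 85022 | 0 | 238949 | X | n13_fence_after | Add | n13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_after | 4 |
| 264 | Session | 85022 | 85022 | 12851 | 226101 | X | SequentialExecutor::Execute | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | SequentialExecutor::Execute | 5 |
| 265 | Session | 85022 | 85022 | 12876 | 226084 | X | model_run | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_run | 5 |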

266 rows × 19 columns
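
Each row of this table corresponds to one event of the JSON trace that onnxruntime writes when profiling is enabled. As a rough illustration of how such a trace can be summarized without pandas, the sketch below sums the `kernel_time` durations per operator type. The event layout (a nested `args` dictionary that becomes the `args_*` columns) is an assumption inferred from the column names, and `kernel_time_by_op` is a hypothetical helper; the sample values are taken from rows 3, 4 and 262.

```python
import json
from collections import defaultdict

# A tiny trace with three events shaped like rows 3, 4 and 262 above.
# Assumed layout: one dict per event, operator details nested under "args".
SAMPLE_EVENTS = [
    {"cat": "Node", "dur": 965, "ts": 1141, "ph": "X",
     "name": "n0_kernel_time", "args": {"op_name": "Conv"}},
    {"cat": "Node", "dur": 0, "ts": 2114, "ph": "X",
     "name": "n0_fence_after", "args": {"op_name": "Conv"}},
    {"cat": "Node", "dur": 70, "ts": 238873, "ph": "X",
     "name": "n13_kernel_time", "args": {"op_name": "Add"}},
]
SAMPLE_TRACE = json.dumps(SAMPLE_EVENTS)


def kernel_time_by_op(trace_json: str) -> dict:
    """Sum the duration of kernel_time events for every operator type."""
    totals = defaultdict(int)
    for event in json.loads(trace_json):
        # Fence events and session events carry no useful compute time.
        if event.get("cat") != "Node":
            continue
        if not event["name"].endswith("_kernel_time"):
            continue
        totals[event["args"]["op_name"]] += event["dur"]
    return dict(totals)


print(kernel_time_by_op(SAMPLE_TRACE))
```

Only the `kernel_time` events are counted because the `fence_before` and `fence_after` events in the table have a duration of zero.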



And the optimized model.

    prof_opti = ort_profile(
        optimized,
        feeds,
        repeat=6,
        disable_optimization=True,
        providers=["CPUExecutionProvider"],
    )
    prof_opti.to_excel(f"prof_opti{suffix}.xlsx", index=False)
    prof_opti
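| | cat | pid | tid | dur | ts | ph | name | args_op_name | op_name | args_thread_scheduling_stats | args_output_type_shape | args_output_size | args_parameter_size | args_activation_size | args_node_index | args_input_type_shape | args_provider | event_name | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Session | 85022 | 85022 | 497 | 2 | X | model_loading_uri | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_loading_uri | -1 |
| 1 | Session | 85022 | 85022 | 378 | 525 | X | session_initialization | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | session_initialization | -1 |
| 2 | Node | 85022 | 85022 | 0 | 1023 | X | r0_nchwc_fence_before | Conv | r0_nchwc | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_before | -1 |
| 3 | Node | 85022 | 85022 | 672 | 1026 | X | r0_nchwc_kernel_time | Conv | r0_nchwc | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 112, 112]}] | 3211264 | 7168 | 150528 | 0 | [{'float': [1, 3, 112, 112]}, {'float': [64, 3... | CPUExecutionProvider | kernel_time | -1 |
| 4 | Node | 85022 | 85022 | 0 | 1704 | X | r0_nchwc_fence_after | Conv | r0_nchwc | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_after | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 351 | Node | 85022 | 85022 | 0 | 249744 | X | ReorderOutput_token_16_fence_before | ReorderOutput | ReorderOutput_token_16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_before | 4 |
| 352 | Node | 85022 | 85022 | 48 | 249745 | X | ReorderOutput_token_16_kernel_time | ReorderOutput | ReorderOutput_token_16 | {'main_thread': {'thread_pool_name': 'session-... | [{'float': [1, 64, 56, 56]}] | 802816 | 0 | 802816 | 18 | [{'float': [1, 64, 56, 56]}] | CPUExecutionProvider | kernel_time | 4 |
| 353 | Node | 85022 | 85022 | 0 | 249796 | X | ReorderOutput_token_16_fence_after | ReorderOutput | ReorderOutput_token_16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | fence_after | 4 |
| 354 | Session | 85022 | 85022 | 23834 | 225966 | X | SequentialExecutor::Execute | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | SequentialExecutor::Execute | 5 |
| 355 | Session | 85022 | 85022 | 23860 | 225951 | X | model_run | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | model_run | 5 |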

356 rows × 19 columns
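
The size columns in both tables look like byte counts. The following sketch checks that reading, assuming `float` means 32-bit floats (4 bytes per element); `tensor_nbytes` is a hypothetical helper, and the shapes are the ones shown in the tables and in the detailed labels further down.

```python
from math import prod

FLOAT32_BYTES = 4  # assumption: 'float' in the profile means float32


def tensor_nbytes(shape, itemsize=FLOAT32_BYTES):
    """Number of bytes needed to store a dense tensor of this shape."""
    return prod(shape) * itemsize


# First Conv node: output [1, 64, 112, 112], input [1, 3, 112, 112],
# weights [64, 3, 3, 3] plus a bias of 64 values.
conv_output = tensor_nbytes([1, 64, 112, 112])
conv_activation = tensor_nbytes([1, 3, 112, 112])
conv_parameters = tensor_nbytes([64, 3, 3, 3]) + tensor_nbytes([64])

# Last Add node: two inputs of shape [1, 64, 56, 56] and no parameters.
add_activation = 2 * tensor_nbytes([1, 64, 56, 56])

print(conv_output, conv_parameters, conv_activation, add_activation)
```

Under that assumption the values match `args_output_size`, `args_parameter_size` and `args_activation_size` for the first `Conv` node (3211264, 7168, 150528) and `args_activation_size` for the final `Add` node (1605632).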



And the graph is:

    unique_op = set(prof_base["args_op_name"])
    fig, ax = plt.subplots(2, 2, figsize=(10, len(unique_op)), sharex="col")
    plot_ort_profile(prof_base, ax[0, 0], ax[0, 1], title="baseline")
    plot_ort_profile(prof_opti, ax[1, 0], ax[1, 1], title="optimized")
    fig.tight_layout()
    fig.savefig(f"plot_profiling{suffix}.png")

[Figure: four panels titled "baseline", "n occurences", "optimized", "n occurences"]

Merging profiles

Let’s compare both profiles, assuming every iteration processes the same image and the input and output sizes are the same at every iteration.

    merge, gr = merge_ort_profile(prof_base, prof_opti)
    merge.to_excel(f"plot_profiling_merged{suffix}.xlsx", index=False)
    merge

~/github/onnx-array-api/onnx_array_api/ort/ort_profile.py:260: FutureWarning: The provided callable <function sum at 0x7fc0d2399d80> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  .agg(
~/github/onnx-array-api/onnx_array_api/ort/ort_profile.py:260: FutureWarning: The provided callable <function sum at 0x7fc0d2399d80> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  .agg(
| | args_op_name | args_output_type_shape | args_input_type_shape | args_provider | idx | durbase | countbase | duropti | countopti |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Add | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 56, 56]}, {'float': [1, 64,... | CPUExecutionProvider | 0 | 2420.0 | 6.0 | NaN | NaN |
| 1 | BatchNormalization | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64]}... | CPUExecutionProvider | 0 | 2548.0 | 6.0 | 8064.0 | 6.0 |
| 2 | Concat | [{'float': [1, 2, 112, 112]}] | [{'float': [1, 1, 112, 112]}, {'float': [1, 1,... | CPUExecutionProvider | 0 | 166.0 | 6.0 | 127.0 | 6.0 |
| 3 | Conv | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 2, 112, 112]}, {'float': [1, 2,... | CPUExecutionProvider | 0 | 2497.0 | 6.0 | NaN | NaN |
| 4 | Conv | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 3, 112, 112]}, {'float': [64, 3... | CPUExecutionProvider | 0 | 5364.0 | 6.0 | 4311.0 | 6.0 |
| 5 | Conv | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 101656.0 | 6.0 | 102240.0 | 6.0 |
| 6 | Conv | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 9524.0 | 6.0 | 1767.0 | 6.0 |
| 7 | Conv | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | NaN | NaN | 53992.0 | 6.0 |
| 8 | Conv | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 59088.0 | 6.0 | NaN | NaN |
| 9 | Conv | [{'float': [1, 8, 112, 112]}] | [{'float': [1, 2, 112, 112]}, {'float': [8, 2,... | CPUExecutionProvider | 0 | NaN | NaN | 19992.0 | 6.0 |
| 10 | Mul | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 1, 112, 112]}, {'float': [1, 64... | CPUExecutionProvider | 0 | 2140.0 | 6.0 | 5791.0 | 6.0 |
| 11 | PRelu | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 0 | 12364.0 | 6.0 | 1828.0 | 6.0 |
| 12 | PRelu | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}, {'float': [64, ... | CPUExecutionProvider | 1 | 2347.0 | 6.0 | 1783.0 | 6.0 |
| 13 | ReduceMax | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | 13900.0 | 6.0 | 23211.0 | 6.0 |
| 14 | ReduceMean | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | 20925.0 | 6.0 | 5553.0 | 6.0 |
| 15 | ReorderInput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | NaN | NaN | 6279.0 | 6.0 |
| 16 | ReorderInput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 1 | NaN | NaN | 1533.0 | 6.0 |
| 17 | ReorderInput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 2 | NaN | NaN | 1280.0 | 6.0 |
| 18 | ReorderOutput | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 8, 112, 112]}] | CPUExecutionProvider | 0 | NaN | NaN | 146.0 | 6.0 |
| 19 | ReorderOutput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 0 | NaN | NaN | 6803.0 | 6.0 |
| 20 | ReorderOutput | [{'float': [1, 64, 112, 112]}] | [{'float': [1, 64, 112, 112]}] | CPUExecutionProvider | 1 | NaN | NaN | 1323.0 | 6.0 |
| 21 | ReorderOutput | [{'float': [1, 64, 56, 56]}] | [{'float': [1, 64, 56, 56]}] | CPUExecutionProvider | 0 | NaN | NaN | 931.0 | 6.0 |
| 22 | Sigmoid | [{'float': [1, 1, 112, 112]}] | [{'float': [1, 1, 112, 112]}] | CPUExecutionProvider | 0 | 310.0 | 6.0 | NaN | NaN |
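
The table has the shape of a full outer join: a row appears when a node signature exists in either profile, and the missing side is left empty. The sketch below reproduces that behavior on plain dictionaries. It is not the implementation of `merge_ort_profile`; the key is simplified to (operator, output shape, idx), whereas the table also includes input shapes and the provider, and `merge_profiles` is a hypothetical name. The `idx` field appears to number nodes sharing the same signature, which is how the two `PRelu` rows stay separate.

```python
def merge_profiles(base, opti):
    """Full outer join of two {key: (duration, count)} mappings.

    A key missing on one side yields None for that side, which plays
    the role of NaN in the table above.
    """
    merged = {}
    for key in sorted(set(base) | set(opti)):
        dur_base, count_base = base.get(key, (None, None))
        dur_opti, count_opti = opti.get(key, (None, None))
        merged[key] = {
            "durbase": dur_base,
            "countbase": count_base,
            "duropti": dur_opti,
            "countopti": count_opti,
        }
    return merged


# Values copied from rows 2, 15 and 22 of the merged table.
base = {
    ("Concat", "1x2x112x112", 0): (166.0, 6.0),
    ("Sigmoid", "1x1x112x112", 0): (310.0, 6.0),
}
opti = {
    ("Concat", "1x2x112x112", 0): (127.0, 6.0),
    ("ReorderInput", "1x64x112x112", 0): (6279.0, 6.0),
}

for key, row in merge_profiles(base, opti).items():
    print(key, row)
```

`Sigmoid` only has baseline values and `ReorderInput` only has optimized values, matching the pattern of empty cells in the table.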


In more detail:

    gr.to_excel(f"plot_profiling_merged_details{suffix}.xlsx", index=False)
    gr
| label | durbase | duropti | countbase | countopti |
|---|---|---|---|---|
| [+CPU]Conv(f-1x2x112x112,f-8x2x7x7)->f-1x8x112x112 | 0.0 | 19992.0 | 0.0 | 6.0 |
| [+CPU]Conv(f-1x64x112x112,f-64x64x3x3,f-64,f-1x64x56x56)->f-1x64x56x56 | 0.0 | 53992.0 | 0.0 | 6.0 |
| [+CPU]ReorderInput(f-1x64x112x112)->f-1x64x112x112 | 0.0 | 9092.0 | 0.0 | 18.0 |
| [+CPU]ReorderOutput(f-1x64x112x112)->f-1x64x112x112 | 0.0 | 8126.0 | 0.0 | 12.0 |
| [+CPU]ReorderOutput(f-1x64x56x56)->f-1x64x56x56 | 0.0 | 931.0 | 0.0 | 6.0 |
| [+CPU]ReorderOutput(f-1x8x112x112)->f-1x1x112x112 | 0.0 | 146.0 | 0.0 | 6.0 |
| [-CPU]Add(f-1x64x56x56,f-1x64x56x56)->f-1x64x56x56 | 2420.0 | 0.0 | 6.0 | 0.0 |
| [-CPU]Conv(f-1x2x112x112,f-1x2x7x7)->f-1x1x112x112 | 2497.0 | 0.0 | 6.0 | 0.0 |
| [-CPU]Conv(f-1x64x112x112,f-64x64x3x3,f-64)->f-1x64x56x56 | 59088.0 | 0.0 | 6.0 | 0.0 |
| [-CPU]Sigmoid(f-1x1x112x112)->f-1x1x112x112 | 310.0 | 0.0 | 6.0 | 0.0 |
| [=CPU]BatchNormalization(f-1x64x112x112,f-64,f-64,f-64,f-64)->f-1x64x112x112 | 2548.0 | 8064.0 | 6.0 | 6.0 |
| [=CPU]Concat(f-1x1x112x112,f-1x1x112x112)->f-1x2x112x112 | 166.0 | 127.0 | 6.0 | 6.0 |
| [=CPU]Conv(f-1x3x112x112,f-64x3x3x3,f-64)->f-1x64x112x112 | 5364.0 | 4311.0 | 6.0 | 6.0 |
| [=CPU]Conv(f-1x64x112x112,f-64x64x1x1,f-64)->f-1x64x56x56 | 9524.0 | 1767.0 | 6.0 | 6.0 |
| [=CPU]Conv(f-1x64x112x112,f-64x64x3x3,f-64)->f-1x64x112x112 | 101656.0 | 102240.0 | 6.0 | 6.0 |
| [=CPU]Mul(f-1x1x112x112,f-1x64x112x112)->f-1x64x112x112 | 2140.0 | 5791.0 | 6.0 | 6.0 |
| [=CPU]PRelu(f-1x64x112x112,f-64x1x1)->f-1x64x112x112 | 14711.0 | 3611.0 | 12.0 | 12.0 |
| [=CPU]ReduceMax(f-1x64x112x112)->f-1x1x112x112 | 13900.0 | 23211.0 | 6.0 | 6.0 |
| [=CPU]ReduceMean(f-1x64x112x112)->f-1x1x112x112 | 20925.0 | 5553.0 | 6.0 | 6.0 |
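
The labels pack the whole node signature into one string. Judging from the table, the prefix seems to be `+` for nodes only present in the optimized model, `-` for nodes only present in the baseline, and `=` for nodes present in both, followed by a shortened provider name. The sketch below rebuilds two labels under that reading; `shape_label` and `make_label` are hypothetical helpers, and abbreviating the type to its first letter is a guess based on the only type shown (`float`).

```python
def shape_label(type_shape):
    """Turn {'float': [1, 2, 112, 112]} into 'f-1x2x112x112'."""
    # Each entry holds exactly one (type, dimensions) pair.
    ((dtype, dims),) = type_shape.items()
    return f"{dtype[0]}-{'x'.join(str(d) for d in dims)}"


def make_label(op, inputs, outputs, provider, in_base, in_opti):
    """Build a label such as '[+CPU]Conv(...)->...' for one node signature."""
    if in_base and in_opti:
        side = "="
    elif in_base:
        side = "-"
    else:
        side = "+"
    short_provider = provider.replace("ExecutionProvider", "")
    ins = ",".join(shape_label(s) for s in inputs)
    outs = ",".join(shape_label(s) for s in outputs)
    return f"[{side}{short_provider}]{op}({ins})->{outs}"


new_conv = make_label(
    "Conv",
    [{"float": [1, 2, 112, 112]}, {"float": [8, 2, 7, 7]}],
    [{"float": [1, 8, 112, 112]}],
    "CPUExecutionProvider",
    in_base=False,
    in_opti=True,
)
removed_sigmoid = make_label(
    "Sigmoid",
    [{"float": [1, 1, 112, 112]}],
    [{"float": [1, 1, 112, 112]}],
    "CPUExecutionProvider",
    in_base=True,
    in_opti=False,
)
print(new_conv)
print(removed_sigmoid)
```

Grouping by such a label also explains why `PRelu` shows a count of 12: the two `PRelu` nodes of the merged table share the same signature and are summed.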


Final plot

    # Filter out operators that account for less than 1% of the total time.
    grmax = gr["durbase"] + gr["duropti"]
    total = grmax.sum()
    grmax /= total
    gr = gr[grmax >= 0.01]

    fig, ax = plt.subplots(1, 2, figsize=(14, min(gr.shape[0], 500)), sharey=True)
    gr[["durbase", "duropti"]].plot.barh(ax=ax[0])
    ax[0].set_title("Side by side duration")
    gr = gr.copy()
    gr[["countbase", "countopti"]].plot.barh(ax=ax[1])
    ax[1].set_title("Side by side count")
    fig.tight_layout()
    fig.savefig(f"plot_profiling_side_by_side{suffix}.png")

[Figure: two panels titled "Side by side duration" and "Side by side count"]
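
The filter keeps a row only if its combined duration (baseline plus optimized) reaches 1% of the grand total. The same rule can be sketched without pandas on four rows of the detailed table; the short labels and the name `keep_significant` are hypothetical, and the threshold is computed over these four rows only, not over the full table.

```python
def keep_significant(durations, threshold=0.01):
    """Keep labels whose base + optimized time is >= threshold of the total."""
    total = sum(base + opti for base, opti in durations.values())
    return {
        label: (base, opti)
        for label, (base, opti) in durations.items()
        if (base + opti) / total >= threshold
    }


# (durbase, duropti) copied from four rows of the detailed table.
durations = {
    "Conv 3x3": (101656.0, 102240.0),
    "PRelu": (14711.0, 3611.0),
    "Concat": (166.0, 127.0),
    "Sigmoid": (310.0, 0.0),
}

kept = keep_significant(durations)
print(list(kept))
```

`Concat` and `Sigmoid` each represent well under 1% of the 222821 total, so only the two larger rows remain.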

On CUDA

    if "CUDAExecutionProvider" in get_available_providers():
        print("Profiling on CUDA")
        prof_base = ort_profile(
            filename,
            feeds,
            repeat=6,
            disable_optimization=True,
            providers=["CUDAExecutionProvider"],
        )
        prof_base.to_excel(f"prof_cuda_base{suffix}.xlsx", index=False)

        prof_opti = ort_profile(
            optimized,
            feeds,
            repeat=6,
            disable_optimization=True,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        prof_opti.to_excel(f"prof_cuda_opti{suffix}.xlsx", index=False)

        unique_op = set(prof_base["args_op_name"])
        fig, ax = plt.subplots(2, 2, figsize=(10, len(unique_op)), sharex="col")
        plot_ort_profile(prof_base, ax[0, 0], ax[0, 1], title="baseline")
        plot_ort_profile(prof_opti, ax[1, 0], ax[1, 1], title="optimized")
        fig.tight_layout()
        fig.savefig(f"plot_profiling_cuda{suffix}.png")

        merge, gr = merge_ort_profile(prof_base, prof_opti)
        merge.to_excel(f"plot_profiling_merged{suffix}.xlsx", index=False)
        gr.to_excel(f"plot_profiling_merged_details{suffix}.xlsx", index=False)

        grmax = gr["durbase"] + gr["duropti"]
        total = grmax.sum()
        grmax /= total
        gr = gr[grmax >= 0.01]

        fig, ax = plt.subplots(1, 2, figsize=(14, min(gr.shape[0], 500)), sharey=True)
        gr[["durbase", "duropti"]].plot.barh(ax=ax[0])
        ax[0].set_title("Side by side duration")
        gr = gr.copy()
        gr[["countbase", "countopti"]].plot.barh(ax=ax[1])
        ax[1].set_title("Side by side count")
        fig.tight_layout()
        fig.savefig(f"plot_profiling_side_by_side_cuda{suffix}.png")
    else:
        print(f"CUDA not available in {get_available_providers()}.")
        fig, ax = None, None

    ax
[Figure: four panels titled "baseline", "n occurences", "optimized", "n occurences"]

[Figure: two panels titled "Side by side duration" and "Side by side count"]

Profiling on CUDA
~/github/onnx-array-api/onnx_array_api/ort/ort_profile.py:260: FutureWarning: The provided callable <function sum at 0x7fc0d2399d80> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  .agg(
~/github/onnx-array-api/onnx_array_api/ort/ort_profile.py:260: FutureWarning: The provided callable <function sum at 0x7fc0d2399d80> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  .agg(
array([<Axes: title={'center': 'Side by side duration'}, ylabel='label'>,
       <Axes: title={'center': 'Side by side count'}, ylabel='label'>],
      dtype=object)
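
The provider check used above can be isolated into a small function, which makes the fallback easy to test on a machine without a GPU. This is a sketch with a hypothetical name, `pick_providers`; in practice the `available` argument would come from `get_available_providers()`, as in the example code.

```python
def pick_providers(available):
    """Prefer CUDA when present, always keeping CPU as a fallback.

    `available` is the list of provider names reported by onnxruntime.
    """
    if "CUDAExecutionProvider" in available:
        # Providers are tried in order; CPU handles nodes CUDA cannot run.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))
```

Keeping `CPUExecutionProvider` in the list matters for the optimized model in particular: it was optimized for CPU, so it may contain operators for which no CUDA kernel exists.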

Total running time of the script: (0 minutes 6.359 seconds)

