
**Note:** Go to the end to download the full example code.
onnxruntime optimizes the onnx graph by default before running the inference. It modifies, fuses, or adds new operators. Some of them are standard onnx operators, some of them are implemented in onnxruntime (see Supported Operators). This example looks into the differences between the two models.
```python
import os
from pprint import pprint
import numpy
from pandas import DataFrame
import matplotlib.pyplot as plt
from onnx import load
from onnx_array_api.ext_test_case import example_path
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot
from onnx_array_api.validation.diff import text_diff, html_diff
from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions
from onnx_array_api.ext_test_case import measure_time
from onnx_array_api.ort.ort_optimizers import ort_optimized_model

filename = example_path("data/small.onnx")
optimized = filename + ".optimized.onnx"
if not os.path.exists(optimized):
    ort_optimized_model(filename, output=optimized)
print(optimized)
```
```
data/small.onnx.optimized.onnx
```
```python
so = SessionOptions()
so.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

img = numpy.random.random((1, 3, 112, 112)).astype(numpy.float32)

sess = InferenceSession(filename, so, providers=["CPUExecutionProvider"])
sess_opt = InferenceSession(optimized, so, providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
out = sess.run(None, {input_name: img})[0]
out_opt = sess_opt.run(None, {input_name: img})[0]

if out.shape != out_opt.shape:
    print(f"ERROR shapes are different {out.shape} != {out_opt.shape}")
diff = numpy.abs(out - out_opt).max()
print(f"Differences: {diff}")
```
```
Differences: 0.0
```
Unoptimized model.
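The snippet that produced the listing below is not part of the extracted text. A minimal sketch, assuming `onnx_simple_text_plot` renders a loaded `ModelProto` as text (the `indent` argument is an assumption), could be:

```python
# Hypothetical reconstruction: load the original model and render it as text.
with open(filename, "rb") as f:
    model = load(f)
print("first model to text...")
text1 = onnx_simple_text_plot(model, indent=False)  # indent=False is an assumption
print(text1)
```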
```
first model to text...
opset: domain='' version=11
input: name='input' type=dtype('float32') shape=['None', 3, 112, 112]
init: name='i0' type=float32 shape=(64,)
init: name='i1' type=float32 shape=(64,)
init: name='i2' type=float32 shape=(64,)
init: name='i3' type=float32 shape=(64,)
init: name='i4' type=float32 shape=(1, 2, 7, 7)
init: name='i5' type=float32 shape=(64, 3, 3, 3)
init: name='i6' type=float32 shape=(64,)
init: name='i7' type=float32 shape=(64, 64, 3, 3)
init: name='i8' type=float32 shape=(64,)
init: name='i9' type=float32 shape=(64, 64, 3, 3)
init: name='i10' type=float32 shape=(64,)
init: name='i11' type=float32 shape=(64, 64, 1, 1)
init: name='i12' type=float32 shape=(64,)
init: name='i13' type=float32 shape=(64, 1, 1)
init: name='i14' type=float32 shape=(64, 1, 1)
Conv(input, i5, i6, dilations=[1,1], group=1, kernel_shape=[3,3], pads=[1,1,1,1], strides=[1,1]) -> r0
PRelu(r0, i13) -> r1
ReduceMean(r1, axes=[1], keepdims=1) -> r2
ReduceMax(r1, axes=[1], keepdims=1) -> r3
Concat(r2, r3, axis=1) -> r4
Conv(r4, i4, dilations=[1,1], group=1, kernel_shape=[7,7], pads=[3,3,3,3], strides=[1,1]) -> r5
Sigmoid(r5) -> r6
Mul(r6, r1) -> r7
BatchNormalization(r7, i0, i1, i2, i3, epsilon=0.00, momentum=0.90) -> r8
Conv(r8, i7, i8, dilations=[1,1], group=1, kernel_shape=[3,3], pads=[1,1,1,1], strides=[1,1]) -> r9
PRelu(r9, i14) -> r10
Conv(r10, i9, i10, dilations=[1,1], group=1, kernel_shape=[3,3], pads=[1,1,1,1], strides=[2,2]) -> r11
Conv(r7, i11, i12, dilations=[1,1], group=1, kernel_shape=[1,1], pads=[0,0,0,0], strides=[2,2]) -> r12
Add(r11, r12) -> onnx::BatchNormalization_1830
output: name='onnx::BatchNormalization_1830' type=dtype('float32') shape=['None', 64, 56, 56]
```

Optimized model.
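Again, the code producing the next listing was lost in extraction; a sketch under the same assumptions would render the optimized file the same way:

```python
# Hypothetical reconstruction: same rendering for the optimized model.
with open(optimized, "rb") as f:
    model_optimized = load(f)
print("second model to text...")
text2 = onnx_simple_text_plot(model_optimized, indent=False)  # indent=False is an assumption
print(text2)
```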
```
second model to text...
opset: domain='' version=11
opset: domain='ai.onnx.ml' version=5
opset: domain='ai.onnx.training' version=1
opset: domain='ai.onnx.preview.training' version=1
opset: domain='com.microsoft' version=1
opset: domain='com.microsoft.experimental' version=1
opset: domain='com.microsoft.nchwc' version=1
opset: domain='org.pytorch.aten' version=1
input: name='input' type=dtype('float32') shape=['None', 3, 112, 112]
init: name='i0' type=float32 shape=(64,)
init: name='i1' type=float32 shape=(64,)
init: name='i2' type=float32 shape=(64,)
init: name='i3' type=float32 shape=(64,)
init: name='reorder_token_10' type=float32 shape=(64, 64, 3, 3)
init: name='reorder_token_6' type=float32 shape=(64, 64, 3, 3)
init: name='i6' type=float32 shape=(64,)
init: name='reorder_token_1' type=float32 shape=(8, 2, 7, 7)
init: name='i8' type=float32 shape=(64,)
init: name='reorder' type=float32 shape=(64, 3, 3, 3)
init: name='i10' type=float32 shape=(64,)
init: name='reorder_token_3' type=float32 shape=(64, 64, 1, 1)
init: name='i12' type=float32 shape=(64,)
init: name='i13' type=float32 shape=(64, 1, 1)
init: name='i14' type=float32 shape=(64, 1, 1)
Conv[com.microsoft.nchwc](input, reorder, i6, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[1,1], kernel_shape=[3,3], pads=[1,1,1,1]) -> reorder_token_0
ReorderOutput[com.microsoft.nchwc](reorder_token_0, channels_last=0, channels=64) -> r0
PRelu(r0, i13) -> r1
ReduceMax(r1, keepdims=1, axes=[1]) -> r3
ReduceMean(r1, keepdims=1, axes=[1]) -> r2
Concat(r2, r3, axis=1) -> r4
Conv[com.microsoft.nchwc](r4, reorder_token_1, activation=b'Sigmoid', auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[1,1], kernel_shape=[7,7], pads=[3,3,3,3]) -> reorder_token_2
ReorderOutput[com.microsoft.nchwc](reorder_token_2, channels_last=0, channels=1) -> r6
Mul(r6, r1) -> r7
BatchNormalization(r7, i0, i1, i2, i3, momentum=0.90, epsilon=0.00) -> r8
ReorderInput[com.microsoft.nchwc](r8, channels_last=0) -> reorder_token_7
Conv[com.microsoft.nchwc](reorder_token_7, reorder_token_6, i8, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[1,1], kernel_shape=[3,3], pads=[1,1,1,1]) -> reorder_token_9
ReorderOutput[com.microsoft.nchwc](reorder_token_9, channels_last=0, channels=64) -> r9
PRelu(r9, i14) -> r10
ReorderInput[com.microsoft.nchwc](r10, channels_last=0) -> reorder_token_11
ReorderInput[com.microsoft.nchwc](r7, channels_last=0) -> reorder_token_4
Conv[com.microsoft.nchwc](reorder_token_4, reorder_token_3, i12, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[2,2], kernel_shape=[1,1], pads=[0,0,0,0]) -> reorder_token_5
Conv[com.microsoft.nchwc](reorder_token_11, reorder_token_10, i10, reorder_token_5, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[2,2], kernel_shape=[3,3], pads=[1,1,1,1]) -> reorder_token_13
ReorderOutput[com.microsoft.nchwc](reorder_token_13, channels_last=0, channels=64) -> onnx::BatchNormalization_1830
output: name='onnx::BatchNormalization_1830' type=dtype('float32') shape=['None', 64, 56, 56]
```

Differences
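The code computing the textual diff is not in the extracted page either. A minimal sketch, assuming `text_diff` from `onnx_array_api.validation.diff` accepts the two text renderings, could be:

```python
# Hypothetical reconstruction: line-by-line diff of the two text dumps.
print("differences...")
print(text_diff(text1, text2))  # assumes text_diff accepts the two strings
```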
```
differences...
  opset: domain='' version=11
+ opset: domain='ai.onnx.ml' version=5
+ opset: domain='ai.onnx.training' version=1
+ opset: domain='ai.onnx.preview.training' version=1
+ opset: domain='com.microsoft' version=1
+ opset: domain='com.microsoft.experimental' version=1
+ opset: domain='com.microsoft.nchwc' version=1
+ opset: domain='org.pytorch.aten' version=1
  input: name='input' type=dtype('float32') shape=['None', 3, 112, 112]
  init: name='i0' type=float32 shape=(64,)
  init: name='i1' type=float32 shape=(64,)
  init: name='i2' type=float32 shape=(64,)
  init: name='i3' type=float32 shape=(64,)
- init: name='i4' type=float32 shape=(1, 2, 7, 7)
+ init: name='reorder_token_10' type=float32 shape=(64, 64, 3, 3)
- init: name='i5' type=float32 shape=(64, 3, 3, 3)
+ init: name='reorder_token_6' type=float32 shape=(64, 64, 3, 3)
  init: name='i6' type=float32 shape=(64,)
- init: name='i7' type=float32 shape=(64, 64, 3, 3)
+ init: name='reorder_token_1' type=float32 shape=(8, 2, 7, 7)
  init: name='i8' type=float32 shape=(64,)
- init: name='i9' type=float32 shape=(64, 64, 3, 3)
+ init: name='reorder' type=float32 shape=(64, 3, 3, 3)
  init: name='i10' type=float32 shape=(64,)
- init: name='i11' type=float32 shape=(64, 64, 1, 1)
+ init: name='reorder_token_3' type=float32 shape=(64, 64, 1, 1)
  init: name='i12' type=float32 shape=(64,)
  init: name='i13' type=float32 shape=(64, 1, 1)
  init: name='i14' type=float32 shape=(64, 1, 1)
- Conv(input, i5, i6, dilations=[1,1], group=1, kernel_shape=[3,3], pads=[1,1,1,1], strides=[1,1]) -> r0
+ Conv[com.microsoft.nchwc](input, reorder, i6, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[1,1], kernel_shape=[3,3], pads=[1,1,1,1]) -> reorder_token_0
+ ReorderOutput[com.microsoft.nchwc](reorder_token_0, channels_last=0, channels=64) -> r0
  PRelu(r0, i13) -> r1
+ ReduceMax(r1, keepdims=1, axes=[1]) -> r3
- ReduceMean(r1, axes=[1], keepdims=1) -> r2
+ ReduceMean(r1, keepdims=1, axes=[1]) -> r2
- ReduceMax(r1, axes=[1], keepdims=1) -> r3
  Concat(r2, r3, axis=1) -> r4
- Conv(r4, i4, dilations=[1,1], group=1, kernel_shape=[7,7], pads=[3,3,3,3], strides=[1,1]) -> r5
- Sigmoid(r5) -> r6
+ Conv[com.microsoft.nchwc](r4, reorder_token_1, activation=b'Sigmoid', auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[1,1], kernel_shape=[7,7], pads=[3,3,3,3]) -> reorder_token_2
+ ReorderOutput[com.microsoft.nchwc](reorder_token_2, channels_last=0, channels=1) -> r6
  Mul(r6, r1) -> r7
- BatchNormalization(r7, i0, i1, i2, i3, epsilon=0.00, momentum=0.90) -> r8
+ BatchNormalization(r7, i0, i1, i2, i3, momentum=0.90, epsilon=0.00) -> r8
- Conv(r8, i7, i8, dilations=[1,1], group=1, kernel_shape=[3,3], pads=[1,1,1,1], strides=[1,1]) -> r9
+ ReorderInput[com.microsoft.nchwc](r8, channels_last=0) -> reorder_token_7
+ Conv[com.microsoft.nchwc](reorder_token_7, reorder_token_6, i8, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[1,1], kernel_shape=[3,3], pads=[1,1,1,1]) -> reorder_token_9
+ ReorderOutput[com.microsoft.nchwc](reorder_token_9, channels_last=0, channels=64) -> r9
  PRelu(r9, i14) -> r10
- Conv(r10, i9, i10, dilations=[1,1], group=1, kernel_shape=[3,3], pads=[1,1,1,1], strides=[2,2]) -> r11
- Conv(r7, i11, i12, dilations=[1,1], group=1, kernel_shape=[1,1], pads=[0,0,0,0], strides=[2,2]) -> r12
- Add(r11, r12) -> onnx::BatchNormalization_1830
+ ReorderInput[com.microsoft.nchwc](r10, channels_last=0) -> reorder_token_11
+ ReorderInput[com.microsoft.nchwc](r7, channels_last=0) -> reorder_token_4
+ Conv[com.microsoft.nchwc](reorder_token_4, reorder_token_3, i12, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[2,2], kernel_shape=[1,1], pads=[0,0,0,0]) -> reorder_token_5
+ Conv[com.microsoft.nchwc](reorder_token_11, reorder_token_10, i10, reorder_token_5, auto_pad=b'NOTSET', dilations=[1,1], group=1, strides=[2,2], kernel_shape=[3,3], pads=[1,1,1,1]) -> reorder_token_13
+ ReorderOutput[com.microsoft.nchwc](reorder_token_13, channels_last=0, channels=64) -> onnx::BatchNormalization_1830
  output: name='onnx::BatchNormalization_1830' type=dtype('float32') shape=['None', 64, 56, 56]
```

HTML version.
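An HTML rendering of the same comparison can be produced with `html_diff`. The original snippet is missing from the extracted page; the sketch below assumes `html_diff` returns an HTML string and uses a hypothetical output file name:

```python
# Hypothetical reconstruction: write the HTML diff to a file.
print("html differences...")
output = html_diff(text1, text2)  # assumes html_diff returns an HTML string
with open("diff_html.html", "w", encoding="utf-8") as f:  # hypothetical file name
    f.write(output)
print("done.")
```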
```
html differences...
done.
```
```python
img = numpy.random.random((1, 3, 112, 112)).astype(numpy.float32)

t1 = measure_time(lambda: sess.run(None, {input_name: img}), repeat=25, number=25)
t1["name"] = "original"
print("Original model")
pprint(t1)

t2 = measure_time(lambda: sess_opt.run(None, {input_name: img}), repeat=25, number=25)
t2["name"] = "optimized"
print("Optimized")
pprint(t2)
```
```
Original model
{'average': np.float64(0.0056790061488049106),
 'context_size': 64,
 'deviation': np.float64(0.0009349826756959479),
 'max_exec': np.float64(0.008692911559919594),
 'min_exec': np.float64(0.004763298119942192),
 'name': 'original',
 'number': 25,
 'repeat': 25,
 'ttime': np.float64(0.14197515372012276)}
Optimized
{'average': np.float64(0.005954185577592579),
 'context_size': 64,
 'deviation': np.float64(0.0014574969062307263),
 'max_exec': np.float64(0.0113777117599966),
 'min_exec': np.float64(0.0048157119199458975),
 'name': 'optimized',
 'number': 25,
 'repeat': 25,
 'ttime': np.float64(0.14885463943981447)}
```
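The two timing dictionaries can be gathered into a single table. The original snippet is not in the extracted text; a minimal sketch producing a table like the one below could be:

```python
# Sketch: collect both measurements into a DataFrame indexed by name.
df = DataFrame([t1, t2]).set_index("name")
print(df)
```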
| name | average | deviation | min_exec | max_exec | repeat | number | ttime | context_size |
|---|---|---|---|---|---|---|---|---|
| original | 0.005679 | 0.000935 | 0.004763 | 0.008693 | 25 | 25 | 0.141975 | 64 |
| optimized | 0.005954 | 0.001457 | 0.004816 | 0.011378 | 25 | 25 | 0.148855 | 64 |
And the graph is:
```python
# The figure and axes setup was not part of the extracted snippet; a plain subplot is assumed.
fig, ax = plt.subplots(1, 1)
ax.bar(df.index, df["average"].values, yerr=df["deviation"].values, capsize=6)
ax.set_title("Measure performance of optimized model\nlower is better")
plt.grid()
fig.savefig("plot_optimization.png")
```

Total running time of the script: (0 minutes 7.619 seconds)