OpenVINO support #1037

Merged

ggerganov merged 10 commits into ggml-org:master from RyanMetcalfeInt8:openvino_integration

Jul 4, 2023

Conversation

RyanMetcalfeInt8 (Contributor) commented Jun 22, 2023 (edited)

Running Whisper inference using OpenVINO

This PR extends whisper.cpp to run the Whisper Encoder on OpenVINO-supported devices such as CPUs and Intel GPUs (integrated & discrete).

I've tested this on a number of platforms, including:

NUC Intel(R) Core(TM) i7-6770HQ ('Skylake' Skull Canyon NUC) running Ubuntu 22.04
Core(TM) i7-1185G7 ('Tiger Lake' laptop) running Windows 11 Pro
Core(TM) i7-12700 ('Alder Lake' Beast Canyon NUC) with installed Intel(R) ARC(TM) A770 discrete graphics card, running Windows 11 Pro

On each platform, the OpenVINO-based encoder gives a great performance boost over the default encoder, even on CPU. The ability to offload to another OpenVINO-supported device by simply specifying a different string at runtime (e.g. "CPU" --> "GPU") is also very convenient.

High-level description of changes

OpenVINO encode support is modeled very closely on how whisper.cpp uses CoreML (this should be pretty obvious in the change-set). If the project is built with OpenVINO support, an OpenVINO-specific encoder is pulled into the build and instantiated at application startup time.

Also similar to CoreML, the models that the OpenVINO encoder needs can be generated using a new Python script in the 'models' directory.

Just to point out: something that does differ between CoreML and the new OpenVINO integration is how/when support is enabled at runtime. CoreML is enabled within the call to whisper_init_*. For OpenVINO, because we want the ability to specify a device string ("CPU", "GPU", etc.), I exposed a new API that is dedicated to initializing OpenVINO, given a ctx:

(in whisper.h):

    // Given a context, enable use of OpenVINO for encode inference.
    // openvino_model_path: Optional path to OpenVINO encoder IR model. If set to nullptr,
    //                      the path will be generated from the ggml model path that was passed
    //                      in to whisper_init_from_file. For example, if 'path_model' was
    //                      "/path/to/ggml-base.en.bin", then OpenVINO IR model path will be
    //                      assumed to be "/path/to/ggml-base.en-encoder-openvino.xml".
    // openvino_device: OpenVINO device to run inference on ("CPU", "GPU", etc.)
    // openvino_cache_dir: Optional cache directory that can speed up init time, especially for
    //                     GPU, by caching compiled 'blobs' there.
    //                     Set to nullptr if not used.
    // Returns 1 on success. If OpenVINO is not enabled in build, this
    // simply returns 0.
    WHISPER_API int whisper_ctx_init_openvino_encoder(
        struct whisper_context * ctx,
        const char * openvino_model_path,
        const char * openvino_device,
        const char * openvino_cache_dir);
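To make the default-path rule in the comment above concrete, here is a minimal Python sketch of the derivation. It is illustrative only and is not the code in whisper.cpp: the function name `default_openvino_paths` is hypothetical, and the cache directory naming is inferred from the sample log output in this description (`...-encoder-openvino-cache`) rather than from the header comment.

```python
from pathlib import PurePosixPath
from typing import Tuple


def default_openvino_paths(ggml_model_path: str) -> Tuple[str, str]:
    """Derive (encoder IR path, cache dir) from a ggml model path. Hypothetical helper."""
    p = PurePosixPath(ggml_model_path)
    # Strip only the trailing ".bin" so "ggml-base.en.bin" keeps its ".en" part.
    stem = p.name[: -len(".bin")] if p.name.endswith(".bin") else p.name
    base = p.parent / f"{stem}-encoder-openvino"
    # The "-cache" suffix is an assumption taken from the log output in this PR.
    return f"{base}.xml", f"{base}-cache"


if __name__ == "__main__":
    print(default_openvino_paths("/path/to/ggml-base.en.bin"))
```

Only the final extension is removed, which is why a multi-dot name such as `ggml-base.en.bin` maps to `ggml-base.en-encoder-openvino.xml`.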

I'm happy to rework this if anyone has a better idea of how to enable OpenVINO support at init time.

main.cpp exposes a new parameter that lets the user set the OpenVINO encode inference device (default is "CPU"):

    ...
    else if (arg == "-oved" || arg == "--ov-e-device") { params.openvino_encode_device = argv[++i]; }
    ...
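The snippet above reads `argv[++i]` without checking that a value follows the flag. As a rough illustration of the same parsing rule with that check added, here is a small Python sketch (Python is used only for illustration; main.cpp is C++). The function name `parse_openvino_device` is hypothetical and not part of whisper.cpp.

```python
from typing import List


def parse_openvino_device(argv: List[str], default: str = "CPU") -> str:
    """Return the device given via -oved / --ov-e-device, or the default."""
    device = default
    i = 0
    while i < len(argv):
        if argv[i] in ("-oved", "--ov-e-device"):
            # Guard against the flag being the last argument.
            if i + 1 >= len(argv):
                raise ValueError(f"{argv[i]} requires a device name")
            i += 1
            device = argv[i]
        i += 1
    return device


if __name__ == "__main__":
    print(parse_openvino_device(["-m", "ggml-base.bin", "-f", "gb1.wav", "-oved", "GPU"]))
```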

And the new whisper_ctx_init_openvino_encoder API is called right after ctx creation:

    // whisper init
    struct whisper_context * ctx = whisper_init_from_file(params.model.c_str());

    if (ctx == nullptr) {
        fprintf(stderr, "error: failed to initialize whisper context\n");
        return 3;
    }

    // initialize openvino encoder. This has no effect on whisper.cpp builds that don't have OpenVINO configured.
    whisper_ctx_init_openvino_encoder(ctx, nullptr, params.openvino_encode_device.c_str(), nullptr);

How to generate models and enable OpenVINO for whisper.cpp builds

Here are the instructions for generating the OpenVINO models for use with OpenVINO-enabled builds of whisper.cpp:

First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.

Windows:

    cd models
    python -m venv openvino_conv_env
    openvino_conv_env\Scripts\activate
    python -m pip install --upgrade pip
    pip install -r openvino-conversion-requirements.txt

Linux and macOS:

    cd models
    python3 -m venv openvino_conv_env
    source openvino_conv_env/bin/activate
    python -m pip install --upgrade pip
    pip install -r openvino-conversion-requirements.txt

Generate an OpenVINO encoder model. For example, to generate a base.en model, use:
```
python convert-whisper-to-openvino.py --model base.en
```
This will produce the ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to move these to the same folder as the ggml models, as that is the default location that the OpenVINO extension will search at runtime.

Build whisper.cpp with OpenVINO support:

Download the OpenVINO package from the release page. The recommended version to use is 2023.0.0.

After downloading and extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:

Linux:
```
source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
```
Windows (cmd):
```
C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
```
And then build the project using cmake:
```
cd build
cmake -DWHISPER_OPENVINO=1 ..
```

Run the examples as usual. For example:

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav

    ...

    whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
    whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
    whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = CPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
    whisper_ctx_init_openvino_encoder: OpenVINO model loaded

    system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |

    ...

The first run on an OpenVINO device is slow, since the OpenVINO framework compiles the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob is cached for the next run.

You can use the -oved [DEVICE] argument to main to specify the OpenVINO device to offload encoder inference to. For example:

main -m ggml-base.bin -f gb1.wav -oved GPU

RyanMetcalfeInt8 added 3 commits

June 21, 2023 16:21

openvino: use OpenVINO encoder inference

c352893

openvino: add python script for OpenVINO model generation

93b8be4

whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build

58eae32

ggerganov (Member) commented Jun 25, 2023

Wow - this is quite interesting. First time I hear about OpenVINO - will try to get familiar.
I'll need to look into more details, but overall the implementation looks very nice. Very good PR description

> Just to point out -- something that does differ between CoreML and the new OpenVINO integration is how/when support is enabled at runtime. CoreML is enabled within the call to whisper_init_*. For OpenVINO, because we want the ability to specify a device string ("CPU", "GPU", etc.), I exposed a new API that is dedicated to initializing OpenVINO, given a ctx:

Do you think it makes sense to do the same for Core ML so that the implementations follow similar pattern?

RyanMetcalfeInt8 (Contributor, Author) commented Jun 25, 2023

@ggerganov, thanks for taking a look!

> Do you think it makes sense to do the same for Core ML so that the implementations follow similar pattern?

I think that makes sense, especially if CoreML exposes parameters to control how inference is performed -- but to be honest I know very little about CoreML.

ggerganov (Member) requested changes on Jun 28, 2023

Minor changes - should be good to merge after that

whisper.h (outdated, resolved)

whisper.cpp (resolved)

whisper.cpp (outdated, resolved)

CMakeLists.txt (outdated)

		@@ -310,6 +321,7 @@ add_library(${TARGET}
		${GGML_OPENCL_SOURCES}
		whisper.h
		whisper.cpp
		${OpenVINO_SOURCES}

ggerganov (Member), Jun 28, 2023

Use OPENVINO_SOURCES

However, why not make a separate target whisper.openvino, similar to how whisper.coreml works?

RyanMetcalfeInt8 (Contributor, Author), Jun 28, 2023 (edited)

Let me try it again. I had originally tried to add it as a separate target and had some weird issues (something like the corresponding .lib wasn't being generated in the Windows build). I intended to circle back though, so thanks for the reminder.

RyanMetcalfeInt8 (Contributor, Author), Jun 28, 2023

okay, see latest commit (76c4186)

I added openvino-encoder to a dedicated OBJECT target:

    add_library(${TARGET} OBJECT
        openvino/whisper-openvino-encoder.h
        openvino/whisper-openvino-encoder.cpp
        )

And this target is linked to whisper just like coreml:

    if (WHISPER_OPENVINO)
        target_link_libraries(${TARGET} PRIVATE whisper.openvino)
    endif()

I was thinking of making it SHARED, but I think it'd be more of a hassle to have to carry around a separate .dll / .so.

This builds fine, and I did some minimal testing on Windows 11 & Ubuntu.

RyanMetcalfeInt8 and others added 2 commits

June 28, 2023 15:18

Apply suggestions from code review

4bc1ebc

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

whisper: Fix compilation error

6bfa371

ggerganov reviewed

Jun 28, 2023

whisper.cpp (outdated, resolved)

RyanMetcalfeInt8 added 2 commits

June 28, 2023 16:13

whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures

df77368

cmake: Add openvino-encoder as separate object target

76c4186

RyanMetcalfeInt8 requested a review from ggerganov

June 29, 2023 15:37

ggerganov added 3 commits

July 4, 2023 15:37

whisper : minor style fixes

bc5746e

Merge branch 'master' into openvino_integration

0ed471c

minor : indentation fixes

df98287

ggerganov (Member) approved these changes on Jul 4, 2023

Great stuff 👍

ggerganov merged commit 62b8127 into ggml-org:master

Jul 4, 2023

ggerganov added a commit that referenced this pull request

Jul 4, 2023

whisper : minor OpenVINO refactoring (#1037)

4774d2f

Hopefully I didn't break something - haven't tested

RyanMetcalfeInt8 mentioned this pull request

Jul 16, 2023

README.md: Add OpenVINO support details #1112

Merged

Nabaralathep commented Jul 28, 2023

Hi!
In the OpenVINO instructions there is the following step:

    cd build
    cmake -DWHISPER_OPENVINO=1 ..

Where is that "build" dir?

And when I run:

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav

I don't see "OPENVINO = 1" or any other info about loading OpenVINO.

All the other instructions were executed successfully.
What is missing?

Distro info
I am running on parrot OS 5.3

Amazing work, and thanks for sharing.

RyanMetcalfeInt8 (Contributor, Author) commented Jul 29, 2023 (edited)

Hi @Nabaralathep,

Looks like I forgot the mkdir build, so it should be:

    mkdir build
    cd build
    cmake -DWHISPER_OPENVINO=1 ..
    make

Let me know how it goes.

Nabaralathep commented Jul 30, 2023

Hi @RyanMetcalfeInt8,
Thank you very much for your help! You pulled me out of a hole, but... I had some issues that I would like to share to help someone else get out of the hole as well.

1. When I run cmake -DWHISPER_OPENVINO=1 .. the build files are created in the parent folder and not in build, maybe because of my cmake version (3.18.4). I solved this with cmake -DWHISPER_OPENVINO=1 .. -B build, all while inside the "openvino_conv_env" folder.

2. When I ran make I received an error. It turns out that I had the Debian ARM version and my computer is x86_64, but when I went to the repository to download the appropriate one, I discovered that all the packages for Debian are ARM, so what?

So this is a dead end, and I'm going to install the pink windows called ubuntu.

In any case, I really appreciate (you don't know how much) your answer, since at least it made me understand the problem, thank you very much and your work is incredible.

tazz4843 (Contributor) commented Aug 29, 2023 (edited)

Does this implementation of OpenVINO support the GNA in 10th to 14th generation Intel CPUs? Intel advertises it as follows:

Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.
When power and performance are critical, the Intel® Gaussian & Neural Accelerator (Intel® GNA) provides power-efficient, always-on support. Intel® GNA is designed to deliver AI speech and audio applications such as neural noise cancellation, while simultaneously freeing up CPU resources for overall system performance and responsiveness.

They also later mention it could be used for tasks such as speech-to-text, and I'm curious if/how well whisper would perform on it.

Setting the OpenVINO device to "gna" just throws an error with a failed assertion:

    whisper_ctx_init_openvino_encoder: loading OpenVINO model from '../../models/ggml-base-encoder-openvino.xml'
    whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
    whisper_openvino_init: path_model = ../../models/ggml-base-encoder-openvino.xml, device = GNA, cache_dir = ../../models/ggml-base-encoder-openvino-cache
    in openvino encoder compile routine: exception: Check 'false' failed at src/inference/src/core.cpp:114:
    [ GENERAL_ERROR ]  AssertionFailed: split_sizes.size() > 1
    whisper_ctx_init_openvino_encoder: failed to init OpenVINO encoder from '../../models/ggml-base-encoder-openvino.xml'

ilya-lavrenov commented Oct 24, 2023 (edited)

> 2. When I ran make I received an error, it turns out that I had the debian arm version and my computer is x86_64, but when I went to the repository to download the appropriate one, I discovered that all the packages for debian are arm so what?

OpenVINO Ubuntu packages are compatible with Debian OS. You can use OpenVINO archives as well as install via apt and Debian packages.

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request

Oct 24, 2023

whisper : add OpenVINO support (ggml-org#1037)

5253065

* openvino: use OpenVINO encoder inference
* openvino: add python script for OpenVINO model generation
* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error
* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures
* cmake: Add openvino-encoder as separate object target
* whisper : minor style fixes
* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request

Oct 24, 2023

whisper : minor OpenVINO refactoring (ggml-org#1037)

4611479

Hopefully I didn't break something - haven't tested

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request

Oct 24, 2023

whisper : add OpenVINO support (ggml-org#1037)

4a11426

* openvino: use OpenVINO encoder inference
* openvino: add python script for OpenVINO model generation
* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error
* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures
* cmake: Add openvino-encoder as separate object target
* whisper : minor style fixes
* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request

Oct 24, 2023

whisper : minor OpenVINO refactoring (ggml-org#1037)

e266996

Hopefully I didn't break something - haven't tested

ilya-lavrenov reviewed

Oct 26, 2023

models/convert-whisper-to-openvino.py

		onnx_path,
		input_names=["mel"],
		output_names=["output_features"]
		)

ilya-lavrenov, Oct 26, 2023

It's not required to export to ONNX before using the model in OpenVINO.
You can use convert_model with a PyTorch in-memory object: https://docs.openvino.ai/2023.1/openvino_docs_OV_Converter_UG_prepare_model_convert_model_Convert_Model_From_PyTorch.html
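As a rough sketch of what this suggestion could look like, the following converts the in-memory PyTorch encoder directly, with no ONNX file in between. This is not the script from this PR. It assumes `openvino>=2023.1` (for `ov.convert_model` and `ov.save_model`) and the `openai-whisper` package, and it loads the model with `whisper.load_model`, which may differ from how the PR's script obtains the checkpoint. The helper names `encoder_ir_filename` and `convert_encoder` are hypothetical, and tensor names in the resulting IR may differ from the `mel` / `output_features` names used on the ONNX route.

```python
def encoder_ir_filename(model_name: str) -> str:
    """IR file name following the naming used in this PR (hypothetical helper)."""
    return f"ggml-{model_name}-encoder-openvino.xml"


def convert_encoder(model_name: str = "base.en") -> str:
    # Local imports so this file still loads where the packages are not installed.
    import torch
    import whisper          # assumption: the openai-whisper package
    import openvino as ov   # assumption: openvino >= 2023.1

    model = whisper.load_model(model_name, device="cpu")
    encoder = model.encoder
    encoder.eval()

    # Dummy mel spectrogram: batch of 1, n_mels bins, 3000 frames (30 s of audio).
    mel = torch.zeros((1, model.dims.n_mels, 3000))

    # Convert the PyTorch module directly instead of exporting to ONNX first.
    ov_model = ov.convert_model(encoder, example_input=mel)

    out_path = encoder_ir_filename(model_name)
    ov.save_model(ov_model, out_path)  # writes the .xml and the matching .bin
    return out_path


if __name__ == "__main__":
    print(convert_encoder("base.en"))
```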

models/openvino-conversion-requirements.txt

		@@ -0,0 +1,2 @@
		openvino-dev[pytorch,onnx]

ilya-lavrenov, Oct 26, 2023

We can use openvino>=2023.1.0, which contains an updated version of convert_model directly in the main openvino pip package, while openvino-dev is actually deprecated.

openvino/whisper-openvino-encoder.cpp

		std::shared_ptr<ov::Model> model = core.read_model(path_model);

		// Produce a compiled-model object, given the device ("CPU", "GPU", etc.)
		auto compiledModel = core.compile_model(model, device);

ilya-lavrenov, Oct 26, 2023

You can pass path_model directly to compile_model, which can speed up loading when ov::cache_dir is enabled. See https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_Model_caching_overview.html#make-it-even-faster-use-compile-model-modelpath
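For illustration, here is the same idea expressed with OpenVINO's Python API (the reviewed file is C++, so this is only an analogue, not a patch). It assumes the `openvino` pip package, version 2023.1 or later, and uses the `CACHE_DIR` property key. The helper names `cache_config` and `compile_encoder` are hypothetical.

```python
from typing import Dict, Optional


def cache_config(cache_dir: Optional[str]) -> Dict[str, str]:
    """Build the compile-time property map; empty when no cache dir is given."""
    return {"CACHE_DIR": cache_dir} if cache_dir else {}


def compile_encoder(path_model: str, device: str = "CPU", cache_dir: Optional[str] = None):
    # Local import so this file still loads where openvino is not installed.
    import openvino as ov  # assumption: openvino >= 2023.1

    core = ov.Core()
    # The model is passed as a path rather than as the result of read_model().
    return core.compile_model(path_model, device, cache_config(cache_dir))


if __name__ == "__main__":
    compiled = compile_encoder(
        "models/ggml-base.en-encoder-openvino.xml",
        device="CPU",
        cache_dir="models/ggml-base.en-encoder-openvino-cache",
    )
    print(compiled)
```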

pukerpanda, Dec 20, 2023

Any practical speedup from this change?

I'm on OpenVINO 2022.3.1 for a device which is EOL'ed. I can compile master and run it with the cache:

    whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = MYRIAD, cache_dir = models/ggml-base.en-encoder-openvino-cache

The speed is on par with CPU/GPU OpenVINO, and it lets an RPi run inference on the base model.

RyanMetcalfeInt8 (Contributor, Author), Dec 20, 2023 (edited)

Probably some yes, but the speedup will be during initialization (i.e. the time it takes to pull the model / cached blob from disk and prep the execution device).

RyanMetcalfeInt8 (Contributor, Author) commented Oct 30, 2023

@ilya-lavrenov -- good suggestions, looks like OpenVINO made some nice improvements for 2023.1+. Did you want to submit a PR with the updates / fixes?

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request

Dec 16, 2023

whisper : add OpenVINO support (ggml-org#1037)

0dc95d5

* openvino: use OpenVINO encoder inference
* openvino: add python script for OpenVINO model generation
* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error
* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures
* cmake: Add openvino-encoder as separate object target
* whisper : minor style fixes
* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request

Dec 16, 2023

whisper : minor OpenVINO refactoring (ggml-org#1037)

7fc7ac6

Hopefully I didn't break something - haven't tested

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request

Sep 23, 2024

whisper : add OpenVINO support (ggml-org#1037)

bf72778

* openvino: use OpenVINO encoder inference
* openvino: add python script for OpenVINO model generation
* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error
* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures
* cmake: Add openvino-encoder as separate object target
* whisper : minor style fixes
* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request

Sep 23, 2024

whisper : minor OpenVINO refactoring (ggml-org#1037)

f5956e6

Hopefully I didn't break something - haven't tested

pramodbiligiri mentioned this pull request

Feb 5, 2025

The main README is missing the "oved" option in OpenVino section #2792

Open

Labels

None yet

6 participants
