whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize #1058


Merged
ggerganov merged 17 commits into ggml-org:master from akashmjn:tdrz-integrate-1 on Jul 4, 2023

Conversation

@akashmjn (Contributor) commented on Jun 27, 2023 (edited by ggerganov)

As discussed in #64, this PR adds experimental support for local diarization (marking of speaker turns) via integration of checkpoints from this project: https://github.com/akashmjn/tinydiarize/tree/main.

This is an early functional prototype done for the small.en models.

@ggerganov - this should be functionally done save for the last two points on the checklist, for which I'd appreciate some comments on the right way to expose this.

(also please excuse my C++, I haven't written a lot of it, so this is heavily copilot-assisted 😉)

[screenshot from 2023-05-27]

Example usage

make
./models/download-ggml-model.sh small.en-tdrz
make samples
./main -m models/ggml-small.en-tdrz.bin -f samples/a13.wav

After running the above, you should see this:

[screenshot from 2023-06-20 showing the diarized transcript]

The JSON output contains an extra speaker_turn_next field for each segment with this information.

Example JSON output
{"systeminfo": "AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | COREML = 0 | ","model": {"type": "small","multilingual": false,"vocab": 51864,"audio": {"ctx": 1500,"state": 768,"head": 12,"layer": 12},"text": {"ctx": 448,"state": 768,"head": 12,"layer": 12},"mels": 80,"ftype": 1},"params": {"model": "models/whisper-small.en.tdrz/ggml-small.en-tdrz.bin","language": "en","translate": false},"result": {"language": "en"},"transcription": [{"timestamps": {"from": "00:00:00,000","to": "00:00:03,800"},"offsets": {"from": 0,"to": 3800},"text": " Okay Houston, we've had a problem here. [SPEAKER TURN]""speaker_turn_next": true},                ...]}

Checklist:

Some terminology context for the last two points: this is technically not complete diarization yet, but speaker segmentation https://www.perplexity.ai/search/d01e6743-d2dc-4f5e-b5c2-2bf2212068f7?s=u (which can be thought of as local diarization).
Also, technically the stereo audio input used by the current --diarize flag is already diarized (as it is separated into individual channels), so the naming isn't strictly consistent here either?

@akashmjn changed the title from "whisper: support speaker segmentation (local diarization) of mono audio via integration of tinydiarize" to "whisper: support speaker segmentation (local diarization) of mono audio via tinydiarize" on Jun 27, 2023
@JianbangZ

Does this support multiple languages or just English?

@skye-repos

Excited! Will this support multiple speaker labelling or will it just mark speaker turns?

@akashmjn (Contributor, Author) commented on Jun 30, 2023 (edited)

Hi @Harith163 and @JianbangZ:

  • at the moment, just speaker turns and no clustering
  • this PR is merging a PoC done for the small.en models, so English-only

Both of these are doable I think, but they are a little more involved and honestly depend on how the project evolves.

For multilingual - I think it's easiest done by OpenAI themselves, since ultimately that boils down to a reasonably multilingual finetuning dataset, and I'm pretty sure all released Whisper models had a final finetuning stage.

I'd say clustering has fewer dependencies and is a bit more tractable. I will sketch a rough plan for that once a few immediate things are done.

You can take a look at the immediate roadmap over at https://github.com/akashmjn/tinydiarize/tree/main#roadmap.

@akashmjn (Contributor, Author)

In fact @ggerganov I notice that you've already implemented C-means by hand in C++ here: #130 😅. Once I free up a little, I'll try running some clustering experiments over on the Python repo.

In the meantime, if you are interested, this is the best method out there: NME-SC.

@ggerganov (Member)

Yes :) Felt like doing some experiments (I cannot guarantee correctness of that implementation)

Btw, will be reviewing the PR over the weekend. Adding a diarization flag should be easy


@akashmjn (Contributor, Author) commented on Jul 2, 2023 (edited)

> Yes :) Felt like doing some experiments (I cannot guarantee correctness of that implementation)
>
> Btw, will be reviewing the PR over the weekend. Adding a diarization flag should be easy

Sounds good! For the last two points on my checklist, for now I'll wait for your review. I've left // TODO @Akash at the places where the behaviour needs to be toggled. If you find it more efficient, feel free to directly modify the PR however you think is best to expose this feature.

I think it should just be clear to the user that this is an experimental feature and requires using a specific *.tdrz checkpoint.

@ggerganov (Member)

I synced the latest ggml from llama.cpp and tomorrow will add the config option for tinydiarize and merge


@ohmguru

Excited to see this PR merged. Noticed that this PR doesn't yet support the word-level timestamp flag. I wanted to flag that for consideration, as word-level timestamps are quite helpful when building applications that show diarization output.


@ggerganov (Member)

@akashmjn

This should be ready to merge now. Please take a look at my changes and let me know if you agree.
For now, let's leave the stereo "diarize" flag as it is - will rename it later to reflect what it actually does.

The most important change is that I added token_tdrz and kept token_solm as it is.

Also, you now have to add the -tdrz flag to explicitly enable speaker turn detection even when using tinydiarize models.
The flag should not do anything if the model used is not a tinydiarize one.

$ ./main -f ./samples/a13.wav -m ./models/ggml-small.en-tdrz.bin -tdrz

main: processing './samples/a13.wav' (480000 samples, 30.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, tdrz = 1, timestamps = 1 ...

[00:00:00.000 --> 00:00:03.800]   Okay Houston, we've had a problem here. [SPEAKER_TURN]
[00:00:03.800 --> 00:00:06.200]   This is Houston. Say again please. [SPEAKER_TURN]
[00:00:06.200 --> 00:00:08.260]   Uh Houston we've had a problem.
[00:00:08.260 --> 00:00:11.320]   We've had a main beam up on a volt. [SPEAKER_TURN]
[00:00:11.320 --> 00:00:13.820]   Roger main beam interval. [SPEAKER_TURN]
[00:00:13.820 --> 00:00:15.100]   Uh uh [SPEAKER_TURN]
[00:00:15.100 --> 00:00:18.020]   So okay stand, by thirteen we're looking at it. [SPEAKER_TURN]
[00:00:18.020 --> 00:00:25.740]   Okay uh right now uh Houston the uh voltage is uh is looking good um.
[00:00:27.620 --> 00:00:29.940]   And we had a a pretty large bank or so.

Here is without it:

$ ./main -f ./samples/a13.wav -m ./models/ggml-small.en-tdrz.bin

main: processing './samples/a13.wav' (480000 samples, 30.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:03.760]   Okay Houston, we've had a problem here.
[00:00:03.760 --> 00:00:08.340]   Uh Houston we've had a problem.
[00:00:08.340 --> 00:00:11.320]   We've had a main beam up on a volt.
[00:00:11.320 --> 00:00:13.760]   Roger main beam interval.
[00:00:13.760 --> 00:00:17.960]   So okay stand, by thirteen we're looking at it.
[00:00:17.960 --> 00:00:25.740]   Okay uh right now uh Houston the uh voltage is uh is looking good um.
[00:00:27.620 --> 00:00:29.940]   And we had a a pretty large bank or so.

Here is word-level timestamps with speaker turn detection:

$ ./main -f ./samples/a13.wav -m ./models/ggml-small.en-tdrz.bin -ml 1 -sow -tdrz

main: processing './samples/a13.wav' (480000 samples, 30.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, tdrz = 1, timestamps = 1 ...

[00:00:00.000 --> 00:00:00.060]
[00:00:00.060 --> 00:00:00.500]   Okay
[00:00:00.500 --> 00:00:01.340]   Houston,
[00:00:01.340 --> 00:00:01.850]   we've
[00:00:01.850 --> 00:00:02.160]   had
[00:00:02.160 --> 00:00:02.260]   a
[00:00:02.260 --> 00:00:02.990]   problem
[00:00:02.990 --> 00:00:03.800]   here. [SPEAKER_TURN]
[00:00:03.800 --> 00:00:04.030]   This
[00:00:04.030 --> 00:00:04.140]   is
[00:00:04.140 --> 00:00:04.710]   Houston.
[00:00:04.710 --> 00:00:04.880]   Say
[00:00:04.880 --> 00:00:05.170]   again
[00:00:05.170 --> 00:00:06.200]   please. [SPEAKER_TURN]
[00:00:06.200 --> 00:00:06.340]   Uh
[00:00:06.340 --> 00:00:06.850]   Houston
[00:00:06.850 --> 00:00:07.210]   we've
[00:00:07.210 --> 00:00:07.430]   had
[00:00:07.430 --> 00:00:07.530]   a
[00:00:07.530 --> 00:00:08.260]   problem.
[00:00:08.260 --> 00:00:08.770]   We've
[00:00:08.770 --> 00:00:09.080]   had
[00:00:09.080 --> 00:00:09.180]   a
[00:00:09.180 --> 00:00:09.610]   main
[00:00:09.610 --> 00:00:10.000]   beam
[00:00:10.000 --> 00:00:10.200]   up
[00:00:10.200 --> 00:00:10.400]   on
[00:00:10.400 --> 00:00:10.500]   a
[00:00:10.500 --> 00:00:11.320]   volt. [SPEAKER_TURN]
[00:00:11.320 --> 00:00:11.840]   Roger
[00:00:11.840 --> 00:00:12.250]   main
[00:00:12.250 --> 00:00:12.740]   beam
[00:00:12.740 --> 00:00:13.820]   interval. [SPEAKER_TURN]
[00:00:13.820 --> 00:00:15.080]   Uh
[00:00:15.080 --> 00:00:15.100]   uh [SPEAKER_TURN]
[00:00:15.100 --> 00:00:15.230]   So
[00:00:15.230 --> 00:00:15.500]   okay
[00:00:15.500 --> 00:00:15.970]   stand,
[00:00:15.970 --> 00:00:16.100]   by
[00:00:16.100 --> 00:00:16.660]   thirteen
[00:00:16.660 --> 00:00:16.980]   we're
[00:00:16.980 --> 00:00:17.460]   looking
[00:00:17.460 --> 00:00:17.610]   at
[00:00:17.610 --> 00:00:18.020]   it. [SPEAKER_TURN]
[00:00:18.020 --> 00:00:18.570]   Okay
[00:00:18.570 --> 00:00:18.840]   uh
[00:00:18.840 --> 00:00:19.530]   right
[00:00:19.530 --> 00:00:19.940]   now
[00:00:19.940 --> 00:00:20.210]   uh
[00:00:20.210 --> 00:00:21.170]   Houston
[00:00:21.170 --> 00:00:21.580]   the
[00:00:21.580 --> 00:00:21.850]   uh
[00:00:21.850 --> 00:00:22.810]   voltage
[00:00:22.810 --> 00:00:23.080]   is
[00:00:23.080 --> 00:00:23.400]   uh
[00:00:23.400 --> 00:00:23.730]   is
[00:00:23.730 --> 00:00:24.810]   looking
[00:00:24.810 --> 00:00:25.440]   good
[00:00:25.440 --> 00:00:25.740]   um.
[00:00:27.620 --> 00:00:27.670]
[00:00:27.670 --> 00:00:27.840]   And
[00:00:27.840 --> 00:00:27.980]   we
[00:00:27.980 --> 00:00:28.210]   had
[00:00:28.210 --> 00:00:28.270]   a
[00:00:28.270 --> 00:00:28.340]   a
[00:00:28.340 --> 00:00:28.780]   pretty
[00:00:28.780 --> 00:00:29.150]   large
[00:00:29.150 --> 00:00:29.440]   bank
[00:00:29.440 --> 00:00:29.580]   or
[00:00:29.580 --> 00:00:29.940]   so.

@akashmjn (Contributor, Author) left a comment:


Added some comments relating to some tricky token ID stuff

JEF1056 added commits to JEF1056/whisper.rn that referenced this pull request on Sep 29, 2023
@tingyuchang

@karolszafranski I think there's no need for any special settings; set tdrz_enable to true and you can get the data from whisper_full_get_segment_speaker_turn_next for each segment.
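For illustration, a minimal sketch of that flow against the whisper.cpp C API (not taken from this thread): it assumes pcmf32 already holds 16 kHz mono float samples and a build where whisper_init_from_file_with_params is available, and it relies on the tdrz_enable parameter and whisper_full_get_segment_speaker_turn_next getter mentioned above.

```cpp
// Sketch: enable tinydiarize and print per-segment speaker-turn flags.
// pcmf32 is assumed to contain 16 kHz mono float PCM (loaded elsewhere).
#include <cstdio>
#include <vector>

#include "whisper.h"

void transcribe_with_turns(const char * model_path, const std::vector<float> & pcmf32) {
    struct whisper_context * ctx =
        whisper_init_from_file_with_params(model_path, whisper_context_default_params());
    if (ctx == nullptr) {
        fprintf(stderr, "failed to load model: %s\n", model_path);
        return;
    }

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.tdrz_enable = true; // should be a no-op unless a *.tdrz checkpoint is loaded

    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) == 0) {
        const int n_segments = whisper_full_n_segments(ctx);
        for (int i = 0; i < n_segments; ++i) {
            printf("%s%s\n",
                   whisper_full_get_segment_text(ctx, i),
                   whisper_full_get_segment_speaker_turn_next(ctx, i) ? " [SPEAKER_TURN]" : "");
        }
    }

    whisper_free(ctx);
}
```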

jacobwu-b pushed commits to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request on Oct 24, 2023:

…dio via tinydiarize (ggml-org#1058)

* add HuggingFace mirror to download ggml model
* support tdrz via simple hack overriding solm tokens
* fix incorrect translate/transcribe token_ids that are not static const
* add apollo 13 sample for tdrz demo
* render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token
* extend whisper_segment with speaker_turn_next field and save in json output
* fix failing go build
* slipped in some python syntax whoops
* whisper : finalize tinydiarize support (add flag + fixes)
* whisper : tdrz support for word-level timestamps (respect max_len)
* java : try to fix tests after adding tdrz_enable flag
* main : remove TODO leftover
* java : fix params order list after adding "tdrz_enable"
* whisper : fix solm and add nosp token
* main : print tinydiarize help

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@khimaros

I'm not sure if this is expected, but with medium.en-q5_0, I'm seeing that speaker turns are pretty reliably marked with >>. I'm not using the --diarize or --tdrz flags.

I wasn't seeing this behavior with large-v2, large-v3, or large-v3-q5_0. Any thoughts on why that would be happening?


landtanin pushed commits to landtanin/whisper.cpp that referenced this pull request on Dec 16, 2023
@rben01

Is there a way to use this with Core ML models?

whisper_init_from_file_with_params_no_state: loading model from './models/ggml-small.en-tdrz.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3 (small)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   464.64 MiB, (  466.27 / 10922.67)
whisper_model_load:    Metal buffer size =   487.20 MB
whisper_model_load: model size    =  487.00 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    47.25 MiB, (  513.52 / 10922.67)
whisper_init_state: kv self size  =   49.55 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    52.73 MiB, (  566.25 / 10922.67)
whisper_init_state: kv cross size =   55.30 MB
whisper_init_state: loading Core ML model from './models/ggml-small.en-tdrz-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: failed to load Core ML model from './models/ggml-small.en-tdrz-encoder.mlmodelc'
ggml_metal_free: deallocating
error: failed to initialize whisper context

@zkvsky commented on Feb 9, 2024 (edited)

> I'm not sure if this is expected, but with medium.en-q5_0, I'm seeing that speaker turns are pretty reliably marked with >>. I'm not using the --diarize or --tdrz flags.
>
> I wasn't seeing this behavior with large-v2, large-v3, or large-v3-q5_0. Any thoughts on why that would be happening?

It also happens with the small model, on its own or when pushed via a ">>" prompt. Unfortunately, for the life of me I cannot combine it with my other prompt, which resulted in proper quote-unquote behavior, i.e.

Knock on the door and I had to be like, "Oh my God, please, is there anybody in there?" And she was like, "Okay, let's see how this goes"

And quotes only happen when using -oved GPU [unfortunately it hallucinates a lot], whereas -oved CPU is much more likely to trigger ">>" diarizations on its own.
This is so weird...

@kuro337

Hello!

I was wondering - how does the integration work with the ./server?

Because when I ran the same file through the binary and through the server, it seemed the diarization output was missing from the server.

Example:

./main -f ../audio/multi.wav -m ./models/ggml-small.en-tdrz.bin -tdrz --print-colors

# output
[00:00:00.080 --> 00:00:04.820]   Let's go down. So your sister's going off How. old is she? [SPEAKER_TURN]
[00:00:04.820 --> 00:00:08.620]   She's twenty five. [SPEAKER_TURN]
[00:00:08.620 --> 00:00:12.560]   Alright. And is she going to go to do a job or is she's gonna travel? [SPEAKER_TURN]
[00:00:12.560 --> 00:00:19.940]   Um she's going to work when she's there and do like bits of jobs and then move around at the same time. [SPEAKER_TURN]
[00:00:19.940 --> 00:00:22.520]   So is she's goin

Same example using the server

./server -m models/ggml-small.en-tdrz.bin -tdrz -pc -debug

curl 127.0.0.1:8080/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="@../audio/multi.wav" \
  -F response_format="json" \
  -F tinydiarize=true

Output

{"text":" Mm. Okay.\n So your sister's going off How. old is she?\n She's twenty five.\n Alright. And is she going to go to do a job or is she's gonna travel?\n Um she's going to work when she's there and do like bits of jobs and then move around at the same time.\n So she's going straight to Australia?\n Um no first she's going to Thailand.\n And then she's going to Australia.\n And then move somewhere and then in America.\n Brilliant. So if she's bought one of these year t tickets you can go around the world f in a year or something Is. that what she's done with these airline tickets yeah, Yeah? So would you like to travel?\n Yeah.\n Mm-hmm.\n That's a good a reason though. Yeah. Actually I think it probably is because I mean I know it sounds straight forward but you can sort of add E_s and A_s and things on the end of things and it normally sounds right anyway. We've got a Spanish girl working with us at the moment so, So this is a a two year course now is, it G_C_S_E_s?\n Yeah. It's from year ten to year E_ el" }

Tried all of these formats and got the same result:

json | text | srt | verbose_json | vtt

No issues if it's not supported - I was just wondering if it was possible, because the docs mention we can pass -tdrz to the server, so I was wondering if I was doing anything wrong!

Cheers

@shoryamalani

Hey guys, is this functionality coming to the larger models or could we compile it ourselves?

Thank you so much


iThalay pushed commits to iThalay/whisper.cpp that referenced this pull request on Sep 23, 2024
lyapple2008 pushed a commit to lyapple2008/whisper.cpp.mars that referenced this pull request on Feb 4, 2025
Reviewers

@ggerganov approved these changes


13 participants

@akashmjn, @JianbangZ, @skye-repos, @ggerganov, @ohmguru, @gotjoshua, @karolszafranski, @tingyuchang, @khimaros, @rben01, @zkvsky, @kuro337, @shoryamalani
