For other uses, you may be looking for Commons:File captions.
TimedText is a custom Wikimedia Commons namespace to hold closed captioning text, or subtitles, to be associated with other media, such as audio or video files. This page explains the feature's concept and use.
Closed captioning (CC) and subtitling are both processes of displaying text on a television, video screen, or other visual display to provide additional or interpretive information. Both are typically used as a transcription of the audio portion of a program as it occurs (either verbatim or in edited form), sometimes including descriptions of non-speech elements. This aids hearing-impaired and deaf people and provides a way for non-native language speakers to understand the content in a multimedia file.
Also see Commons:Video#Subtitles and closed captioning.
Thumbnails of videos and audio clips that have closed captioning available show the CC icon overlaid. After you open the player, subtitles in your language are enabled automatically. Use the icon in the player controls to switch between languages, toggle subtitles on and off, or change the formatting of the subtitles.
Timed Text can be used for any media that is presented in a time sequence:
Actual examples
TimedText:
prefix, add the text after it, e.g.TimedText:Elephants_Dream.ogv
).TimedText:Elephants_Dream.ogv.en.srt
) to create a TimedText page - see Commons:Timed Text
Commons needs a means to find Timed Text files for specific languages; the following suffer from the Search function's limitations (for example, it does not show all matches, it includes non-matches, and it lacks regular expression support).
Search including some Timed Text .srt files in different languages:
English • German • French • Portuguese • Russian • Swedish • Ukrainian • Polish • Indonesian
Other methods to help users find Timed Text:
{{special|Prefixindex/TimedText:{{PAGENAME}}.|stripprefix|1|subtitles}}
yields a link to all related Timed Text files (example).
The template {{Captions requested}} can be used to mark that a video needs captions. The template adds the file to the category Videos needing subtitles, so one can see which videos, users or authors have requested transcripts.
This template and category are in the scope of Commons:WikiProject Deaf and its sister projects meta:Deaf Wikimedians and Wikipedia:WikiProject Deaf.
One way to find such videos is to open one of the subcategories of Category:Files with closed captioning, depending on the preferred starting language, and then to use Help:FastCCI (at the top right of the page) to include only the videos that don't have subtitles for your preferred target language.
Files with closed captioning in German
Files with closed captioning in French
Files with closed captioning in Russian
etc.
The TimedText talk namespace is for discussing the respective Timed Text pages, but it can also be used to link and categorize the Timed Text page.
To upload an already created subtitle file, open the file on your computer in a text editor (such as Notepad) and copy the text into a new page in the TimedText namespace whose title matches the filename of the video and the language code.
Commons uses the SubRip (.srt) file format for closed captioning and subtitles. You can create these files in multiple ways.
Option 1: in the Commons page of the file (recommended)
You can use the "TimedText" link at the top of any suitable multimedia file on Commons.
Option 2: directly in the media player
By using the CC button in the toolbar of the Wikimedia HTML5 media player, you can select subtitles if they are available, or open the Subtitles editor to create subtitles for the video.
Option 3: creating a blank page (for advanced users)
You can always directly create the page in Commons using the template TimedText:[Common_File_Name.extension].[language].srt, where [Common_File_Name.extension] is the name of the file, and [language] is the ISO code for the language.
Example: to add subtitles to Elephants_Dream.ogg, you can create the page TimedText:Elephants_Dream.ogg.en.srt for English subtitles, or TimedText:Elephants_Dream.ogg.fr.srt for French subtitles.
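The naming pattern above can be sketched in Python. This is only an illustration: the function name `timedtext_title` is hypothetical, and stripping a leading "File:" prefix and replacing spaces with underscores are assumptions made to match the examples on this page.

```python
def timedtext_title(file_name: str, language: str, fmt: str = "srt") -> str:
    """Build a TimedText page title such as 'TimedText:Elephants_Dream.ogg.en.srt'."""
    name = file_name.strip()
    # Assumption: callers may pass the title with or without the "File:" prefix.
    if name.lower().startswith("file:"):
        name = name[len("file:"):]
    # Use underscores instead of spaces, as in the examples on this page.
    name = name.strip().replace(" ", "_")
    language = language.strip().lower()
    if not name or not language:
        raise ValueError("file name and language code are both required")
    return f"TimedText:{name}.{language}.{fmt}"


print(timedtext_title("Elephants Dream.ogg", "en"))
print(timedtext_title("File:Elephants_Dream.ogg", "fr"))
```

The two calls print the English and French page titles from the example above.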
To copy existing subtitles from a DVD you can use software such as SubRip. You can then copy and paste them into the Commons subtitle page.
YouTube allows users with a YouTube account to create subtitles for any uploaded file. Keep in mind that the speech recognition is automated and can produce unexpected results. It is preferable to upload a transcript of the file to YouTube, which gives a much better result. You can then copy and paste the subtitles into the Commons subtitle page.
Steps to create the subtitles (a video tutorial of the steps can be found here):
You can download subtitles from a video on YouTube (and probably several other video websites) like so:
yt-dlp --list-subs url
(replace url with the YouTube URL)
yt-dlp --write-subs --sub-langs en --sub-format vtt url
(replace url with the YouTube URL)
If you use the tool video2commons, you can check "Import subtitles", but that does not work for vtt subtitles (phab:T368298), so for these videos you also need to follow the steps above to import subtitles.
YouTube auto-generated subtitles are scrolling captions. I wrote a program that converts these to block captions so they can be put on Commons. First, download the video with yt-dlp --write-auto-subs url (replace url with the URL). Then, use option 3. It should work okay, but it has a habit of putting "word. word," at the end of a block, which is wrong because a full stop is a good place to end a block. The code is really long, though, and I think I would have lots of trouble fixing it now.
You can use the open source tool SoniTranslate to generate machine-transcribed subtitles more easily and quickly. You should check these, especially if you also use the tool for machine translation into other languages. For example, it may write years out as long texts instead of numbers, or get people's names wrong. How to use this tool is described in Help:AI video dubbing.[1] If there are no existing subtitles to import, this is likely the fastest way to add TimedTexts. Transcription usually takes only a few seconds even if you don't have a GPU, depending on how long the video is.
The timings are made so that they are well suited to dubbing videos into other languages, which is often not the case for manually made subtitles. You can edit the subtitles, save them as an srt file, and use that as input to the tool to let it create audio or subtitles in another language.
As of 2024, the Whisper AI models[1] are the most advanced speech transcription models available and can be run locally, either using Python or whisper.cpp. Unlike the earlier Vosk models, they also produce punctuation, bringing their output much closer to a high-quality human transcription. All the same, you should check AI-generated subtitles against the video and correct mistakes, add punctuation, check the spelling of people and place names, check facts and figures, etc. AI subtitles are very useful as a first draft, but often also contain some silly mistakes a human transcriber would not have made.
An advantage of whisper.cpp is that it is particularly optimized for running on the CPU rather than the GPU (so it is especially useful if you have an AMD graphics card and therefore no CUDA). But CUDA and Metal (on a Mac) are also supported, so it can easily adapt to different hardware configurations. Another advantage is that it does not require installing any external dependencies, i.e. no Python or PyTorch, since it is written in C++, making it a much smaller download than a Python machine learning environment.
Some video editing and closed captioning GUI software now features built-in Whisper functionality. Open source examples include the video editor Kdenlive (since version 23.04; requires Python) and Subtitle Edit (either Python or C++ can be used to run Whisper models).
But running the command-line version of whisper.cpp directly to create an SRT file is not too difficult either, provided your operating system has a C compiler, make, etc. to compile it with:
First, use e.g. ffmpeg to extract a video's audio track and convert it to 16 kHz sample rate:
ffmpeg -i some_video.ogv -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
Next, compile whisper.cpp and download a model (the base model optimized for English content is about 140 MB; "medium" can also handle other languages and is about 1.5 GB) and then start the conversion with e.g.:
./main -m models/ggml-base.en.bin -f audio.wav -t 8 -pc -osrt
This will use 8 CPU cores and create an SRT file called audio.wav.srt in the same directory. During recognition, words will be color-coded by confidence (green = very certain, red = very uncertain), so you can quickly see if the model is having trouble. If a smaller model delivers unusable output, you can try a larger model, e.g. medium, which will be slower but produce better results.
You can also translate from other languages, e.g. adding "-l fr -tr" to the options will translate French audio to English.
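The two command-line steps above can be wrapped in a short Python script. This is a sketch under stated assumptions: the function names are hypothetical, the whisper.cpp binary is assumed to be `./main` in the current directory as in the command above, the WAV file name is derived from the video name rather than fixed to audio.wav, and the `-pc` colour option is left out because nobody watches the output of a script.

```python
import subprocess
from pathlib import Path


def build_commands(video: str, model: str = "models/ggml-base.en.bin", threads: int = 8):
    """Return the ffmpeg and whisper.cpp command lines for one video."""
    wav = str(Path(video).with_suffix(".wav"))
    # Step 1: extract the audio track as 16 kHz mono 16-bit PCM.
    extract = ["ffmpeg", "-i", video, "-ar", "16000", "-ac", "1",
               "-c:a", "pcm_s16le", wav]
    # Step 2: transcribe the WAV file and write an SRT file next to it.
    transcribe = ["./main", "-m", model, "-f", wav, "-t", str(threads), "-osrt"]
    return extract, transcribe


def transcribe_video(video: str) -> Path:
    """Run both steps and return the path of the SRT file whisper.cpp writes."""
    extract, transcribe = build_commands(video)
    subprocess.run(extract, check=True)
    subprocess.run(transcribe, check=True)
    wav = transcribe[4]  # the argument that follows "-f"
    return Path(wav + ".srt")
```

Calling `transcribe_video("some_video.ogv")` would run both tools and return the path some_video.wav.srt, provided ffmpeg and a compiled whisper.cpp are available.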
If you export the SBV format from YouTube subtitles, you can use ffmpeg to convert the subtitle file to the SRT (SubRip) format used by Commons. The -fix_sub_duration option also solves the overlap issue that is common when converting YouTube subtitles to Commons.
ffmpeg -fix_sub_duration -i input.sbv output.srt
This section describes how to convert XML YouTube subtitles to the SubRip (srt) format, which is the TimedText subtitle format used on Wikimedia Commons.
If
Then:
Type what is said in SRT format. This is one subtitle block:
1
00:00:20,000 --> 00:00:24,400
Words here.

Also get a caption editor.
This is two:
1
00:00:20,000 --> 00:00:21,500
Words more words.

2
00:00:21,500 --> 00:00:24,400
More.
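The block structure shown above (a number, a timing line, then one or more text lines, with a blank line between blocks) can be checked with a small parser. This is a minimal sketch, not the parser Commons uses; the names `Block` and `parse_srt` are hypothetical, and it assumes the strict `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing form used in the examples.

```python
import re
from typing import NamedTuple

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


class Block(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    lines: list


def parse_srt(text: str) -> list:
    """Parse SRT text into Block tuples, raising ValueError on malformed blocks."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        rows = chunk.strip().splitlines()
        if len(rows) < 3:
            raise ValueError(f"block needs a number, a timing line and text: {chunk!r}")
        match = TIMING.fullmatch(rows[1].strip())
        if match is None:
            raise ValueError(f"bad timing line: {rows[1]!r}")
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        blocks.append(Block(int(rows[0]), start, end, rows[2:]))
    return blocks
```

Running `parse_srt` over a file before saving it is a quick way to catch a missing blank line or a timing line that uses a full stop instead of a comma.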
If a person says "Words more words." at the same time as another person says "More.", writing
Words more words. More
would be wrong. Put:
-Words more words.
-More.
Putting
-Words more words. -More
is also wrong: it needs the line break. Two lines are the maximum most of the time, although in the past people used three.
If there is enough time to show each in its own block, that should be done:
Words more words.
then after:
More.
(Subtitle Edit will make the block red if it is too short, so you will know you need to join them. I think by default it is set to 25 characters per second, but I would use 20.)
Each new person or thing making a sound gets a dash, e.g. a baby cries, then an alarm goes off:
-[baby cries]
-[alarm goes off]
However, if these are far apart, they should be their own blocks:
[baby crying]
then after:
[alarm goes off]
(Notice how I put "ing"; I did this in case the sound goes on longer. When Wladek92 translated this page to French, they didn't put the "ing". So maybe it is different in that language, and Wladek92 should change it to what French captions use when sounds go on longer.)
After 43 characters the line should be split. Don't split in the middle of a name. Try to split after commas or full stops. Do this:
This line is very okay.
See I broke the line.
Not this:
This line is very okay. See
I broke the line.
Because the word "See" comes right after a full stop, and leaving it at the end of the line looks bad.
Don't do this:
I know little information about Taylor
Swift to be a Swifty.
because it splits her name.
Do this:
I know little information about
Taylor Swift to be a Swifty.
That is good because it does not split her name.
But not this:
I know too little information about Taylor Swift
to be a Swifty
Because that is longer than 43 characters in one line, so it is too long.
When splitting blocks because they are too long, please don't make a block with only one or two words at the end of a sentence. Don't do this:[2]
You can pick your favorite style, as long as you keep the style consistent for the whole video and match other people's style when editing their closed captions.
It is more popular to use square brackets: (A citation is needed here)
[wolf-whistles]
-[speaker1] Words.
-[speaker2] Words.
This style always double dashes when there are two people per block.
-Words
-[speaker2] words

-[speaker1] Words
-Words
NAME: can be used to identify a speaker in this style:
-(EXCLAIMS) Not the claws! Please.
-SPEAKER: Holding them back.
(Sorry, this isn't Wikiquote; I have to stop with the quotes, to be honest.)
SPEAKER 1: Words.
SPEAKER 2: Words.
See how I didn't double dash for having two speakers per block? When writing in this style you don't double dash unless it's like this:
-Okay, Sia.
-SASHA: Wait.
or this:
-QUNNI: Okay, Sia.
-Wait.
That is, unless you do want to use this style and always double dash when there are two speakers in a block. Then sure. But I think that choice is less common.
I don't think I've ever seen this used. I have basically made this up:
-(DIAPERING GUY) Hel--
-(POMNI) What?
(-- is used when someone is cut off.)
-(DIAPERING GUY) Hel--
-What?

-Hel--
-(POMNI) What?
In the past people on YouTube used asterisk:
*laughs*
Don't do that. I think today they use round brackets for sounds and square brackets for speakers. Who does it like that?[3][4] Don't follow them; just use one set of brackets.
Some put spaces:
( sing-song )
With square:
[ sing-song ]
Some put the first letter uppercase:
[ Gasps ]
Some put the first letter uppercase without spaces:
[Gasps]
Some put the first letter uppercase with round brackets:
(Gasps)
Some people put the first letter of each word uppercase:
[ All Laughing ]
(or with round, or without spaces, blah, blah, blah)
Some people always double dash with a space after the dash:
- [Speaker 1] Words.
- [Speaker 2] Words.
See how that looks different too:
-[Speaker 1] Words.
-[Speaker 2] Words.
Some people double dash only on the second line:[5]
[Speaker 1] Words.
-[Speaker 2] Words.

Words.
-[Speaker 2] Words.

[Speaker 1] Words.
-Words.

Words.
-Words.
But this is less common and I would advise against it.
Some prefer to double dash only on the second line with a space. (Not including an example because this will get too long.)
Some people put a new dash every time the speaker changes, either with a space, which is more common ("- "), or with just a dash ("-"). This style is popular on YouTube; outside YouTube I've only seen it in Victorious (at least in the captions that Netflix has, not sure about other places). This is what it would look like:
1
00:00:20,000 --> 00:00:23,000
- Speaker 1
- Speaker 2

2
00:00:23,000 --> 00:00:26,000
Speaker 2
- Speaker 1

3
00:00:26,000 --> 00:00:29,000
- Speaker 3
- Speaker 1

4
00:00:29,000 --> 00:00:32,000
Speaker 1

5
00:00:32,000 --> 00:00:36,000
- Speaker 2
If you really hate yourself you can use >> for when the speaker changes. Why?
Use what you want, but I personally find dashing for each new speaker confusing: are you meant to dash for sounds? I don't know.
On TV they use colourful captions: white, then yellow, then cyan, then green.[6] But Commons doesn't support this, so you can't use it.
The problem with putting the name in uppercase with a colon at the end is that you may still need to write things in brackets after the name:
MAN (on TV): Spell Okay Correctly Moment!
WOMAN: Okay, not ok.
Either
MAN (on TV): Words
or a worse looking way:
MAN: (on TV) Words
So forget the uppercase name, right? And just use the square brackets for both speakers and sounds:
[man on TV] OK or O.K. is also okay.
When using the square brackets for speakers and sounds you always double dash when there are two people per line.
-[man on TV] But okay is better.
-[girl] Because OK is like "How are U"
Also never do this:
SPEAKER: (YAWNS) I've seen it, it's dumb.[7]
I would recommend using the square brackets to indicate sounds and speakers, and not putting a new dash at each speaker change, because if a dog started barking I wouldn't know whether that counts as a new speaker or not.
Use whatever style you want, except the *asterisks*. If you edit someone else's closed captions, match their style. And if it isn't your preference, don't bother changing it: if it ain't broke, don't fix it.
In the case of Friends, captioned by the Media Access Group at the WGBH Educational Foundation, they always used italics (clicking)[8], but today some people also use italics to mean that the sound is not there, e.g. over the phone. If someone sighs over the phone, you could put [sighs] in italics, along with their words.
The sound description does not need to last until the whole sound is finished, because keeping blocks on screen for a really long time is really annoying. Clear the screen every now and then (like a screen change or after 8 seconds) and if the sound is still happening you can put [crying continues].
Here is a list of sounds with examples of when they should be used:
[gasps]
[chuckles]
[chuckling] Remember "ing" means it goes longer.
[laughs]
[laughing] Used if it's longer.
[laughter] Can be used if lots of people laugh, but [all laugh] and [all laughing] should be used if they start laughing more noticeably.
[wolf-whistles] "ing" for longer: [wolf-whistling]. This "ing" rule applies to almost everything.
[groans]
[hawking] The sound you make before you spit.
(This list could probably go on forever. Just watch TV and learn.)
(To do: make this list longer)
If someone speaks but you can't tell who it is without the sound, put their name:
[Elsa] ♪ I can't! ♪
Name forcing is mentioning a name before it is mentioned in the dialogue. If you want to avoid name forcing, you can use the speaker's gender or occupation, e.g. [man] or [waiter], before the words. You can also shorten their name, e.g. Captain Raymond Holt to Ray or Holt; anything to get the subtitles read in time.
Sometimes the dialogue makes it clear who is talking, and putting the name might not be needed. In this part of the movie,[9] they don't put the names because the dialogue makes it clear who's speaking (but I would still put the names in that scenario). On that YouTube video, however, the captions are different from the ones on Disney+, and they do put the name even though the dash could also make it obvious that the speaker changed.
If you do add the character’s name, don't add it for every caption block, just the first one when the character starts talking. Like this:
[Name] words words words
or
[Name over phone] words words words
if it's over the phone.
The next block doesn't have the name of the person:
more words here
A word spoken in a different tone of voice should be in italics, especially if the speaker is implying something. Follow WP:Italics to know which types of media to put in italics. If you need to use italics but the character is off screen and their words are already in italics, put the part you wanted to italicize in non-italics instead (reversed).
Read the note I left on this image. Also put the final punctuation mark (? , . !) in italics, unlike in that photo.
If viewers can't read a caption in time, there is no point in showing it. Remove some words if people speak too fast. Try never to go over 25 characters per second (CPS) and aim for under 20, so only go over 20 if the words are important (unless I'm just old-fashioned and 22 is better). Never go over 17 for kids' shows, because children read more slowly. Subtitle Edit will tell you the CPS. When paraphrasing, you must maintain the original meaning, and try not to change the sentence in a way that adds or removes a question mark. When removing words, don't substitute abbreviations, e.g. replacing "Oh my God" with "OMG". Also, if the sentence is at 20 CPS and you paraphrase it down to 7, you have removed far too many words. If you don't want to paraphrase as much, and you know the speaker takes a 5 second break after a fast sentence, you can delay the blocks so that they run into this 5 second gap. Just as you can delay blocks, you can also make them show up a second early. But delaying blocks or showing them early to keep them on screen longer is old-fashioned, so don't overuse this method unless you need to in order to avoid cutting out information. I try to delay some, but if it's too much I'll also paraphrase.
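The reading-speed guideline, together with the 43-character line limit mentioned earlier on this page, can be checked with a few lines of Python. This is a rough sketch: the function names are hypothetical, and counting every character except line breaks is an assumption, since caption editors such as Subtitle Edit may count characters differently depending on their settings.

```python
def chars_per_second(text: str, start_s: float, end_s: float) -> float:
    """Return the reading speed of one caption block in characters per second."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("block must end after it starts")
    # Assumption: every character counts except the line breaks.
    return len(text.replace("\n", "")) / duration


def check_block(text, start_s, end_s, max_cps=20.0, max_line_len=43):
    """Return a list of human-readable problems found in one caption block."""
    problems = []
    cps = chars_per_second(text, start_s, end_s)
    if cps > max_cps:
        problems.append(f"{cps:.1f} characters per second (limit {max_cps:g})")
    for line in text.split("\n"):
        if len(line) > max_line_len:
            problems.append(f"line has {len(line)} characters (limit {max_line_len})")
    return problems


block = "I know too little information about Taylor Swift\nto be a Swifty"
print(check_block(block, 0, 4))
```

For a kids' show, the same check can be run with `max_cps=17`.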
Also, I don't know where to put this, but "and" should usually be at the start of a block and almost never at the end, unless they say "and..." and only then "more words" (but even then you could still put "and..." together with "more words" when they say "more words").
Because all caps is used for other things, you shouldn't use it for screaming. I have seen Friends use italics for a whole paragraph to really emphasize the yelling,[10] so that's an option, but I don't see it used anymore.
Use all caps when someone says an acronym, e.g. NASA, or an initialism, e.g. ADHD, although readers won't be able to tell whether the speaker said each letter (initialism) or not (acronym).
Friends used to combat this problem by putting dots through all initialisms,[11] but this isn't done much anymore.
Made-up words that originated from an acronym should also be in caps, e.g. in Heartbreak High they go to a class called Sexual Literacy Tutorial but the students call it "sluts".
The subtitles are written as "I'm going to SLT's" even if they say "sluts".
I disagree and would make it say "I'm going to SLUTS." You pick.
If someone says "I paid thirteen dollars", put "I paid $13". If they say "It cost me ten grand but if I asked the other guy it would've been twenty", you can put
It cost me $10,000 but if I asked the other guy it would've been $20,000.
or
It cost me $10,000 but if I asked the other guy it would've been 20.
(not sure which is better).
If a character is talking about Wikipedia and they say "W P colon mos", put WP:MOS, as that's how you type it. But it gets tricky if they say "slash". In the Brooklyn Nine-Nine Halloween episodes they say "I will be the best detective slash genius." but it was subtitled as "I will be the best detective/genius." even though they said "slash". The later seasons are subtitled differently and include the word "slash" in the subtitles if it was said; the newer method is recommended.
Lyrics the character is singing should be surrounded by the ♪ character (Unicode U+266A, decimal 9834). You can also use ♫ (Unicode U+266B, decimal 9835), e.g.
1
00:00:20,000 --> 00:00:24,400
♪ Take me out to the ball game ♪
(First letter uppercase and no full stop)
You should stop subtitling the music when the characters start talking over it, so the audience can read what the characters are saying. Also, don't put a full stop at the end of each line; I've seen that before on ABC iview, in that old Heartbreak High episode where they wanted a common room. Singing isn't a sentence, which is probably why each first letter is in uppercase and why lots of people put it in italics.
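If you generate captions with a script, the music note can be written by its code point. A minimal sketch follows; the helper name `lyric_block` is hypothetical. The eighth note ♪ is code point U+266A, which is 9834 in decimal.

```python
EIGHTH_NOTE = "\u266a"  # ♪, code point U+266A (decimal 9834)


def lyric_block(lines):
    """Wrap each sung line in eighth notes, without adding a full stop."""
    return "\n".join(f"{EIGHTH_NOTE} {line.strip()} {EIGHTH_NOTE}" for line in lines)


print(lyric_block(["Take me out to the ball game"]))
```

This prints the lyric line from the example above with a note on each side.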
The only supported markup of the SRT format is
REMINDER: Wikicode formatting is not supported.
After the subtitles have been transcribed in the original language of the video onto a Timed Text file, they can be translated into other languages as follows:
These are articles about either Q844253: Timed text, or Q204028: subtitle.
This section needs expansion.
How to associate closed captions with multimedia files?
A possible categorization scheme is:
[[:Category:File formats]] + [[:Category:Media types]] | [[:Category:Timed Text]] + [[:Category:Legend in German]] | [[:Category:Timed Text in German]] + [[:Category:Legend in French]] | [[:Category:Timed Text in French]] + [[:Category:Legend in English]] | [[:Category:Timed Text in English]]
Related categories: Category:Files with closed captioning