Tentacule/PgsToSrt
Convert PGS subtitles to SRT using OCR.
Tesseract language data files must be placed in the `tessdata` folder inside the PgsToSrt folder, or their path can be specified on the command line with the `--tesseractdata` parameter.
You only need data files for the language(s) you want to convert.
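Tesseract stores each language in a file named `<language code>.traineddata` (for example `eng.traineddata` or `fra.traineddata`). The sketch below checks that the files for the languages you plan to convert are present before running PgsToSrt. `check_tessdata` is a hypothetical helper, not part of PgsToSrt.

```shell
# Hypothetical helper (not part of PgsToSrt): check that a Tesseract data file
# exists for each requested language code.
# Usage: check_tessdata DIR LANG [LANG...]
check_tessdata() {
    dir=$1
    shift
    missing=0
    for lang in "$@"; do
        # Tesseract data files are named <language code>.traineddata
        if [ ! -f "$dir/$lang.traineddata" ]; then
            echo "missing: $dir/$lang.traineddata" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tessdata ./tessdata eng fra` fails and lists `fra.traineddata` if only the English data has been installed.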
dotnet PgsToSrt.dll [parameters]
Parameter | Description |
---|---|
--input | Input filename; can be an .mkv file or a PGS subtitle extracted to a .sup file with mkvextract. |
--output | Output SubRip (.srt) filename. Auto-generated from the input filename if not set. |
--track | Track number of the subtitle to process in an .mkv file (only required when the input is a Matroska file). This can be obtained with mkvinfo. |
--tracklanguage | Convert all tracks of the specified language (only works with .mkv input). |
--tesseractlanguage | Tesseract language to use if multiple languages are available in the tesseract data directory. |
--tesseractdata | Path of the Tesseract language data files; by default `tessdata` in the executable directory. |
--tesseractversion | libtesseract version; supports 4 and 5 (default: 4) (ignored on Windows platform) |
--libleptname | Leptonica library name, usually lept or leptonica; the 'lib' prefix is added automatically (default: lept) (ignored on Windows platform) |
--libleptversion | Leptonica library version (default: 5) (ignored on Windows platform) |
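If you prefer to extract a PGS track to a .sup file with mkvextract first, you need the track ID that mkvmerge and mkvextract use. These IDs start at 0; mkvinfo prints them next to each track number ("track ID for mkvmerge & mkvextract"). Which of the two numbers `--track` expects is not stated here, so passing an extracted .sup as `--input` avoids the question. The sketch below assumes mkvmerge's usual `--identify` line format, `Track ID 3: subtitles (HDMV PGS)`; `pgs_track_ids` is a hypothetical helper.

```shell
# Hypothetical helper: print the IDs of PGS subtitle tracks, reading the output
# of `mkvmerge --identify FILE.mkv` on stdin.
# ASSUMPTION: lines look like "Track ID 3: subtitles (HDMV PGS)".
pgs_track_ids() {
    # In a basic regular expression, ( ) are literal and \( \) capture the ID
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (HDMV PGS).*/\1/p'
}

# Example (requires MKVToolNix):
#   mkvmerge --identify video1.mkv | pgs_track_ids
#   mkvextract video1.mkv tracks 3:video1.sup
```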
dotnet PgsToSrt.dll --input video1.fr.sup --output video1.fr.srt --tesseractlanguage fra
dotnet PgsToSrt.dll --input video1.mkv --output video1.srt --track 4
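To convert many extracted .sup files, the first example can be run in a loop. This is a minimal sketch: `convert_all_sup`, `PGSTOSRT_DLL` and `DRY_RUN` are hypothetical names for illustration, not PgsToSrt options. Each `NAME.sup` is written to `NAME.srt` next to it, following the naming in the example above.

```shell
# Hypothetical helper: run PgsToSrt on every .sup file in a directory.
# PGSTOSRT_DLL (path to PgsToSrt.dll) and DRY_RUN are illustrative variables;
# with DRY_RUN=1 the commands are printed instead of executed.
convert_all_sup() {
    dir=$1
    lang=$2
    dll=${PGSTOSRT_DLL:-PgsToSrt.dll}
    for sup in "$dir"/*.sup; do
        [ -e "$sup" ] || continue   # the glob matched no files
        srt=${sup%.sup}.srt         # video1.fr.sup -> video1.fr.srt
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo dotnet "$dll" --input "$sup" --output "$srt" --tesseractlanguage "$lang"
        else
            dotnet "$dll" --input "$sup" --output "$srt" --tesseractlanguage "$lang" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 convert_all_sup ./subs fra` first shows exactly which files would be converted.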
For Docker usage, examine `entrypoint.sh` for the full list of available arguments.
docker run -it --rm \
    -v /data:/data \
    -e INPUT=/data/myImageSubtitle.sup \
    -e OUTPUT=/data/myTextSubtitle.srt \
    -e LANGUAGE=eng \
    tentacule/pgstosrt
Hint: The default arguments coming from the `Dockerfile` are `INPUT=/input.sup` and `OUTPUT=/output.srt`, so you can simply run:
touch output-file.srt # This needs to be a file; otherwise Docker will assume it's a directory mount and fail.
docker run -it --rm \
    -v "$(pwd)/source-file.sup:/input.sup" \
    -v "$(pwd)/output-file.srt:/output.srt" \
    -e LANGUAGE=eng \
    tentacule/pgstosrt
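The single-file mount pattern can be wrapped in a small function so the output file is always created first and `-v` always receives absolute host paths (a host path that does not start with `/` is treated by Docker as a named volume). This is a sketch, not part of the project: `pgstosrt_docker` and `DRY_RUN` are hypothetical, and `-it` is omitted so it also works without a terminal (for example from cron); add it back for interactive runs.

```shell
# Hypothetical wrapper (not part of PgsToSrt) around the Docker command above.
# Usage: pgstosrt_docker INPUT.sup OUTPUT.srt [LANGUAGE]
# DRY_RUN=1 prints the docker command instead of running it.
pgstosrt_docker() {
    input=$1
    output=$2
    lang=${3:-eng}
    [ -f "$input" ] || { echo "input not found: $input" >&2; return 1; }
    out_dir=$(dirname "$output")
    [ -d "$out_dir" ] || { echo "output directory not found: $out_dir" >&2; return 1; }
    # The mount source must already exist as a file, or Docker creates a directory.
    [ "${DRY_RUN:-0}" = 1 ] || touch "$output" || return 1
    # Resolve absolute host paths for the bind mounts.
    in_abs=$(cd "$(dirname "$input")" && pwd)/$(basename "$input")
    out_abs=$(cd "$out_dir" && pwd)/$(basename "$output")
    # /input.sup and /output.srt are the image's default INPUT and OUTPUT.
    set -- docker run --rm \
        -v "$in_abs:/input.sup" \
        -v "$out_abs:/output.srt" \
        -e LANGUAGE="$lang" \
        tentacule/pgstosrt
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}
```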
Dependencies:
- Windows: none; the tesseract/leptonica libraries are included in the release package.
- Linux: libtesseract5 (`sudo apt install libtesseract5` or whatever your distro requires)
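On Linux, `ldconfig -p` lists the shared libraries the loader can find, which helps choose `--tesseractversion`, `--libleptname` and `--libleptversion`. The sketch below assumes, based only on the parameter descriptions above, that PgsToSrt loads `lib<name>.so.<version>`; that is an inference, not confirmed behaviour. `suggest_lib_flags` is a hypothetical helper that reads `ldconfig -p` output on stdin.

```shell
# Hypothetical helper: suggest PgsToSrt library flags from `ldconfig -p` output.
# ASSUMPTION: PgsToSrt loads lib<name>.so.<version> (inferred, not confirmed).
# Usage: ldconfig -p | suggest_lib_flags
suggest_lib_flags() {
    libs=$(cat)
    tess=$(printf '%s\n' "$libs" | grep -oE 'libtesseract\.so\.[0-9]+' | head -n 1)
    lept=$(printf '%s\n' "$libs" | grep -oE 'lib(lept|leptonica)\.so\.[0-9]+' | head -n 1)
    if [ -z "$tess" ] || [ -z "$lept" ]; then
        echo "libtesseract or leptonica not found" >&2
        return 1
    fi
    lept_name=${lept#lib}               # PgsToSrt adds the 'lib' prefix itself
    lept_name=${lept_name%%.so.*}       # liblept.so.5 -> lept
    echo "--tesseractversion ${tess##*.so.} --libleptname $lept_name --libleptversion ${lept##*.so.}"
}
```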
To build PgsToSrt.dll, execute the following commands in the `src/` directory:
dotnet restore
dotnet publish -c Release -o out --framework net6.0 # The file produced is PgsToSrt/out/PgsToSrt.dll
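The two build commands can be wrapped so that the build stops at the first failing step and confirms the output file exists. This is a minimal sketch: `build_pgstosrt` is a hypothetical name, and it assumes the `PgsToSrt/out/PgsToSrt.dll` path from the comment above is relative to `src/`.

```shell
# Hypothetical build wrapper. Runs in a subshell so the caller's working
# directory is unchanged, and stops at the first failing step.
# ASSUMPTION: the DLL lands in PgsToSrt/out/ relative to the src/ directory.
build_pgstosrt() {
    (
        cd "${1:-src}" || exit 1
        dotnet restore || exit 1
        dotnet publish -c Release -o out --framework net6.0 || exit 1
        [ -f PgsToSrt/out/PgsToSrt.dll ] || { echo "PgsToSrt.dll not found" >&2; exit 1; }
        echo "built: $(pwd)/PgsToSrt/out/PgsToSrt.dll"
    )
}
```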
To build a Docker image for all languages:
make build-all
To build a Docker image for a single language:
make build-single LANGUAGE=eng # or any other Tesseract-available language code
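If you need images for a few languages but not all of them, the single-language target can be called once per language code. A minimal sketch; `build_language_images` and `DRY_RUN` are hypothetical names, and the loop stops at the first failed build.

```shell
# Hypothetical loop over the make target above: one image per language code.
# DRY_RUN=1 prints the make commands instead of running them.
build_language_images() {
    for lang in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo make build-single LANGUAGE="$lang"
        else
            make build-single LANGUAGE="$lang" || return 1
        fi
    done
}
```

For example, `build_language_images eng fra deu` builds three single-language images.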