Last active June 17, 2018 12:31
A Ruby program to convert video subtitles from YouTube's XML format to the SubRip format.
Gist title: "Convert video subtitles from YouTube XML format to SubRip (.srt)"
# Convert XML YouTube subtitles to SubRip (srt) format
# To download the subtitle in XML, put the ID of the YouTube video
# at the end of the url:
#
# http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
# Usage:
#
# $ ruby youtube2srt.rb [input_filename] [output_filename]
#
# Where input_filename can be either the name of your xml file
# (probably timedtext.xml) or the hashid of your YouTube video.
# The output filename is optional.
require 'rubygems'
require 'nokogiri'
require 'uri'
require 'net/http'
BASE_URL = 'http://video.google.com/timedtext?hl=en&lang=en&v='
source_filename = ARGV[0]
output_filename = ARGV[1]
TIME_FORMAT = '%02H:%02M:%02S,%3N'
# Write every <text> element of the parsed XML as a numbered SRT caption.
def create_srt(output_filename, source_filename, source_file)
  # Default output name: the input name with '.xml' dropped and '.srt' appended.
  File.open(output_filename || source_filename.gsub('.xml', '').concat('.srt'), 'w+') do |srt_file|
    source_file.css('text').to_enum.with_index(1) do |sub, i|
      # 'start' and 'dur' are offsets in seconds.
      start_time = Time.at(sub['start'].to_f).utc
      end_time = start_time + sub['dur'].to_f
      # The blank line before CAPTION separates consecutive SRT entries.
      srt_file.write(<<~CAPTION
        #{i}
        #{start_time.strftime(TIME_FORMAT)} --> #{end_time.strftime(TIME_FORMAT)}
        #{Nokogiri::HTML.parse(sub.text).text}

      CAPTION
      )
    end
  end
end
# Without an input argument there is nothing to convert.
abort 'Usage: ruby youtube2srt.rb input_filename [output_filename]' unless source_filename
if source_filename =~ /\.xml$/i
  # Local XML file: parse it directly.
  source_file = Nokogiri::XML(File.read(source_filename), &:noblanks)
  create_srt(output_filename, source_filename, source_file)
  puts "xml file #{source_filename} converted to srt"
else
  # Otherwise treat the argument as a YouTube video ID and fetch the XML.
  response = Net::HTTP.get_response(URI.parse(BASE_URL + source_filename))
  if response.is_a?(Net::HTTPSuccess)
    source_file = Nokogiri::XML(response.body, &:noblanks)
    create_srt(output_filename, source_filename, source_file)
    puts 'Google timedtext.xml converted to srt'
  else
    puts "Couldn't find a srt file for #{source_filename} at #{BASE_URL + source_filename}"
  end
end
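The timing logic in the script can be tried in isolation, without Nokogiri or a network request. The following is a minimal sketch under stated assumptions: the helper names `srt_timestamp` and `srt_caption` are hypothetical and do not appear in the gist, and the sample caption is made up for illustration. It shows how a start offset and a duration, both in seconds, become one SRT entry.

```ruby
# Illustrative helpers only; srt_timestamp and srt_caption are hypothetical
# names that are not part of the gist.
TIME_FORMAT = '%H:%M:%S,%3N'

# Convert an offset in seconds to an SRT timestamp (HH:MM:SS,mmm).
# Time.at(...).utc treats the offset as time since the epoch, so the
# hour field wraps around after 24 hours.
def srt_timestamp(seconds)
  Time.at(seconds).utc.strftime(TIME_FORMAT)
end

# Build one SRT entry from a 1-based index, a start offset and a duration
# (both in seconds), and the caption text. The trailing blank line
# separates the entry from the next one.
def srt_caption(index, start, duration, text)
  "#{index}\n#{srt_timestamp(start)} --> #{srt_timestamp(start + duration)}\n#{text}\n\n"
end

puts srt_caption(1, 3.5, 2.25, 'Hello, world')
```

Because the timestamp comes from a `Time` value, this approach is only suitable for subtitle tracks shorter than 24 hours.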
This is not bad, although I'd rather see it done with XSL or Python.
StanBoyet commented Sep 5, 2013
Well thank you :)
viliam-durina commented Jul 16, 2015 • edited by lsloan
I just found out that you can simply use this URL:
https://www.youtube.com/api/timedtext?fmt=srt&v=YOUR_VIDEO_CODE&lang=YOUR_LANG_CODE
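For illustration, a URL of that shape could be assembled in Ruby as sketched below. The helper name `timedtext_srt_url` is hypothetical, and the video ID and language code are placeholders. The sketch only builds the string and makes no request, so it says nothing about whether the endpoint still responds.

```ruby
require 'uri'

# Hypothetical helper (not from the gist or the comment): assemble the
# timedtext URL from a video ID and a language code.
def timedtext_srt_url(video_id, lang)
  query = URI.encode_www_form(fmt: 'srt', v: video_id, lang: lang)
  URI::HTTPS.build(host: 'www.youtube.com', path: '/api/timedtext', query: query).to_s
end

puts timedtext_srt_url('YOUR_VIDEO_CODE', 'en')
```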
forgijs commented Mar 3, 2016 • edited by lsloan
fmt=srt doesn't work anymore, unfortunately.