YouTube transcripts

YouTube is an online video sharing and social media platform owned by Google.

This notebook covers how to load documents from YouTube transcripts.

from langchain_community.document_loaders import YoutubeLoader

API Reference: YoutubeLoader

%pip install --upgrade --quiet youtube-transcript-api

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=False
)

loader.load()
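
The loader is pointed at a video through its URL, and later examples on this page refer to videos by their id alone. As a rough illustration of how the two relate, the sketch below pulls the id out of two common URL shapes using only the standard library. The helper name extract_video_id is hypothetical and this is not the library's own parsing logic, only an assumption about the usual URL layouts.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video id from a common YouTube URL shape, or None.

    Hypothetical helper for illustration; not part of langchain_community.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the id as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the id in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=QsYGlZkevEg"))  # QsYGlZkevEg
```

Any other host returns None, so a caller can reject non-YouTube links before building a loader.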

Add video info

%pip install --upgrade --quiet pytube

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=True
)
loader.load()

Add language preferences

language param: A list of language codes in descending priority; en by default.

translation param: A translation preference; an available transcript can be translated into your preferred language.

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg",
    add_video_info=True,
    language=["en", "id"],
    translation="en",
)
loader.load()
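
To make "descending priority" concrete, here is a minimal sketch of how a priority list such as ["en", "id"] could be resolved against the languages a video actually offers. The function pick_language and the sample language lists are hypothetical; the loader's real selection and translation logic may differ.

```python
from typing import Optional, Sequence


def pick_language(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first preferred code that is available, else None.

    Hypothetical helper: earlier entries in `preferred` win over later ones.
    """
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    return None


# "en" is preferred but missing, so the next entry "id" is chosen.
print(pick_language(["en", "id"], ["id", "ja"]))  # id
```

When nothing in the list matches, the sketch returns None, which is the kind of case where a translation preference becomes relevant.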

Get transcripts as timestamped chunks

Get one or more Document objects, each containing a chunk of the video transcript. The length of the chunks, in seconds, may be specified. Each chunk's metadata includes a URL of the video on YouTube, which will start the video at the beginning of the specific chunk.

transcript_format param: One of the langchain_community.document_loaders.youtube.TranscriptFormat values. In this case, TranscriptFormat.CHUNKS.

chunk_size_seconds param: An integer number of video seconds to be represented by each chunk of transcript data. Default is 120 seconds.

from langchain_community.document_loaders.youtube import TranscriptFormat

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=TKCMw0utiak",
    add_video_info=True,
    transcript_format=TranscriptFormat.CHUNKS,
    chunk_size_seconds=30,
)
print("\n\n".join(map(repr, loader.load())))

API Reference: TranscriptFormat
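
The idea behind timestamped chunks can be sketched without the library. The example below groups hypothetical transcript pieces (dicts with "text" and "start" in seconds) into fixed windows of chunk_size_seconds and attaches a URL that starts playback at the beginning of each window. The function name, the metadata keys, and the fixed-window bucketing are assumptions made for illustration; the loader may group pieces differently.

```python
from typing import Dict, List, Sequence, Union

Piece = Dict[str, Union[str, float]]


def chunk_transcript(
    pieces: Sequence[Piece], video_id: str, chunk_size_seconds: int = 120
) -> List[dict]:
    """Group transcript pieces into fixed time windows (illustrative only)."""
    buckets: Dict[int, List[str]] = {}
    for piece in pieces:
        # Window index = how many whole windows fit before this piece starts.
        index = int(float(piece["start"]) // chunk_size_seconds)
        buckets.setdefault(index, []).append(str(piece["text"]))

    chunks = []
    for index in sorted(buckets):
        start = index * chunk_size_seconds
        chunks.append(
            {
                "text": " ".join(buckets[index]),
                "start_seconds": start,
                # The "t" query parameter makes YouTube start at that second.
                "source": f"https://www.youtube.com/watch?v={video_id}&t={start}s",
            }
        )
    return chunks


# Hypothetical sample data, not taken from a real video.
sample = [
    {"text": "hello", "start": 0.0},
    {"text": "world", "start": 12.5},
    {"text": "next part", "start": 31.0},
    {"text": "later", "start": 65.2},
]
for chunk in chunk_transcript(sample, "TKCMw0utiak", chunk_size_seconds=30):
    print(chunk["start_seconds"], chunk["text"], chunk["source"])
```

With the sample data this yields three chunks starting at 0, 30, and 60 seconds. A smaller chunk_size_seconds gives more precise links at the cost of more, shorter documents.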

YouTube loader from Google Cloud

Prerequisites

Create a Google Cloud project or use an existing project
Enable the YouTube API
Authorize credentials for desktop app
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib youtube-transcript-api

🧑 Instructions for ingesting your Google Docs data

By default, the GoogleDriveLoader expects the credentials.json file to be at ~/.credentials/credentials.json, but this is configurable using the credentials_path keyword argument. The same applies to token.json. Note that token.json will be created automatically the first time you use the loader.

GoogleApiYoutubeLoader can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL. Note: depending on your setup, the service_account_path needs to be set up. See here for more details.
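
Before constructing the client it can help to confirm where the credential files are expected. The sketch below resolves the default locations described above and reports whether each file exists. The helper name check_credential_files and the returned keys are hypothetical, and the token.json default is assumed to sit next to credentials.json; nothing here calls a Google API.

```python
from pathlib import Path
from typing import Optional

# Default directory described in the text above.
DEFAULT_DIR = Path.home() / ".credentials"


def check_credential_files(
    credentials_path: Optional[Path] = None, token_path: Optional[Path] = None
) -> dict:
    """Resolve credential paths and report whether the files exist.

    Hypothetical helper for illustration; it only inspects the filesystem.
    """
    creds = Path(credentials_path) if credentials_path else DEFAULT_DIR / "credentials.json"
    token = Path(token_path) if token_path else DEFAULT_DIR / "token.json"
    return {
        "credentials_path": creds,
        "credentials_found": creds.is_file(),
        "token_path": token,
        # token.json is expected to be missing before the first authorized run.
        "token_found": token.is_file(),
    }


info = check_credential_files()
print(info["credentials_path"], "found" if info["credentials_found"] else "missing")
```

A missing credentials.json is worth catching early, while a missing token.json is normal on a first run.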

# Init the GoogleApiClient
from pathlib import Path

from langchain_community.document_loaders import GoogleApiClient, GoogleApiYoutubeLoader

google_api_client = GoogleApiClient(credentials_path=Path("your_path_creds.json"))


# Use a Channel
youtube_loader_channel = GoogleApiYoutubeLoader(
    google_api_client=google_api_client,
    channel_name="Reducible",
    captions_language="en",
)

# Use Youtube Ids
youtube_loader_ids = GoogleApiYoutubeLoader(
    google_api_client=google_api_client, video_ids=["TrdevFK_am4"], add_video_info=True
)

# returns a list of Documents
youtube_loader_channel.load()

API Reference: GoogleApiClient | GoogleApiYoutubeLoader

Related

Document loader conceptual guide
Document loader how-to guides
