
YouTube transcripts

YouTube is an online video sharing and social media platform owned by Google.

This notebook covers how to load documents from YouTube transcripts.

from langchain_community.document_loaders import YoutubeLoader
API Reference: YoutubeLoader
%pip install --upgrade --quiet youtube-transcript-api
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=False
)
loader.load()

Add video info

%pip install --upgrade --quiet pytube
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=True
)
loader.load()

Add language preferences

language param: A list of language codes in descending priority; en by default.

translation param: A translation preference; an available transcript can be translated into your preferred language.

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=QsYGlZkevEg",
    add_video_info=True,
    language=["en", "id"],
    translation="en",
)
loader.load()
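
To make the "descending priority" idea behind the language param concrete, here is a minimal sketch in plain Python. It is not the loader's actual code: the helper name pick_language and the list of available transcript languages are hypothetical, chosen only to illustrate how a priority list is typically resolved.

```python
from typing import Optional, Sequence


def pick_language(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first code in `preferred` that also appears in `available`.

    Hypothetical helper for illustration; not part of YoutubeLoader.
    """
    available_set = set(available)
    for code in preferred:  # earlier entries have higher priority
        if code in available_set:
            return code
    return None  # none of the preferred languages has a transcript


# Assumed scenario: the video only offers Indonesian and German transcripts.
chosen = pick_language(["en", "id"], ["id", "de"])
print(chosen)
```

With language=["en", "id"], an English transcript would be taken when one exists, and the Indonesian one would be the fallback otherwise.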

Get transcripts as timestamped chunks

Get one or more Document objects, each containing a chunk of the video transcript. The length of the chunks, in seconds, may be specified. Each chunk's metadata includes a URL of the video on YouTube that starts playback at the beginning of that chunk.

transcript_format param: One of the langchain_community.document_loaders.youtube.TranscriptFormat values. In this case, TranscriptFormat.CHUNKS.

chunk_size_seconds param: An integer number of video seconds to be represented by each chunk of transcript data. Default is 120 seconds.

from langchain_community.document_loaders.youtube import TranscriptFormat

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=TKCMw0utiak",
    add_video_info=True,
    transcript_format=TranscriptFormat.CHUNKS,
    chunk_size_seconds=30,
)
print("\n\n".join(map(repr, loader.load())))
API Reference: TranscriptFormat
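
The chunking described in this section can be pictured with a small standalone sketch. It assumes transcript pieces arrive as dictionaries with text and start (seconds) keys, and it groups them into fixed windows of chunk_size_seconds. The function name chunk_transcript, the input shape, and the output field names are assumptions made for illustration; the loader's own implementation may differ in detail.

```python
from typing import Dict, List


def chunk_transcript(
    pieces: List[dict], video_id: str, chunk_size_seconds: int = 120
) -> List[dict]:
    """Group transcript pieces into fixed-length time windows.

    Hypothetical helper for illustration; not the library's implementation.
    """
    buckets: Dict[int, List[str]] = {}
    for piece in pieces:
        # Window index: 0 for [0, size), 1 for [size, 2 * size), and so on.
        index = int(piece["start"] // chunk_size_seconds)
        buckets.setdefault(index, []).append(piece["text"])

    chunks = []
    for index in sorted(buckets):
        start_seconds = index * chunk_size_seconds
        chunks.append(
            {
                "text": " ".join(buckets[index]),
                "start_seconds": start_seconds,
                # The t= query parameter starts playback at that offset.
                "source": f"https://www.youtube.com/watch?v={video_id}&t={start_seconds}s",
            }
        )
    return chunks


# Made-up transcript pieces, used only to exercise the sketch.
sample = [
    {"text": "hello", "start": 0.0},
    {"text": "world", "start": 12.5},
    {"text": "again", "start": 31.0},
]
for chunk in chunk_transcript(sample, "TKCMw0utiak", chunk_size_seconds=30):
    print(chunk)
```

A smaller chunk_size_seconds yields more, shorter chunks, each with a more precise start offset in its URL.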

YouTube loader from Google Cloud

Prerequisites

  1. Create a Google Cloud project or use an existing project
  2. Enable the YouTube API
  3. Authorize credentials for desktop app
  4. pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib youtube-transcript-api

🧑 Instructions for ingesting your YouTube data

By default, the GoogleApiClient expects the credentials.json file to be at ~/.credentials/credentials.json, but this is configurable using the credentials_path keyword argument. The same applies to token.json. Note that token.json will be created automatically the first time you use the loader.

GoogleApiYoutubeLoader can load from a channel name or from a list of YouTube video ids. You can obtain a video id from the video's URL (the value of the v query parameter). Note that, depending on your setup, the service_account_path may need to be set. See here for more details.

# Init the GoogleApiClient
from pathlib import Path

from langchain_community.document_loaders import GoogleApiClient, GoogleApiYoutubeLoader

google_api_client = GoogleApiClient(credentials_path=Path("your_path_creds.json"))


# Use a Channel
youtube_loader_channel = GoogleApiYoutubeLoader(
    google_api_client=google_api_client,
    channel_name="Reducible",
    captions_language="en",
)

# Use Youtube Ids

youtube_loader_ids = GoogleApiYoutubeLoader(
    google_api_client=google_api_client, video_ids=["TrdevFK_am4"], add_video_info=True
)

# returns a list of Documents
youtube_loader_channel.load()
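
The ids passed to video_ids are the identifiers that appear in a video's URL. As a rough sketch using only the standard library, they could be extracted as shown here. The helper name video_id_from_url is hypothetical, and it handles only standard watch links and youtu.be short links.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube watch pages in this sketch (an assumption, not exhaustive).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def video_id_from_url(url: str) -> Optional[str]:
    """Extract the video id from a YouTube URL, or return None if not found.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the id as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in WATCH_HOSTS:
        # Watch links carry the id in the v query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(video_id_from_url("https://www.youtube.com/watch?v=QsYGlZkevEg"))
```

The resulting string can then be placed in the video_ids list given to GoogleApiYoutubeLoader.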
