GitHub

This notebook shows how to load issues and pull requests (PRs) for a given repository on GitHub. It also shows how to load files from a given repository on GitHub. We will use the LangChain Python repository as an example.

Setup access token

To access the GitHub API, you need a personal access token, which you can create here: https://github.com/settings/tokens?type=beta. You can either set this token as the environment variable GITHUB_PERSONAL_ACCESS_TOKEN, in which case it is picked up automatically, or pass it in directly at initialization as the access_token named parameter.

# If you haven't set your access token as an environment variable, pass it in here.
from getpass import getpass

ACCESS_TOKEN = getpass()
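If you want the same notebook to work whether or not the environment variable is set, a small helper can prefer GITHUB_PERSONAL_ACCESS_TOKEN and fall back to an interactive prompt. This is a minimal sketch: `resolve_github_token` is a hypothetical helper name, not part of LangChain, and it only checks the one environment variable named above.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_github_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the GitHub token from the environment, or prompt for it.

    Hypothetical helper for this notebook; not part of langchain_community.
    """
    token = env.get("GITHUB_PERSONAL_ACCESS_TOKEN", "").strip()
    if token:
        return token
    # Fall back to a prompt that does not echo the typed token.
    return prompt("GitHub personal access token: ")
```

Usage: `ACCESS_TOKEN = resolve_github_token()`. Passing `env` and `prompt` explicitly makes the helper easy to test without touching the real environment.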

Load Issues and PRs

from langchain_community.document_loaders import GitHubIssuesLoader

API Reference: GitHubIssuesLoader

loader = GitHubIssuesLoader(
    repo="langchain-ai/langchain",
    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.
    creator="UmerHA",
)

Let's load all issues and PRs created by "UmerHA".

Here's a list of all filters you can use:

include_prs
milestone
state
assignee
creator
mentioned
labels
sort
direction
since

For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues.

docs = loader.load()

print(docs[0].page_content)
print(docs[0].metadata)
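Most of the filters listed above correspond to query parameters of GitHub's REST endpoint `GET /repos/{owner}/{repo}/issues`; `include_prs` is the loader's own option and has no API counterpart. To show how values such as `labels` and `since` are encoded, here is a minimal standard-library sketch that builds such a URL. `build_issues_url` is a hypothetical helper, it covers only a subset of the filters, and it is not how `GitHubIssuesLoader` is implemented internally.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_issues_url(
    repo: str,
    state: str = "open",       # "open", "closed", or "all"
    creator: Optional[str] = None,
    labels: Sequence[str] = (),
    sort: str = "created",     # "created", "updated", or "comments"
    direction: str = "desc",   # "asc" or "desc"
    since: Optional[datetime] = None,
) -> str:
    """Build a 'List repository issues' URL (hypothetical helper)."""
    params = {"state": state, "sort": sort, "direction": direction}
    if creator:
        params["creator"] = creator
    if labels:
        # The API expects labels as one comma-separated string.
        params["labels"] = ",".join(labels)
    if since:
        # The API expects ISO 8601 in UTC: YYYY-MM-DDTHH:MM:SSZ.
        # Note: astimezone() treats a naive datetime as local time.
        utc = since.astimezone(timezone.utc)
        params["since"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"https://api.github.com/repos/{repo}/issues?{urlencode(params)}"
```

The defaults mirror the API's documented defaults (`state=open`, `sort=created`, `direction=desc`), so omitting a filter here behaves like omitting it in a raw request.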

Only load issues

By default, the GitHub API considers pull requests to also be issues. To get only 'pure' issues (i.e., no pull requests), use include_prs=False.

loader = GitHubIssuesLoader(
    repo="langchain-ai/langchain",
    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.
    creator="UmerHA",
    include_prs=False,
)
docs = loader.load()

print(docs[0].page_content)
print(docs[0].metadata)
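In the raw REST API response, pull requests returned by the issues endpoint carry a `pull_request` key, while plain issues do not. If you are working with raw API JSON rather than the loader, a minimal sketch of the same split might look like this; `split_issues_and_prs` is a hypothetical helper and the sample items in the usage are made up.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def split_issues_and_prs(items: List[Item]) -> Tuple[List[Item], List[Item]]:
    """Split raw 'List repository issues' items into (issues, pull_requests).

    Hypothetical helper: an item counts as a pull request if it has a
    "pull_request" key, which plain issues lack.
    """
    issues = [item for item in items if "pull_request" not in item]
    prs = [item for item in items if "pull_request" in item]
    return issues, prs


# Made-up sample items for illustration only.
sample = [
    {"number": 1, "title": "Bug report"},
    {"number": 2, "title": "Fix bug", "pull_request": {"url": "..."}},
]
issues, prs = split_issues_and_prs(sample)
```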

Load GitHub File Content

The code below loads all Markdown files in the langchain-ai/langchain repo.

from langchain_community.document_loaders import GithubFileLoader

API Reference: GithubFileLoader

loader = GithubFileLoader(
    repo="langchain-ai/langchain",  # the repo name
    branch="master",  # the branch name
    access_token=ACCESS_TOKEN,
    github_api_url="https://api.github.com",
    file_filter=lambda file_path: file_path.endswith(".md"),  # load all Markdown files
)
documents = loader.load()
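`file_filter` is just a function from a file path to a bool, so a named function can replace the lambda when the rule grows. The sketch below assumes, as the `path` metadata field in the example output suggests, that the filter receives repo-relative paths such as "docs/README.md". `make_extension_filter` is a hypothetical helper, and the `cookbook` directory in the usage is only an illustration.

```python
from typing import Callable, Iterable


def make_extension_filter(
    extensions: Iterable[str], skip_dirs: Iterable[str] = ()
) -> Callable[[str], bool]:
    """Return a file_filter-style predicate (hypothetical helper).

    The predicate keeps paths ending in one of `extensions` and drops
    anything under one of `skip_dirs`.
    """
    exts = tuple(extensions)
    prefixes = tuple(d.rstrip("/") + "/" for d in skip_dirs)

    def file_filter(file_path: str) -> bool:
        # startswith/endswith accept tuples; an empty tuple never matches.
        if file_path.startswith(prefixes):
            return False
        return file_path.endswith(exts)

    return file_filter


md_filter = make_extension_filter((".md", ".mdx"), skip_dirs=["cookbook"])
```

You would then pass `file_filter=md_filter` to `GithubFileLoader` instead of the lambda.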

Example output for one of the documents:

document.metadata:
{
    "path": "README.md",
    "sha": "82f1c4ea88ecf8d2dfsfx06a700e84be4",
    "source": "https://github.com/langchain-ai/langchain/blob/master/README.md"
}
document.page_content:
    mock content
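The `source` value in the example metadata follows GitHub's web URL pattern for a file on a branch, `https://github.com/{repo}/blob/{branch}/{path}`. If you only have a `path` and want the matching link, a minimal sketch might be the following; `github_blob_url` is a hypothetical helper, and it assumes the loader builds `source` this way, as the example output suggests.

```python
from urllib.parse import quote


def github_blob_url(repo: str, branch: str, path: str) -> str:
    """Build a github.com web URL for a file (hypothetical helper)."""
    # quote() leaves "/" unescaped by default, so nested paths stay readable,
    # while characters such as spaces are percent-encoded.
    return f"https://github.com/{repo}/blob/{branch}/{quote(path)}"
```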

Related

Document loader conceptual guide
Document loader how-to guides
