Contributing to pandas
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Bug reports and enhancement requests
Bug reports and enhancement requests are an important part of making pandas more stable and are curated through GitHub issues. When reporting an issue or request, please select the appropriate category and fill out the issue form fully to ensure others and the core development team can fully understand the scope of the issue.
The issue will then be visible to the pandas community and open to comments and ideas from others.
Finding an issue to contribute to
If you are brand new to pandas or open-source development, we recommend searching the GitHub “issues” tab to find issues that interest you. Unassigned issues labeled Docs and good first issue are typically good for newer contributors.
Once you’ve found an interesting issue, it’s a good idea to assign the issue to yourself, so nobody else duplicates the work on it. On the GitHub issue, a comment with the exact text “take” will automatically assign the issue to you (this takes a few seconds and may require refreshing the page to see it).
If for whatever reason you are not able to continue working on the issue, please unassign it, so other people know it’s available again. You can check the list of assigned issues, since people may not be working on them anymore. If you want to work on one that is assigned, feel free to kindly ask the current assignee if you can take it (please allow at least a week of inactivity before considering work on the issue discontinued).
We have several contributor community communication channels, which you are welcome to join to ask questions as you figure things out. Among them are regular meetings for new contributors, dev meetings, a dev mailing list, and a Slack for the contributor community. All pandas contributors are welcome to these spaces, where they can connect with each other. Even maintainers who have been with us for a long time felt just like you when they started out, and are happy to welcome you and support you as you get to know how we work and where things are. Take a look at the next sections to learn more.
Submitting a pull request
Version control, Git, and GitHub
pandas is hosted on GitHub, and to contribute, you will need to sign up for a free GitHub account. We use Git for version control to allow many people to work together on the project.
If you are new to Git, you can reference some of these resources for learning Git. Feel free to reach out to the contributor community for help if needed:
Also, the project follows a forking workflow further described on this page whereby contributors fork the repository, make changes and then create a pull request. So please be sure to read and follow all the instructions in this guide.
If you are new to contributing to projects through forking on GitHub, take a look at the GitHub documentation for contributing to projects. GitHub provides a quick tutorial using a test repository that may help you become more familiar with forking a repository, cloning a fork, creating a feature branch, pushing changes and making pull requests.
Below are some useful resources for learning more about forking and pull requests on GitHub:
Getting started with Git
GitHub has instructions for installing git, setting up your SSH key, and configuring git. All these steps need to be completed before you can work seamlessly between your local repository and GitHub.
Create a fork of pandas
You will need your own copy of pandas (aka fork) to work on the code. Go to the pandas project page and hit the Fork button. Please uncheck the box to copy only the main branch before selecting Create Fork. You will want to clone your fork to your machine:
git clone https://github.com/your-user-name/pandas.git pandas-yourname
cd pandas-yourname
git remote add upstream https://github.com/pandas-dev/pandas.git
git fetch upstream
This creates the directory pandas-yourname and connects your repository to the upstream (main project) pandas repository.
Note
Performing a shallow clone (with --depth=N, for some N greater than or equal to 1) might break some tests and features such as pd.show_versions(), as the version number can no longer be computed.
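If you already have a shallow clone, Git can tell you so with git rev-parse --is-shallow-repository, and git fetch --unshallow downloads the missing history. The sketch below exercises both commands in a throwaway sandbox; the "upstream" repository, its three empty commits, and the Demo identity are made up for the demonstration. In a real pandas clone you would only run the last two git commands from inside pandas-yourname.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox: "upstream" is a stand-in, not the real pandas repository.
sandbox=$(mktemp -d)
git init -q -b main "$sandbox/upstream"
for i in 1 2 3; do
  git -C "$sandbox/upstream" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $i"
done

# --depth is only honoured for local repositories when given a file:// URL.
git clone -q --depth 1 "file://$sandbox/upstream" "$sandbox/clone"
before=$(git -C "$sandbox/clone" rev-parse --is-shallow-repository)

# Fetch the rest of the history from the clone's origin.
git -C "$sandbox/clone" fetch -q --unshallow
after=$(git -C "$sandbox/clone" rev-parse --is-shallow-repository)

echo "shallow before: $before, after: $after"
rm -rf "$sandbox"
```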
Creating a feature branch
Your local main branch should always reflect the current state of the pandas repository. First, ensure it’s up to date with the main pandas repository:
git checkout main
git pull upstream main --ff-only
Then, create a feature branch for making your changes. For example:
git checkout -b shiny-new-feature
This changes your working branch from main to the shiny-new-feature branch. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to pandas. You can have many feature branches and switch between them using the git checkout command.
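Switching between several feature branches can be sketched in a throwaway repository. The repository, the Demo identity, and the second branch name fix-some-bug are hypothetical; only the git branch and git checkout usage carries over to your pandas clone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository standing in for your pandas clone.
repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# One branch per bug or feature, each started from main.
git checkout -q -b shiny-new-feature
git checkout -q main
git checkout -q -b fix-some-bug

git branch --list                   # lists fix-some-bug, main and shiny-new-feature
git checkout -q shiny-new-feature   # switch back to the first feature branch
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```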
When you want to update the feature branch with changes in main after you created the branch, check the section on updating a PR.
Making code changes
Before modifying any code, ensure you follow the contributing environment guidelines to set up an appropriate development environment.
Then once you have made code changes, you can see all the changes you’ve currently made by running:
git status
For files you intended to modify or add, run:
git add path/to/file-to-be-added-or-changed.py
Running git status again should display:
On branch shiny-new-feature

     modified:   /relative/path/to/file-to-be-added-or-changed.py
Finally, commit your changes to your local repository with an explanatory commit message:
git commit -m "your commit message goes here"
Pushing your changes
When you want your changes to appear publicly on your GitHub page, push your forked feature branch’s commits:
git push origin shiny-new-feature
Here origin is the default name given to your remote repository on GitHub. You can see the remote repositories with:
git remote -v
If you added the upstream repository as described above, you will see something like:
origin  https://github.com/your-user-name/pandas.git (fetch)
origin  https://github.com/your-user-name/pandas.git (push)
upstream        https://github.com/pandas-dev/pandas.git (fetch)
upstream        https://github.com/pandas-dev/pandas.git (push)
Now your code is on GitHub, but it is not yet a part of the pandas project. For that to happen, a pull request needs to be submitted on GitHub.
Making a pull request
Once you have finished your code changes, they will need to follow the pandas contribution guidelines to be successfully accepted.
If everything looks good, you are ready to make a pull request. A pull request is how code from your local repository becomes available to the GitHub community to review and be merged into the project so that it appears in the next release. To submit a pull request:
1. Navigate to your repository on GitHub.
2. Click on the Compare & pull request button.
3. You can then click on Commits and Files Changed to make sure everything looks okay one last time.
4. Write a descriptive title that includes prefixes. pandas uses a convention for title prefixes. Here are some common ones along with general guidelines for when to use them:
ENH: Enhancement, new functionality
BUG: Bug fix
DOC: Additions/updates to documentation
TST: Additions/updates to tests
BLD: Updates to the build process/scripts
PERF: Performance improvement
TYP: Type annotations
CLN: Code cleanup
5. Write a description of your changes in the Preview Discussion tab.
6. Click Send Pull Request.
This request then goes to the repository maintainers, and they will review the code.
Updating your pull request
Based on the review you get on your pull request, you will probably need to make some changes to the code. You can follow the code committing steps again to address any feedback and update your pull request.
It is also important that updates in the pandas main branch are reflected in your pull request. To update your feature branch with changes in the pandas main branch, run:
git checkout shiny-new-feature
git fetch upstream
git merge upstream/main
If there are no conflicts (or they could be fixed automatically), a file with a default commit message will open, and you can simply save and quit this file.
If there are merge conflicts, you need to resolve them. See, for example, https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/ for an explanation of how to do this.
Once the conflicts are resolved, run:
git add -u
to stage any files you’ve updated, then run git commit to finish the merge.
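The conflict-resolution steps can be rehearsed in a throwaway repository before trying them on a real pull request. In this sketch the file settings.txt, its contents, and the Demo identity are invented, and the local main branch stands in for upstream/main because the sandbox has no upstream remote. git commit --no-edit accepts the prepared merge message instead of opening an editor.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name Demo
git config user.email demo@example.com

echo "value = 1" > settings.txt
git add settings.txt
git commit -q -m "initial"

# The feature branch and main both change the same line.
git checkout -q -b shiny-new-feature
echo "value = 2" > settings.txt
git commit -q -am "feature change"
git checkout -q main
echo "value = 3" > settings.txt
git commit -q -am "main change"

# Local main stands in for upstream/main; the merge stops with a conflict.
git checkout -q shiny-new-feature
git merge main > /dev/null || true

# Resolve by editing the file (here: keep the feature's value),
# then stage the resolution and finish the merge.
echo "value = 2" > settings.txt
git add -u
git commit -q --no-edit
```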
Note
If you have uncommitted changes at the moment you want to update the branch with main, you will need to stash them prior to updating (see the stash docs). This will effectively store your changes, and they can be reapplied after updating.
After the feature branch has been updated locally, you can now update your pull request by pushing to the branch on GitHub:
git push origin shiny-new-feature
Any git push will automatically update your pull request with your branch’s changes and restart the Continuous Integration checks.
Updating the development environment
It is important to periodically update your local main branch with updates from the pandas main branch and update your development environment to reflect any changes to the various packages that are used during development.
If using conda, run:
git checkout main
git fetch upstream
git merge upstream/main
conda activate pandas-dev
conda env update -f environment.yml --prune
If using pip, do:
git checkout main
git fetch upstream
git merge upstream/main
# activate the virtual environment based on your platform
python -m pip install --upgrade -r requirements-dev.txt
Tips for a successful pull request
If you have made it to the Making a pull request phase, one of the core contributors may take a look. Please note, however, that a handful of people are responsible for reviewing all of the contributions, which can often lead to bottlenecks.
To improve the chances of your pull request being reviewed, you should:
Reference an open issue for non-trivial changes to clarify the PR’s purpose
Ensure you have appropriate tests. These should be the first part of any PR
Keep your pull requests as simple as possible. Larger PRs take longer to review
Ensure that CI is in a green state. Reviewers may not even look otherwise
Keep updating your pull request, either by request or every few days