From the Publisher

What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While 'data analysis' is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology. This is the Python programming you need for data analysis.
New for the Second Edition
The first edition of this book was published in 2012, during a time when open source data analysis libraries for Python (such as pandas) were very new and developing rapidly. In this updated and expanded second edition, I have overhauled the chapters to account both for incompatible changes and deprecations as well as new features that have occurred in the last five years.
I’ve also added fresh content to introduce tools that either did not exist in 2012 or had not matured enough to make the first cut. Finally, I have tried to avoid writing about new or cutting-edge open source projects that may not have had a chance to mature. I would like readers of this edition to find that the content is still almost as relevant in 2020 or 2021 as it is in 2017.
The major updates in this second edition include:
- All code, including the Python tutorial, updated for Python 3.6 (the first edition used Python 2.7)
- Updated Python install instructions for the Anaconda Python Distribution & other Python packages
- Updates for the latest versions of the pandas library in 2017
- A new chapter on some more advanced pandas tools, and some other usage tips
- A brief introduction to using statsmodels and scikit-learn
- Reorganized content from the first edition to make the book more accessible to newcomers
Editorial Reviews
About the Author
Wes McKinney is a New York-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He's now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.
Wes was later the co-founder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2016, he joined Two Sigma Investments in New York City, where he continues working to make data analysis faster and easier through open source software.
Product details
- Publisher : O'Reilly Media; 2nd edition (October 24, 2017)
- Language : English
- Paperback : 550 pages
- ISBN-10 : 1491957662
- ISBN-13 : 978-1491957660
- Item Weight : 1.91 pounds
- Dimensions : 7 x 1.11 x 9.19 inches
- Best Sellers Rank: #281,923 in Books
- #66 in Data Modeling & Design (Books)
- #124 in Data Processing
- #238 in Python Programming
About the author

Since 2007, I have been creating fast, easy-to-use data wrangling and statistical computing tools, mostly in the Python programming language. I am best known for creating the pandas project and writing the book Python for Data Analysis. I am also a contributor to the Apache Arrow, Kudu, and Parquet projects within the Apache Software Foundation. I am currently the CTO and Co-founder of Voltron Data, which builds accelerated computing technologies powered by Apache Arrow. I previously worked for Ursa Labs (within RStudio / Posit), Two Sigma, Cloudera, DataPad, and AQR Capital Management.
Customer reviews
Customers say
Customers find this book to be an excellent introduction to Python for data analysis, with comprehensive content and practical examples that work as advertised. They appreciate its solid material coverage and value for money. The writing style receives mixed feedback, with some customers praising it while others find it difficult to follow.
Customers find the book provides a great introduction to Python for data analysis, with one customer noting it serves as excellent preparation for Machine Learning.
"...and because while McKinney is not fun to read, he does pack the book with useful information and it is (mostly) well organized...."
"...This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data..."
"This book covers all of the basics that you would want to know to get started in programming in Python for data analysis, as the title implies, but..."
"This book gave me my first job. And I am still learning it. It is simple, talks some general idea why functions design like this, and introduces..."
Customers find the book excellent and worth buying, with one mentioning it's worth 60 box.
"...This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey...."
"Great book, though a bit dry and slow...."
"...There are plenty of code examples. So worth the purchase. Only negative I wish there were mini projects to learn from."
"Overall, I liked the book. Between the first and second edition, however, the author reorganized the book that made it harder for me to absorb...."
Customers find the book very useful, with examples that work as advertised and advanced functionality. One customer specifically mentions how it helps apply Python's tools to data science, while another notes its value for engine performance engineers.
"...This book has significantly improved how I work. Thanks, Wes and team."
"...some general idea why functions design like this, and introduces some practical functions...."
"...of this book, and I think it accomplished it's goal of being a good general resource for people beginning their career/learning with Python and data..."
"...Its a great book to have as a reference and learning data analysis techniques. There are plenty of code examples. So worth the purchase...."
Customers find the content of the book great, with one customer particularly noting the good chapters on handling time series.
"So far, this book has been an inspiring reading. It contains a huge number of data cleansing, transformation, analysis & etc. code snippets...."
"Great content. Five star content. But, pages started coming off the binding one day after I got this in the mail...."
"The content is too generic, hope it can be more technological."
"Excellent step-by-step instructions. Interesting examples."
Customers find the book solid, with good material coverage, and one customer specifically mentions its high-quality paper.
"...copious use of code snippets to illustrate his points makes the material very usable...."
"This product arrived fast. The book was in great shape. Couldn't have asked for a better buying experience"
"I think this book is solid but it was a bit beyond my level...."
"...details about the Pandas library for Python, the author also includes solid sections about the python language and NumPy...."
Customers have mixed opinions about the writing style of the book, with some finding it great while others note that it reads like a dictionary with artificial datasets.
"Well written by the creator of Pandas. The author's copious use of code snippets to illustrate his points makes the material very usable...."
"...and rewarding in its use of example datasets, its more personable writing style, and its outlining of good practices for data science."
"...and writes like an impatient person who would rather be doing something else...."
"...bit as detailed as I hoped it would be with a great introduction, great examples, and great coverage of fundamental, basic, and advanced..."

Top reviews from the United States
- Reviewed in the United States on November 24, 2021
I got this book when I was transitioning to doing data science with Python and was struggling to become familiar with standard tools. It's written by the creator of Pandas, and follows the style of the Pandas documentation: dense, telegraphic, peppered with examples.
It's hard work because Wes McKinney often does not articulate why you would need to do something (assuming you are already knowledgeable on the underlying process), and writes like an impatient person who would rather be doing something else. Additionally examples often suffer from being both too long and too short - too long in that almost every example is on a toy dataset created from scratch, too short in that most of those datasets have only 5 or 10 elements and do not always showcase complex operations. Other examples (particularly involving time series) have an overabundance of data that make the critical results hard to spot. Frankly, my first month with Pandas was a miserable one.
But I give the book 5 stars both because I came to love Pandas as I got more familiar with it, and because while McKinney is not fun to read, he does pack the book with useful information and it is (mostly) well organized. If anything it would benefit from being longer and with a more patient treatment of larger and more concrete datasets (eg the Titanic passenger dataset used in the Pandas documentation). The initial chapter on the basics of using Python could go - if you need this book, then you don't want to be trying to learn the rudiments of Python from it. If you can accept that you'll need a lot of bookmarks or margin notes to get through a rather steep learning curve, it will reward your persistence.
- Reviewed in the United States on April 5, 2019
This book has been my foundation of using python as a data analyst.
This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data (Python pandas is like MS Excel times 100. This is not an exaggeration). It also introduces the reader into numpy (lower level number crunching and arrays), matplotlib (data visualizations), scikitlearn (machine learning), and other useful data science libraries. The book contains other book recommendations for continuing education.
Although this would be a challenging book for a brand new Python user, I would still recommend it, especially if you are currently doing a lot of work in MS Excel and/or exporting data from databases. I had a few false starts learning Python, and my biggest stumbling block was lack of application in what I was learning. This book puts practical tools in the reader's hands very quickly. I personally don't have time to make goofy games etc. that other books have used as practice examples. Despite other reviews criticizing the use of random data throughout the book, I found the examples easy to follow and useful. I would also argue that learning how to generate random data is useful in itself (thus the purpose of the numpy random library), and that there are practical examples throughout the book. Chapter 14 is devoted to real-world data analysis examples.
I am almost finished with my second time through the book, this time working through every example. This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey. This book has significantly improved how I work.
Thanks, Wes and team.
- Reviewed in the United States on January 26, 2019
This book covers all of the basics that you would want to know to get started in programming in Python for data analysis, as the title implies, but it doesn't really offer compelling real-world examples. The data seem to be made up and the analyses don't go into enough detail to help you really learn how pandas and numpy work. Overall this is a decent starter book but you will have to bookmark the python and pandas documentation online if you want to have a reference to all of the functionality those tools have, and there are many places online where you can get better examples to learn from. If you haven't made your mind up about which tool to use for data analysis, I highly recommend checking out dplyr in R, which has an excellent free book online (R for data science, hadley wickham). I find it very easy to learn and it is much easier to set up R and RStudio than it is to set up Python, even though I love Python and Pandas.
- Reviewed in the United States on December 6, 2017
This book gave me my first job. And I am still learning it. It is simple, talks some general idea why functions design like this, and introduces some practical functions. Because in real life real job you always need to look up documentation or to google certain functions, I think the idea why Wes makes functions/variables like this, and what he wants to develop in the future is very important. anyway, I think this book is for data analysis beginner and some intermediate users. I learned Python first so I recommend beginners who want to use Python for Data Analyst/Scientist to learn Python Programming first/simultaneously. At least understand lambda and python expressions, otherwise, you can't feel the full magic.
- Reviewed in the United States on February 14, 2022
Well written by the creator of Pandas. The author's copious use of code snippets to illustrate his points makes the material very usable. The snippets are short enough to type by hand so you get the frequent opportunity to play with the code and really understand the tools being presented. And Pandas is awesome!
- Reviewed in the United States on July 21, 2019
So far, this book has been an inspiring reading. It contains a huge number of data cleansing, transformation, analysis & etc. code snippets. The code is very clean and - for the most part - self-explaining (at least, for a seasoned software developer). The book step by step displays the motivations behind the design and functionality of center-piece Python modules - and you would not expect anything less from the original designer of Pandas. I feel this wonderful book being a natural extension of ageless Practical CS classics by Niklaus Wirth, Kernighan-Ritchie, and B. Stroustrup for Data Science Age.
- Reviewed in the United States on August 16, 2024
This product arrived fast. The book was in great shape. Couldn't have asked for a better buying experience
Top reviews from other countries
- abhishek patil
Reviewed in India on April 8, 2021
5.0 out of 5 stars: Good packing and perfect book for fresher data analytics professional
Good packing from the party. I just loved the overall structure of the book and its content. I will say must for all fresher wishing to get into data analytics and data science!!
- Jovial GBA-GOMBO
Reviewed in France on May 20, 2022
5.0 out of 5 stars: Fast and secure
Purchased to improve my skills as a Data Analyst
- Tamer
Reviewed in the United Arab Emirates on December 18, 2019
5.0 out of 5 stars: Best pandas reference book
This is the best reference I use for dealing with python, numpy and mainly pandas. Must have for anyone learning or using pandas. The style of the author (who actually wrote pandas) is to the point and clear, with simple examples that demonstrate real-world usage.
Also this book has all the info to help you prepare data for scikit-learn and tf.
- Conrad T.
Reviewed in Australia on December 27, 2018
4.0 out of 5 stars: Oh lovely dictionaries and tuple
Great book for anyone needing common tools used in Python and indeed in data science. My favourite chapters were 2, 3 and 4. Basically, if you are a noob like me, you should start from the chapter one and read through till chapter 4. List, dictionary, tuple comprehensions are quite powerful as well as array with pandas and scikit-learn.
If you are looking for more in depth graphical representation of plots using pandas and scikit-learn then maybe look at another book as this one is more of a background tools kind-a-thing.
- Pedro Dias
Reviewed in Spain on January 8, 2021
5.0 out of 5 stars: Must have
You must have this book if you want to learn Pandas and Data Science.