Release Notes

This is the list of changes to pandas between each release. For full details, see the commit logs at http://github.com/pandas-dev/pandas

What is it

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.

Where to get it

Source code: http://github.com/pandas-dev/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.pydata.org

pandas 0.19.1

Release date: November 3, 2016

This is a minor bug-fix release from 0.19.0 and includes some small regression fixes, bug fixes and performance improvements.

See the v0.19.1 Whatsnew page for an overview of all bugs that have been fixed in 0.19.1.

Thanks

Adam Chainz
Anthonios Partheniou
Arash Rouhani
Ben Kandel
Brandon M. Burroughs
Chris
chris-b1
Chris Warth
David Krych
dubourg
gfyoung
Iván Vallés Pérez
Jeff Reback
Joe Jevnik
Jon M. Mease
Joris Van den Bossche
Josh Owen
Keshav Ramaswamy
Larry Ren
mattrijk
Michael Felt
paul-mannino
Piotr Chromiec
Robert Bradshaw
Sinhrks
Thiago Serafim
Tom Bird

pandas 0.19.0

Release date: October 2, 2016

This is a major release from 0.18.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

merge_asof() for asof-style time-series joining, see here
.rolling() is now time-series aware, see here
read_csv() now supports parsing Categorical data, see here
A function union_categoricals() has been added for combining categoricals, see here
PeriodIndex now has its own period dtype, and changed to be more consistent with other Index classes. See here
Sparse data structures gained enhanced support of int and bool dtypes, see here
Comparison operations with Series no longer ignore the index, see here for an overview of the API changes.
Introduction of a pandas development API for utility functions, see here.
Deprecation of Panel4D and PanelND. We recommend representing these types of n-dimensional data with the xarray package.
Removal of the previously deprecated modules pandas.io.data, pandas.io.wb, pandas.tools.rplot.
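
As a rough illustration of the first two highlights, the sketch below joins a hypothetical trades table to a hypothetical quotes table with merge_asof() and then applies a time-based .rolling() window. The table names, column names and values are invented for the example and do not come from the release itself.

```python
import pandas as pd

# Hypothetical trade and quote tables; both must be sorted by the join key.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-02 09:30:01", "2016-10-02 09:30:05"]),
    "price": [100.0, 101.0],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-02 09:30:00", "2016-10-02 09:30:03"]),
    "bid": [99.5, 100.5],
})

# For each trade, pick the most recent quote at or before the trade time.
matched = pd.merge_asof(trades, quotes, on="time")

# Time-aware rolling: a 2-second window over a DatetimeIndex, so the
# number of observations per window follows the timestamps.
ticks = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(
        ["2016-10-02 09:30:00", "2016-10-02 09:30:01", "2016-10-02 09:30:04"]
    ),
)
windowed = ticks.rolling("2s").sum()

print(matched)
print(windowed)
```

The asof join is a backward match by default, which is why each trade picks up the quote that precedes it rather than the nearest one in either direction.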

See the v0.19.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.19.0.

Thanks

adneu
Adrien Emery
agraboso
Alex Alekseyev
Alex Vig
Allen Riddell
Amol
Amol Agrawal
Andy R. Terrel
Anthonios Partheniou
babakkeyvani
Ben Kandel
Bob Baxley
Brett Rosen
c123w
Camilo Cota
Chris
chris-b1
Chris Grinolds
Christian Hudon
Christopher C. Aycock
Chris Warth
cmazzullo
conquistador1492
cr3
Daniel Siladji
Douglas McNeil
Drewrey Lupton
dsm054
Eduardo Blancas Reyes
Elliot Marsden
Evan Wright
Felix Marczinowski
Francis T. O’Donovan
Gábor Lipták
Geraint Duck
gfyoung
Giacomo Ferroni
Grant Roch
Haleemur Ali
harshul1610
Hassan Shamim
iamsimha
Iulius Curt
Ivan Nazarov
jackieleng
Jeff Reback
Jeffrey Gerard
Jenn Olsen
Jim Crist
Joe Jevnik
John Evans
John Freeman
John Liekezer
Johnny Gill
John W. O’Brien
John Zwinck
Jordan Erenrich
Joris Van den Bossche
Josh Howes
Jozef Brandys
Kamil Sindi
Ka Wo Chen
Kerby Shedden
Kernc
Kevin Sheppard
Matthieu Brucher
Maximilian Roos
Michael Scherer
Mike Graham
Mortada Mehyar
mpuels
Muhammad Haseeb Tariq
Nate George
Neil Parley
Nicolas Bonnotte
OXPHOS
Pan Deng / Zora
Paul
Pauli Virtanen
Paul Mestemaker
Pawel Kordek
Pietro Battiston
pijucha
Piotr Jucha
priyankjain
Ravi Kumar Nimmi
Robert Gieseke
Robert Kern
Roger Thomas
Roy Keyes
Russell Smith
Sahil Dua
Sanjiv Lobo
Sašo Stanovnik
Shawn Heide
sinhrks
Sinhrks
Stephen Kappel
Steve Choi
Stewart Henderson
Sudarshan Konge
Thomas A Caswell
Tom Augspurger
Tom Bird
Uwe Hoffmann
wcwagner
WillAyd
Xiang Zhang
Yadunandan
Yaroslav Halchenko
YG-Riku
Yuichiro Kaneko
yui-knk
zhangjinjie
znmean
颜发才（Yan Facai）

pandas 0.18.1

Release date: (May 3, 2016)

This is a minor release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.

Highlights include:

.groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see here
pd.to_datetime() has gained the ability to assemble dates from a DataFrame, see here
Method chaining improvements, see here.
Custom business hour offset, see here.
Many bug fixes in the handling of sparse, see here
Expanded the Tutorials section with a feature on modern pandas, courtesy of @TomAugspurger. (GH13045).
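
A minimal sketch of the first two highlights, assuming small made-up frames: pd.to_datetime() assembling dates from component columns, and a per-group rolling mean through .groupby(...).rolling(..). The frame contents are illustrative only.

```python
import pandas as pd

# Assemble datetimes from component columns; the column names
# (year, month, day) are the ones pd.to_datetime looks for.
parts = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
dates = pd.to_datetime(parts)

# Per-group rolling mean: the window restarts inside each group.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})
rolled = df.groupby("key")["value"].rolling(2).mean()

print(dates)
print(rolled)
```

The first row of each group is NaN because a window of 2 does not look across the group boundary.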

See the v0.18.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.1.

Thanks

Andrew Fiore-Gartland
Bastiaan
Benoît Vinot
Brandon Rhodes
DaCoEx
Drew Fustin
Ernesto Freitas
Filip Ter
Gregory Livschitz
Gábor Lipták
Hassan Kibirige
Iblis Lin
Israel Saeta Pérez
Jason Wolosonovich
Jeff Reback
Joe Jevnik
Joris Van den Bossche
Joshua Storck
Ka Wo Chen
Kerby Shedden
Kieran O’Mahony
Leif Walsh
Mahmoud Lababidi
Maoyuan Liu
Mark Roth
Matt Wittmann
MaxU
Maximilian Roos
Michael Droettboom
Nick Eubank
Nicolas Bonnotte
OXPHOS
Pauli Virtanen
Peter Waller
Pietro Battiston
Prabhjot Singh
Robin Wilson
Roger Thomas
Sebastian Bank
Stephen Hoover
Tim Hopper
Tom Augspurger
WANG Aiyong
Wes Turner
Winand
Xbar
Yan Facai
adneu
ajenkins-cargometrics
behzad nouri
chinskiy
gfyoung
jeps-journal
jonaslb
kotrfa
nileracecrew
onesandzeroes
rs2
sinhrks
tsdlovell

pandas 0.18.0

Release date: (March 13, 2016)

This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
API breaking change to the .resample method to make it more .groupby like, see here.
Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
The .to_xarray() function has been added for compatibility with the xarray package, see here.
The read_sas function has been enhanced to read sas7bdat files, see here.
Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
pd.test() top-level nose test runner is available (GH4327).
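
The window-method, RangeIndex and .str.extractall() highlights can be sketched as follows. The Series values are invented for the example, and the snippet is written against a recent pandas rather than 0.18.0 itself.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Window functions as methods, in the same style as .groupby
moving_avg = s.rolling(window=2).mean()
running_total = s.expanding().sum()

# A default integer index is stored as a RangeIndex
is_range = isinstance(s.index, pd.RangeIndex)

# .str.extractall returns one row per match, indexed by
# (original label, match number)
codes = pd.Series(["a1b2", "c3"])
digits = codes.str.extractall(r"(\d)")

print(moving_avg)
print(running_total)
print(is_range)
print(digits)
```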

See the v0.18.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.0.

Thanks

ARF
Alex Alekseyev
Andrew McPherson
Andrew Rosenfeld
Anthonios Partheniou
Anton I. Sipos
Ben
Ben North
Bran Yang
Chris
Chris Carroux
Christopher C. Aycock
Christopher Scanlin
Cody
Da Wang
Daniel Grady
Dorozhko Anton
Dr-Irv
Erik M. Bray
Evan Wright
Francis T. O’Donovan
Frank Cleary
Gianluca Rossi
Graham Jeffries
Guillaume Horel
Henry Hammond
Isaac Schwabacher
Jean-Mathieu Deschenes
Jeff Reback
Joe Jevnik
John Freeman
John Fremlin
Jonas Hoersch
Joris Van den Bossche
Joris Vankerschaver
Justin Lecher
Justin Lin
Ka Wo Chen
Keming Zhang
Kerby Shedden
Kyle
Marco Farrugia
MasonGallo
MattRijk
Matthew Lurie
Maximilian Roos
Mayank Asthana
Mortada Mehyar
Moussa Taifi
Navreet Gill
Nicolas Bonnotte
Paul Reiners
Philip Gura
Pietro Battiston
RahulHP
Randy Carnevale
Rinoc Johnson
Rishipuri
Sangmin Park
Scott E Lasley
Sereger13
Shannon Wang
Skipper Seabold
Thierry Moisan
Thomas A Caswell
Toby Dylan Hocking
Tom Augspurger
Travis
Trent Hauck
Tux1
Varun
Wes McKinney
Will Thompson
Yoav Ram
Yoong Kang Lim
Yoshiki Vázquez Baeza
Young Joong Kim
Younggun Kim
Yuval Langer
alex argunov
behzad nouri
boombard
brian-pantano
chromy
daniel
dgram0
gfyoung
hack-c
hcontrast
jfoo
kaustuv deolal
llllllllll
ranarag
rockg
scls19fr
seales
sinhrks
srib
surveymedia.ca
tworec

pandas 0.17.1

Release date: (November 21, 2015)

This is a minor release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.

Highlights include:

Support for Conditional HTML Formatting, see here
Releasing the GIL on the csv reader & other ops, see here
Fix for a regression in DataFrame.drop_duplicates from 0.16.2 that caused incorrect results on integer values (GH11376)

See the v0.17.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.1.

Thanks

Aleksandr Drozd
Alex Chase
Anthonios Partheniou
BrenBarn
Brian J. McGuirk
Chris
Christian Berendt
Christian Perez
Cody Piersall
Data & Code Expert Experimenting with Code on Data
DrIrv
Evan Wright
Guillaume Gay
Hamed Saljooghinejad
Iblis Lin
Jake VanderPlas
Jan Schulz
Jean-Mathieu Deschenes
Jeff Reback
Jimmy Callin
Joris Van den Bossche
K.-Michael Aye
Ka Wo Chen
Loïc Séguin-C
Luo Yicheng
Magnus Jöud
Manuel Leonhardt
Matthew Gilbert
Maximilian Roos
Michael
Nicholas Stahl
Nicolas Bonnotte
Pastafarianist
Petra Chong
Phil Schaf
Philipp A
Rob deCarvalho
Roman Khomenko
Rémy Léone
Sebastian Bank
Thierry Moisan
Tom Augspurger
Tux1
Varun
Wieland Hoffmann
Winterflower
Yoav Ram
Younggun Kim
Zeke
ajcr
azuranski
behzad nouri
cel4
emilydolson
hironow
lexual
llllllllll
rockg
silentquasar
sinhrks
taeold

pandas 0.17.0

Release date: (October 9, 2015)

This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

Release the Global Interpreter Lock (GIL) on some cython operations, see here
Plotting methods are now available as attributes of the .plot accessor, see here
The sorting API has been revamped to remove some long-time inconsistencies, see here
Support for a datetime64[ns] with timezones as a first-class dtype, see here
The default for to_datetime will now be to raise when presented with unparseable formats; previously this would return the original input. Also, date parse functions now return consistent results. See here
The default for dropna in HDFStore has changed to False, to store by default all rows even if they are all NaN, see here
Datetime accessor (dt) now supports Series.dt.strftime to generate formatted strings for datetime-likes, and Series.dt.total_seconds to generate each duration of the timedelta in seconds. See here
Period and PeriodIndex can handle multiplied freq like 3D, which corresponds to a 3-day span. See here
Development installed versions of pandas will now have PEP440 compliant version strings (GH9518)
Development support for benchmarking with the Air Speed Velocity library (GH8316)
Support for reading SAS xport files, see here
Documentation comparing SAS to pandas, see here
Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, see here
Display format with plain text can optionally align with Unicode East Asian Width, see here
Compatibility with Python 3.5 (GH11097)
Compatibility with matplotlib 1.5.0 (GH11111)
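
The datetime accessor and timezone highlights can be illustrated with a short sketch. The timestamps are made up, and the snippet assumes a recent pandas, where the exact dtype string of the localized Series may differ from the datetime64[ns] form named in the list.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2015-10-09 08:00", "2015-10-09 20:30"]))

# Series.dt.strftime: formatted strings from datetime-likes
labels = stamps.dt.strftime("%Y/%m/%d %H:%M")

# Series.dt.total_seconds: each timedelta expressed in seconds
elapsed = (stamps - stamps.iloc[0]).dt.total_seconds()

# Timezone-aware values keep a datetime64 dtype that carries the zone
localized = stamps.dt.tz_localize("UTC")

print(labels.tolist())
print(elapsed.tolist())
print(localized.dtype)
```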

See the v0.17.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.0.

Thanks

Alex Rothberg
Andrea Bedini
Andrew Rosenfeld
Andy Li
Anthonios Partheniou
Artemy Kolchinsky
Bernard Willers
Charlie Clark
Chris
Chris Whelan
Christoph Gohlke
Christopher Whelan
Clark Fitzgerald
Clearfield Christopher
Dan Ringwalt
Daniel Ni
Data & Code Expert Experimenting with Code on Data
David Cottrell
David John Gagne
David Kelly
ETF
Eduardo Schettino
Egor
Egor Panfilov
Evan Wright
Frank Pinter
Gabriel Araujo
Garrett-R
Gianluca Rossi
Guillaume Gay
Guillaume Poulin
Harsh Nisar
Ian Henriksen
Ian Hoegen
Jaidev Deshpande
Jan Rudolph
Jan Schulz
Jason Swails
Jeff Reback
Jonas Buyl
Joris Van den Bossche
Joris Vankerschaver
Josh Levy-Kramer
Julien Danjou
Ka Wo Chen
Karrie Kehoe
Kelsey Jordahl
Kerby Shedden
Kevin Sheppard
Lars Buitinck
Leif Johnson
Luis Ortiz
Mac
Matt Gambogi
Matt Savoie
Matthew Gilbert
Maximilian Roos
Michelangelo D’Agostino
Mortada Mehyar
Nick Eubank
Nipun Batra
Ondřej Čertík
Phillip Cloud
Pratap Vardhan
Rafal Skolasinski
Richard Lewis
Rinoc Johnson
Rob Levy
Robert Gieseke
Safia Abdalla
Samuel Denny
Saumitra Shahapure
Sebastian Pölsterl
Sebastian Rubbert
Sheppard, Kevin
Sinhrks
Siu Kwan Lam
Skipper Seabold
Spencer Carrucciu
Stephan Hoyer
Stephen Hoover
Stephen Pascoe
Terry Santegoeds
Thomas Grainger
Tjerk Santegoeds
Tom Augspurger
Vincent Davis
Winterflower
Yaroslav Halchenko
Yuan Tang (Terry)
agijsberts
ajcr
behzad nouri
cel4
cyrusmaher
davidovitch
ganego
jreback
juricast
larvian
maximilianr
msund
rekcahpassyla
robertzk
scls19fr
seth-p
sinhrks
springcoil
terrytangyuan
tzinckgraf

pandas 0.16.2

Release date: (June 12, 2015)

This is a minor release from 0.16.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.

Highlights include:

A new pipe method, see here
Documentation on how to use numba with pandas, see here
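
A minimal sketch of what the pipe method enables. The helper functions add_total and keep_above are hypothetical, written only for this example, and are not part of pandas.

```python
import pandas as pd


def add_total(frame, columns):
    """Hypothetical helper: append a row-wise total of the given columns."""
    return frame.assign(total=frame[columns].sum(axis=1))


def keep_above(frame, column, threshold):
    """Hypothetical helper: keep rows where column exceeds threshold."""
    return frame[frame[column] > threshold]


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Reads top to bottom instead of keep_above(add_total(df, ...), ...)
result = (
    df.pipe(add_total, columns=["a", "b"])
      .pipe(keep_above, column="total", threshold=15)
)

print(result)
```

Each function receives the frame as its first argument, so user-defined steps chain in the same way as built-in methods.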

See the v0.16.2 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.16.2.

Thanks

Andrew Rosenfeld
Artemy Kolchinsky
Bernard Willers
Christer van der Meeren
Christian Hudon
Constantine Glen Evans
Daniel Julius Lasiman
Evan Wright
Francesco Brundu
Gaëtan de Menten
Jake VanderPlas
James Hiebert
Jeff Reback
Joris Van den Bossche
Justin Lecher
Ka Wo Chen
Kevin Sheppard
Mortada Mehyar
Morton Fox
Robin Wilson
Thomas Grainger
Tom Ajamian
Tom Augspurger
Yoshiki Vázquez Baeza
Younggun Kim
austinc
behzad nouri
jreback
lexual
rekcahpassyla
scls19fr
sinhrks

pandas 0.16.1

Release date: (May 11, 2015)

This is a minor release from 0.16.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.

See the v0.16.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.1.

Thanks

Alfonso MHC
Andy Hayden
Artemy Kolchinsky
Chris Gilmer
Chris Grinolds
Dan Birken
David BROCHART
David Hirschfeld
David Stephens
Dr. Leo
Evan Wright
Frans van Dunné
Hatem Nassrat
Henning Sperr
Hugo Herter
Jan Schulz
Jeff Blackburne
Jeff Reback
Jim Crist
Jonas Abernot
Joris Van den Bossche
Kerby Shedden
Leo Razoumov
Manuel Riel
Mortada Mehyar
Nick Burns
Nick Eubank
Olivier Grisel
Phillip Cloud
Pietro Battiston
Roy Hyunjin Han
Sam Zhang
Scott Sanderson
Stephan Hoyer
Tiago Antao
Tom Ajamian
Tom Augspurger
Tomaz Berisa
Vikram Shirgur
Vladimir Filimonov
William Hogman
Yasin A
Younggun Kim
behzad nouri
dsm054
floydsoft
flying-sheep
gfr
jnmclarty
jreback
ksanghai
lucas
mschmohl
ptype
rockg
scls19fr
sinhrks

pandas 0.16.0

Release date: (March 22, 2015)

This is a major release from 0.15.2 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.

Highlights include:

DataFrame.assign method, see here
Series.to_coo/from_coo methods to interact with scipy.sparse, see here
Backwards incompatible change to Timedelta to conform the .seconds attribute with datetime.timedelta, see here
Changes to the .loc slicing API to conform with the behavior of .ix, see here
Changes to the default for ordering in the Categorical constructor, see here
The pandas.tools.rplot, pandas.sandbox.qtpandas and pandas.rpy modules are deprecated. We refer users to external packages like seaborn, pandas-qt and rpy2 for similar or equivalent functionality, see here
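
The DataFrame.assign and Timedelta .seconds highlights can be sketched as below. The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"width": [2.0, 3.0], "height": [4.0, 5.0]})

# assign returns a new frame with the extra column; df itself is unchanged
with_area = df.assign(area=lambda d: d["width"] * d["height"])

# .seconds follows datetime.timedelta: it is the seconds component of the
# duration (0 to 86399), not the whole duration. Use total_seconds() for that.
td = pd.Timedelta("1 days 00:00:10")
component = td.seconds
whole = td.total_seconds()

print(with_area)
print(component, whole)
```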

See the v0.16.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.0.

Thanks

Aaron Toth
Alan Du
Alessandro Amici
Artemy Kolchinsky
Ashwini Chaudhary
Ben Schiller
Bill Letson
Brandon Bradley
Chau Hoang
Chris Reynolds
Chris Whelan
Christer van der Meeren
David Cottrell
David Stephens
Ehsan Azarnasab
Garrett-R
Guillaume Gay
Jake Torcasso
Jason Sexauer
Jeff Reback
John McNamara
Joris Van den Bossche
Joschka zur Jacobsmühlen
Juarez Bochi
Junya Hayashi
K.-Michael Aye
Kerby Shedden
Kevin Sheppard
Kieran O’Mahony
Kodi Arfer
Matti Airas
Min RK
Mortada Mehyar
Robert
Scott E Lasley
Scott Lasley
Sergio Pascual
Skipper Seabold
Stephan Hoyer
Thomas Grainger
Tom Augspurger
TomAugspurger
Vladimir Filimonov
Vyomkesh Tripathi
Will Holmgren
Yulong Yang
behzad nouri
bertrandhaut
bjonen
cel4
clham
hsperr
ischwabacher
jnmclarty
josham
jreback
omtinez
roch
sinhrks
unutbu

pandas 0.15.2

Release date: (December 12, 2014)

This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.

See the v0.15.2 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.2.

Thanks

Aaron Staple
Angelos Evripiotis
Artemy Kolchinsky
Benoit Pointet
Brian Jacobowski
Charalampos Papaloizou
Chris Warth
David Stephens
Fabio Zanini
Francesc Via
Henry Kleynhans
Jake VanderPlas
Jan Schulz
Jeff Reback
Jeff Tratner
Joris Van den Bossche
Kevin Sheppard
Matt Suggit
Matthew Brett
Phillip Cloud
Rupert Thompson
Scott E Lasley
Stephan Hoyer
Stephen Simmons
Sylvain Corlay
Thomas Grainger
Tiago Antao
Trent Hauck
Victor Chaves
Victor Salgado
Vikram Bhandoh
WANG Aiyong
Will Holmgren
behzad nouri
broessli
charalampos papaloizou
immerrr
jnmclarty
jreback
mgilbert
onesandzeroes
peadarcoyle
rockg
seth-p
sinhrks
unutbu
wavedatalab
Åsmund Hjulstad

pandas 0.15.1

Release date: (November 9, 2014)

This is a minor release from 0.15.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.

See the v0.15.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.1.

Thanks

Aaron Staple
Andrew Rosenfeld
Anton I. Sipos
Artemy Kolchinsky
Bill Letson
Dave Hughes
David Stephens
Guillaume Horel
Jeff Reback
Joris Van den Bossche
Kevin Sheppard
Nick Stahl
Sanghee Kim
Stephan Hoyer
TomAugspurger
WANG Aiyong
behzad nouri
immerrr
jnmclarty
jreback
pallav-fdsi
unutbu

pandas 0.15.0

Release date: (October 18, 2014)

This is a major release from 0.14.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.

Highlights include:

Drop support for numpy < 1.7.0 (GH7711)
The Categorical type was integrated as a first-class pandas type, see here
New scalar type Timedelta, and a new index type TimedeltaIndex, see here
New DataFrame default display for df.info() to include memory usage, see Memory Usage
New datetimelike properties accessor .dt for Series, see Datetimelike Properties
Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing
Split out string methods documentation into Working with Text Data
read_csv will now by default ignore blank lines when parsing, see here
API change in using Indexes in set operations, see here
Internal refactoring of the Index class to no longer sub-class ndarray, see Internal Refactoring
Dropping support for PyTables less than version 3.0.0, and numexpr less than version 2.1 (GH7990)
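
A short sketch of the Categorical, Timedelta / TimedeltaIndex and .dt highlights, using made-up values and a recent pandas.

```python
import pandas as pd

# Categorical data as a first-class dtype
sizes = pd.Series(["small", "large", "small"], dtype="category")
n_categories = len(sizes.cat.categories)

# Timedelta scalar and TimedeltaIndex
one_and_half_days = pd.Timedelta(days=1, hours=12)
steps = pd.to_timedelta([0, 12, 24], unit="h")

# .dt accessor exposing datetime properties on a Series
when = pd.Series(pd.to_datetime(["2014-10-18", "2014-12-25"]))
months = when.dt.month

print(sizes.dtype, n_categories)
print(one_and_half_days, steps)
print(months.tolist())
```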

See the v0.15.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.0.

Thanks

Aaron Schumacher
Adam Greenhall
Andy Hayden
Anthony O’Brien
Artemy Kolchinsky
behzad nouri
Benedikt Sauer
benjamin
Benjamin Thyreau
Ben Schiller
bjonen
BorisVerk
Chris Reynolds
Chris Stoafer
Dav Clark
dlovell
DSM
dsm054
FragLegs
German Gomez-Herrero
Hsiaoming Yang
Huan Li
hunterowens
Hyungtae Kim
immerrr
Isaac Slavitt
ischwabacher
Jacob Schaer
Jacob Wasserman
Jan Schulz
Jeff Tratner
Jesse Farnham
jmorris0x0
jnmclarty
Joe Bradish
Joerg Rittinger
John W. O’Brien
Joris Van den Bossche
jreback
Kevin Sheppard
klonuo
Kyle Meyer
lexual
Max Chang
mcjcode
Michael Mueller
Michael W Schatzow
Mike Kelly
Mortada Mehyar
mtrbean
Nathan Sanders
Nathan Typanski
onesandzeroes
Paul Masurel
Phillip Cloud
Pietro Battiston
RenzoBertocchi
rockg
Ross Petchler
seth-p
Shahul Hameed
Shashank Agarwal
sinhrks
someben
stahlous
stas-sl
Stephan Hoyer
thatneat
tom-alcorn
TomAugspurger
Tom Augspurger
Tony Lorenzo
unknown
unutbu
Wes Turner
Wilfred Hughes
Yevgeniy Grechka
Yoshiki Vázquez Baeza
zachcp

pandas 0.14.1

Release date: (July 11, 2014)

This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.

Highlights include:

New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
Support for dateutil timezones (see docs).
Support for ignoring full line comments in the read_csv() text parser.
New documentation section on Options and Settings.
Lots of bug fixes.
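
The two new methods named in the first highlight can be sketched as follows, on an invented frame. The manual computation of the standard error is included only as a cross-check of what sem() returns with its default settings.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["x", "y", "z", "w"],
    "score": [1.0, 2.0, 3.0, 4.0],
    "count": [1, 2, 3, 4],
})

# Keep only the numeric columns
numeric = df.select_dtypes(include=["number"])

# Standard error of the mean: sample standard deviation / sqrt(n)
score_sem = df["score"].sem()
by_hand = df["score"].std() / len(df) ** 0.5

print(list(numeric.columns))
print(score_sem, by_hand)
```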

See the v0.14.1 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.1.

Thanks

Andrew Rosenfeld
Andy Hayden
Benjamin Adams
Benjamin M. Gross
Brian Quistorff
Brian Wignall
bwignall
clham
Daniel Waeber
David Bew
David Stephens
DSM
dsm054
helger
immerrr
Jacob Schaer
jaimefrio
Jan Schulz
John David Reaver
John W. O’Brien
Joris Van den Bossche
jreback
Julien Danjou
Kevin Sheppard
K.-Michael Aye
Kyle Meyer
lexual
Matthew Brett
Matt Wittmann
Michael Mueller
Mortada Mehyar
onesandzeroes
Phillip Cloud
Rob Levy
rockg
sanguineturtle
Schaer, Jacob C
seth-p
sinhrks
Stephan Hoyer
Thomas Kluyver
Todd Jennings
TomAugspurger
unknown
yelite

pandas 0.14.0

Release date: (May 31, 2014)

This is a major release from 0.13.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.

Highlights include:

Officially support Python 3.4
SQL interfaces updated to use sqlalchemy, see here.
Display interface changes, see here
MultiIndexing using Slicers, see here.
Ability to join a singly-indexed DataFrame with a multi-indexed DataFrame, see here
More consistency in groupby results and more flexible groupby specifications, see here
Holiday calendars are now supported in CustomBusinessDay, see here
Several improvements in plotting functions, including: hexbin, area and pie plots, see here.
Performance doc section on I/O operations, see here
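
As a sketch of MultiIndexing using slicers, the example below selects across both levels of a small invented MultiIndex. It uses pd.IndexSlice, which the list above does not name explicitly; treat its use here as this example's choice of spelling for a slicer.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["letter", "number"]
)
# The index built by from_product from sorted inputs is already sorted,
# which slicing on a MultiIndex requires.
df = pd.DataFrame({"value": list(range(6))}, index=index)

idx = pd.IndexSlice
# All letters, numbers 2 through 3 (label slices include both endpoints)
subset = df.loc[idx[:, 2:3], :]

print(subset)
```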

See the v0.14.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.0.

Thanks

Acanthostega
Adam Marcus
agijsberts
akittredge
Alex Gaudio
Alex Rothberg
AllenDowney
Andrew Rosenfeld
Andy Hayden
ankostis
anomrake
Antoine Mazières
anton-d
bashtage
Benedikt Sauer
benjamin
Brad Buran
bwignall
cgohlke
chebee7i
Christopher Whelan
Clark Fitzgerald
clham
Dale Jung
Dan Allan
Dan Birken
danielballan
Daniel Waeber
David Jung
David Stephens
Douglas McNeil
DSM
Garrett Drapala
Gouthaman Balaraman
Guillaume Poulin
hshimizu77
hugo
immerrr
ischwabacher
Jacob Howard
Jacob Schaer
jaimefrio
Jason Sexauer
Jeff Reback
Jeffrey Starr
Jeff Tratner
John David Reaver
John McNamara
John W. O’Brien
Jonathan Chambers
Joris Van den Bossche
jreback
jsexauer
Julia Evans
Júlio
Katie Atkinson
kdiether
Kelsey Jordahl
Kevin Sheppard
K.-Michael Aye
Matthias Kuhn
Matt Wittmann
Max Grender-Jones
Michael E. Gruen
michaelws
mikebailey
Mike Kelly
Nipun Batra
Noah Spies
ojdo
onesandzeroes
Patrick O’Keeffe
phaebz
Phillip Cloud
Pietro Battiston
PKEuS
Randy Carnevale
ribonoous
Robert Gibboni
rockg
sinhrks
Skipper Seabold
SplashDance
Stephan Hoyer
Tim Cera
Tobias Brandt
Todd Jennings
TomAugspurger
Tom Augspurger
unutbu
westurner
Yaroslav Halchenko
y-p
zach powers

pandas 0.13.1

Release date: (February 3, 2014)

New Features

Added date_format and datetime_format attribute to ExcelWriter. (GH4133)

API Changes

Series.sort will raise a ValueError (rather than a TypeError) on sorting an object that is a view of another (GH5856, GH5853)
Raise/Warn SettingWithCopyError (according to the option chained_assignment) in more cases when detecting chained assignment, related (GH5938, GH6025)
DataFrame.head(0) returns self instead of empty frame (GH5846)
autocorrelation_plot now accepts **kwargs. (GH5623)
convert_objects now accepts a convert_timedeltas='coerce' argument to allow forced dtype conversion of timedeltas (GH5458, GH5689)
Add -NaN and -nan to the default set of NA values (GH5952). See NA Values.
NDFrame now has an equals method. (GH5283)
DataFrame.apply will use the reduce argument to determine whether a Series or a DataFrame should be returned when the DataFrame is empty (GH6007).
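
Two of the API changes above, the equals method and the -NaN entry in the default NA values, can be sketched as follows. The data is invented and the snippet is written against a recent pandas; other items in the list describe behavior specific to 0.13.1 and are not reproduced here.

```python
import io

import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan]})
right = pd.DataFrame({"a": [1.0, np.nan]})

# Element-wise == treats NaN as unequal to itself ...
elementwise_all = bool((left == right).all().all())
# ... while equals() treats NaNs in the same location as equal
same = left.equals(right)

# "-NaN" in a text file is read as a missing value by default
parsed = pd.read_csv(io.StringIO("x\n1.5\n-NaN\n"))

print(elementwise_all, same)
print(parsed)
```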

Experimental Features

Improvements to existing features

perf improvements in Series datetime/timedelta binary operations (GH5801)
option_context context manager now available as top-level API (GH5752)
df.info() view now displays dtype info per column (GH5682)
df.info() now honors the max_info_rows option, disabling null counts for large frames (GH5974)
perf improvements in DataFrame count/dropna for axis=1
Series.str.contains now has a regex=False keyword which can be faster for plain (non-regex) string patterns. (GH5879)
support dtypes property on Series/Panel/Panel4D
extend Panel.apply to allow arbitrary functions (rather than only ufuncs) (GH1148); allow multiple axes to be used to operate on slabs of a Panel
The ArrayFormatter for datetime and timedelta64 now intelligently limits precision based on the values in the array (GH3401)
pd.show_versions() is now available for convenience when reporting issues.
perf improvements to Series.str.extract (GH5944)
perf improvements in dtypes/ftypes methods (GH5968)
perf improvements in indexing with object dtypes (GH5968)
improved dtype inference for timedelta like passed to constructors (GH5458, GH5689)
escape special characters when writing to latex (GH5374)
perf improvements in DataFrame.apply (GH6013)
pd.read_csv and pd.to_datetime learned a new infer_datetime_format keyword which greatly improves parsing perf in many cases. Thanks to @lexual for suggesting and @danbirken for rapidly implementing. (GH5490, GH6021)
add ability to recognize ‘%p’ format code (am/pm) to date parsers when the specific format is supplied (GH5361)
Fix performance regression in JSON IO (GH5765)
Fix performance regression in Index construction from Series (GH6150)
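
A brief sketch of two items from the list above: the top-level option_context context manager and the regex=False keyword of Series.str.contains. The option name display.max_rows and the sample strings are chosen for illustration.

```python
import pandas as pd

# option_context as top-level API: the option is restored on exit
before = pd.get_option("display.max_rows")
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")
after = pd.get_option("display.max_rows")

# regex=False treats the pattern as a plain substring, so "." is literal
names = pd.Series(["a.b", "axb"])
literal = names.str.contains(".", regex=False)
# With the default regex matching, "." matches any character
pattern = names.str.contains("a.b")

print(before, inside, after)
print(literal.tolist(), pattern.tolist())
```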

Bug Fixes

Bug in io.wb.get_countries not including all countries (GH6008)
Bug in Series replace with timestamp dict (GH5797)
read_csv/read_table now respects the prefix kwarg (GH5732).
Bug in selection with missing values via .ix from a duplicate indexed DataFrame failing (GH5835)
Fix issue of boolean comparison on empty DataFrames (GH5808)
Bug in isnull handling NaT in an object array (GH5443)
Bug in to_datetime when passed a np.nan or integer datelike and a format string (GH5863)
Bug in groupby dtype conversion with datetimelike (GH5869)
Regression in handling of empty Series as indexers to Series (GH5877)
Bug in internal caching, related to (GH5727)
Testing bug in reading JSON/msgpack from a non-filepath on windows under py3 (GH5874)
Bug when assigning to .ix[tuple(...)] (GH5896)
Bug in fully reindexing a Panel (GH5905)
Bug in idxmin/max with object dtypes (GH5914)
Bug in BusinessDay when adding n days to a date not on offset when n>5 and n%5==0 (GH5890)
Bug in assigning to chained series with a series via ix (GH5928)
Bug in creating an empty DataFrame, copying, then assigning (GH5932)
Bug in DataFrame.tail with empty frame (GH5846)
Bug in propagating metadata on resample (GH5862)
Fixed string-representation of NaT to be “NaT” (GH5708)
Fixed string-representation for Timestamp to show nanoseconds if present (GH5912)
pd.match not returning passed sentinel
Panel.to_frame() no longer fails when major_axis is a MultiIndex (GH5402).
Bug in pd.read_msgpack with inferring a DateTimeIndex frequency incorrectly (GH5947)
Fixed to_datetime for array with both Tz-aware datetimes and NaT's (GH5961)
Bug in rolling skew/kurtosis when passed a Series with bad data (GH5749)
Bug in scipy interpolate methods with a datetime index (GH5975)
Bug in NaT comparison if a mixed datetime/np.datetime64 with NaT were passed (GH5968)
Fixed bug with pd.concat losing dtype information if all inputs are empty (GH5742)
Recent changes in IPython cause warnings to be emitted when using previous versions of pandas in QTConsole, now fixed. If you’re using an older version and need to suppress the warnings, see (GH5922).
Bug in merging timedelta dtypes (GH5695)
Bug in plotting.scatter_matrix function. Wrong alignment among diagonal and off-diagonal plots, see (GH5497).
Regression in Series with a multi-index via ix (GH6018)
Bug in Series.xs with a multi-index (GH6018)
Bug in Series construction of mixed type with datelike and an integer (which should result in object type and not automatic conversion) (GH6028)
Possible segfault when chained indexing with an object array under numpy 1.7.1 (GH6026, GH6056)
Bug in setting using fancy indexing a single element with a non-scalar (e.g. a list) (GH6043)
to_sql did not respect if_exists (GH4110, GH4304)
Regression in .get(None) indexing from 0.12 (GH5652)
Subtle iloc indexing bug, surfaced in (GH6059)
Bug with insert of strings into DatetimeIndex (GH5818)
Fixed unicode bug in to_html/HTML repr (GH6098)
Fixed missing arg validation in get_options_data (GH6105)
Bug in assignment with duplicate columns in a frame where the locations are a slice (e.g. next to each other) (GH6120)
Bug in propagating _ref_locs during construction of a DataFrame with dups index/columns (GH6121)
Bug in DataFrame.apply when using mixed datelike reductions (GH6125)
Bug in DataFrame.append when appending a row with different columns (GH6129)
Bug in DataFrame construction with recarray and non-ns datetime dtype (GH6140)
Bug in .loc setitem indexing with a dataframe on rhs, multiple item setting, and a datetimelike (GH6152)
Fixed a bug in query/eval during lexicographic string comparisons (GH6155).
Fixed a bug in query where the index of a single-element Series was being thrown away (GH6148).
Bug in HDFStore on appending a dataframe with multi-indexed columns to an existing table (GH6167)
Consistency with dtypes in setting an empty DataFrame (GH6171)
Bug in selecting on a multi-index HDFStore even in the presence of underspecified column spec (GH6169)
Bug in nanops.var with ddof=1 and 1 elements would sometimes return inf rather than nan on some platforms (GH6136)
Bug in Series and DataFrame bar plots ignoring the use_index keyword (GH6209)
Bug in groupby with mixed str/int under python3 fixed; argsort was failing (GH6212)
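Two of the fixes listed above (GH5708 and GH6136) describe behaviour that is easy to check interactively. The snippet below is a minimal sketch, assuming a reasonably recent pandas; the variable names are illustrative and not taken from the release notes.

```python
import math

import pandas as pd

# String representation of NaT (GH5708)
nat_text = str(pd.NaT)

# The sample variance (ddof=1) of a single observation is undefined,
# so NaN is the expected result rather than inf (GH6136)
single_var = pd.Series([1.0]).var(ddof=1)

print(nat_text, math.isnan(single_var))
```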

pandas 0.13.0¶

Release date: January 3, 2014

New Features¶

plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See scipy docs. (GH4298)
Added isin method to DataFrame (GH4211)
df.to_clipboard() learned a new excel keyword that lets you paste df data directly into excel (enabled by default). (GH5070).
Clipboard functionality now works with PySide (GH4282)
New extract string method returns regex matches more conveniently (GH4685)
Auto-detect field widths in read_fwf when unspecified (GH4488)
to_csv() now outputs datetime objects according to a specified format string via the date_format keyword (GH4313)
Added LastWeekOfMonth DateOffset (GH4637)
Added cumcount groupby method (GH4646)
Added FY5253, and FY5253Quarter DateOffsets (GH4511)
Added mode() method to Series and DataFrame to get the statistical mode(s) of a column/series. (GH5367)
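Several of the additions above (isin, cumcount, mode and the extract string method) can be tried on a few rows of data. The following is a small sketch, assuming a current pandas release; the frames and column names are made up for illustration, and return types may differ from what 0.13.0 produced.

```python
import pandas as pd

# DataFrame.isin: element-wise membership test
nums = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
mask = nums.isin([1, 4])

# groupby.cumcount: number each row within its group, starting at 0
df = pd.DataFrame({"key": ["a", "a", "b", "a"], "val": [1, 2, 2, 3]})
position = df.groupby("key").cumcount()

# mode: the most frequent value(s), returned as a Series
most_common = df["val"].mode()

# str.extract: pull regex capture groups out of each string;
# rows that do not match are filled with missing values
codes = pd.Series(["a1", "b2", "c3"])
parts = codes.str.extract(r"([ab])(\d)")

print(mask, position, most_common, parts, sep="\n\n")
```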

Experimental Features¶

The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series.
DataFrame has a new eval() that evaluates an expression in the context of the DataFrame; allows inline expression assignment
A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax.
pd.eval and friends now evaluate operations involving datetime64 objects in Python space because numexpr cannot handle NaT values (GH4897).
Add msgpack support via pd.read_msgpack() and pd.to_msgpack() / df.to_msgpack() for serialization of arbitrary pandas (and python objects) in a lightweight portable binary format (GH686, GH5506)
Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
Added pandas.io.gbq for reading from (and writing to) Google BigQuery into a DataFrame. (GH4140)
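The eval() and query() entries above are easiest to understand side by side. This is a hedged sketch, assuming a current pandas release (numexpr is optional; pandas falls back to a Python engine when it is absent). The frame and the names a and b are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

# Top-level eval: names are resolved from the mapping passed as local_dict
total = pd.eval("a + b", local_dict={"a": df["a"], "b": df["b"]})

# DataFrame.eval: column names are in scope inside the expression
diff = df.eval("a - b")

# DataFrame.query: a boolean expression used as a row filter
smaller = df.query("a < b")

print(total.tolist(), diff.tolist(), smaller.index.tolist())
```

For frames this small the expression engine gives no speed benefit; the speedups mentioned above apply to large operands.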

Improvements to existing features¶

read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)
read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).
get_dummies works with NaN (GH4446)
Added a test for read_clipboard() and to_clipboard() (GH4282)
Added bins argument to value_counts (GH3945), also sort and ascending, now available in Series method as well as top-level function.
Text parser now treats anything that reads like inf (“inf”, “Inf”, “-Inf”, “iNf”, etc.) as infinity. (GH4220, GH4219), affecting read_table, read_csv, etc.
Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)
Significant table writing performance improvements in HDFStore
JSON date serialization now performed in low-level C code.
JSON support for encoding datetime.time
Expanded JSON docs, more info about orient options and the use of the numpy param when decoding.
Add drop_level argument to xs (GH4180)
Can now resample a DataFrame with ohlc (GH2320)
Index.copy() and MultiIndex.copy() now accept keyword arguments to change attributes (i.e., names, levels, labels) (GH4039)
Add rename and set_names methods to Index as well as set_names, set_levels, set_labels to MultiIndex (GH4039), with improved validation for all (GH4039, GH4794)
A Series of dtype timedelta64[ns] can now be divided/multiplied by an integer series (GH4521)
A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object to yield a float64 dtyped Series. This is frequency conversion; astyping is also supported.
Timedelta64 support fillna/ffill/bfill with an integer interpreted as seconds, or a timedelta (GH3371)
Box numeric ops on timedelta Series (GH4984)
Datetime64 support ffill/bfill
Performance improvements with __getitem__ on DataFrames when the key is a column
Support for using a DatetimeIndex/PeriodsIndex directly in a datelike calculation e.g. s-s.index (GH4629)
Better/cleaned up exceptions in core/common, io/excel and core/format (GH4721, GH3954), as well as cleaned up test cases in tests/test_frame, tests/test_multilevel (GH4732).
Performance improvement of timeseries plotting with PeriodIndex and added test to vbench (GH4705 and GH4722)
Add axis and level keywords to where, so that the other argument can now be an alignable pandas object.
to_datetime with a format of ‘%Y%m%d’ now parses much faster
It’s now easier to hook new Excel writers into pandas (just subclass ExcelWriter and register your engine). You can specify an engine in to_excel or in ExcelWriter. You can also specify which writers you want to use by default with config options io.excel.xlsx.writer and io.excel.xls.writer. (GH4745, GH4750)
Panel.to_excel() now accepts keyword arguments that will be passed to its DataFrame’s to_excel() methods. (GH4750)
Added XlsxWriter as an optional ExcelWriter engine. This is about 5x faster than the default openpyxl xlsx writer and is equivalent in speed to the xlwt xls writer module. (GH4542)
Allow DataFrame constructor to accept more list-like objects, e.g. list of collections.Sequence and array.Array objects (GH3783, GH4297, GH4851), thanks @lgautier
DataFrame constructor now accepts a numpy masked record array (GH3478), thanks @jnothman
__getitem__ with tuple key (e.g., [:, 2]) on Series without MultiIndex raises ValueError (GH4759, GH4837)
read_json now raises a (more informative) ValueError when the dict contains a bad key and orient='split' (GH4730, GH4838)
read_stata now accepts Stata 13 format (GH4291)
ExcelWriter and ExcelFile can be used as contextmanagers. (GH3441, GH4933)
pandas is now tested with two different versions of statsmodels (0.4.3 and 0.5.0) (GH4981).
Better string representations of MultiIndex (including ability to roundtrip via repr). (GH3347, GH4935)
Both ExcelFile and read_excel now accept an xlrd.Book for the io (formerly path_or_buf) argument; this requires engine to be set. (GH4961).
concat now gives a more informative error message when passed objects that cannot be concatenated (GH4608).
Add halflife option to exponentially weighted moving functions (PR GH4998)
to_dict now takes records as a possible outtype. Returns an array of column-keyed dictionaries. (GH4936)
tz_localize can infer a fall daylight savings transition based on the structure of unlocalized data (GH4230)
DatetimeIndex is now in the API documentation
Improve support for converting R datasets to pandas objects (more informative index for timeseries and numeric, support for factors, dist, and high-dimensional arrays).
read_html() now supports the parse_dates, tupleize_cols and thousands parameters (GH4770).
json_normalize() is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)
DataFrame.from_records() will now accept generators (GH4910)
DataFrame.interpolate() and Series.interpolate() have been expanded to include interpolation methods from scipy. (GH4434, GH1892)
Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
DatetimeIndex (and date_range) can now be constructed in a left- or right-open fashion using the closed parameter (GH4579)
Python csv parser now supports usecols (GH4335)
Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (GH5271)
NDFrame.drop() now accepts names as well as integers for the axis argument. (GH5354)
Added short docstrings to a few methods that were missing them + fixed the docstrings for Panel flex methods. (GH5336)
NDFrame.drop(), NDFrame.dropna(), and .drop_duplicates() all accept inplace as a keyword argument; however, this only means that the wrapper is updated inplace, a copy is still made internally. (GH1960, GH5247, GH5628, and related GH2325 [still not closed])
Fixed bug in tools.plotting.andrews_curves so that lines are drawn grouped by color as expected.
read_excel() now tries to convert integral floats (like 1.0) to int by default. (GH5394)
Excel writers now have a default option merge_cells in to_excel() to merge cells in MultiIndex and Hierarchical Rows. Note: using this option it is no longer possible to round trip Excel files with merged MultiIndex and Hierarchical Rows. Set the merge_cells to False to restore the previous behaviour. (GH5254)
The FRED DataReader now accepts multiple series (GH3413)
StataWriter adjusts variable names to Stata’s limitations (GH5709)
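The timedelta arithmetic and to_frame entries in the list above can be illustrated with a few values. This is a minimal sketch, assuming a current pandas and NumPy; the data and names are illustrative, not taken from the release notes.

```python
import numpy as np
import pandas as pd

td = pd.Series(pd.to_timedelta(["1 days", "12 hours", "36 hours"]))

# Dividing by another timedelta yields float64: a frequency conversion to days
in_days = td / np.timedelta64(1, "D")

# Multiplying by an integer Series keeps a timedelta dtype
doubled = td * pd.Series([2, 2, 2])

# Series.to_frame: wrap a named Series as a single-column DataFrame
frame = pd.Series([1, 2, 3], name="x").to_frame()

print(in_days.tolist(), doubled.tolist(), frame.shape)
```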

API Changes¶

DataFrame.reindex() and forward/backward filling now raises ValueError if either index is not monotonic (GH4483, GH4484).
pandas now is Python 2/3 compatible without the need for 2to3 thanks to @jtratner. As a result, pandas now uses iterators more extensively. This also led to the introduction of substantive parts of Benjamin Peterson’s six library into compat. (GH4384, GH4375, GH4372)
pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors. (GH4384, GH4375, GH4372)
Deprecated iterkv, which will be removed in a future release (was just an alias of iteritems used to get around 2to3’s changes). (GH4384, GH4375, GH4372)
Series.get with negative indexers now returns the same as [] (GH4390)
Allow ix/loc for Series/DataFrame/Panel to set on any axis even when the single-key is not currently contained in the index for that axis (GH2578, GH5226, GH5632, GH5720, GH5744, GH5756)
Default export for to_clipboard is now csv with a sep of \t for compat (GH3368)
at now will enlarge the object inplace (and return the same) (GH2578)
DataFrame.plot will scatter plot x versus y by passing kind='scatter' (GH2215)
HDFStore
- append_to_multiple automatically synchronizes writing rows to multiple tables and adds a dropna kwarg (GH4698)
- handle a passed Series in table format (GH4330)
- added an is_open property to indicate if the underlying file handle is_open; a closed store will now report ‘CLOSED’ when viewing the store (rather than raising an error) (GH4409)
- a close of a HDFStore now will close that instance of the HDFStore but will only close the actual file if the ref count (by PyTables) w.r.t. all of the open handles are 0. Essentially you have a local instance of HDFStore referenced by a variable. Once you close it, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError
- removed the _quiet attribute, replaced by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)
- removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)
- allow a passed locations array or mask as a where condition (GH4467)
- add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)
- the format keyword now replaces the table keyword; allowed values are fixed(f)|table(t); the Storer format has been renamed to Fixed
- a column multi-index will be recreated properly (GH4710); raise on trying to use a multi-index with data_columns on the same axis
- select_as_coordinates will now return an Int64Index of the resultant selection set
- support timedelta64[ns] as a serialization type (GH3577)
- store datetime.date objects as ordinals rather than timetuples to avoid timezone issues (GH2852), thanks @tavistmorph and @numpand
- numexpr 2.2.2 fixes incompatibility in PyTables 2.4 (GH4908)
- flush now accepts an fsync parameter, which defaults to False (GH5364)
- unicode indices not supported on table formats (GH5386)
- pass thru store creation arguments; can be used to support in-memory stores
JSON
- added date_unit parameter to specify resolution of timestamps. Options are seconds, milliseconds, microseconds and nanoseconds. (GH4362, GH4498).
- added default_handler parameter to allow a callable to be passed which will be responsible for handling otherwise unserializable objects. (GH5138)
Index and MultiIndex changes (GH4039):
- Setting levels and labels directly on MultiIndex is now deprecated. Instead, you can use the set_levels() and set_labels() methods.
- levels, labels and names properties no longer return lists, but instead return containers that do not allow setting of items (‘mostly immutable’)
- levels, labels and names are validated upon setting and are either copied or shallow-copied.
- inplace setting of levels or labels now correctly invalidates the cached properties. (GH5238).
- __deepcopy__ now returns a shallow copy (currently: a view) of the data - allowing metadata changes.
- MultiIndex.astype() now only allows np.object_-like dtypes and now returns a MultiIndex rather than an Index. (GH4039)
- Added is_ method to Index that allows fast equality comparison of views (similar to np.may_share_memory but no false positives, and changes on levels and labels setting on MultiIndex). (GH4859, GH4909)
- Aliased __iadd__ to __add__. (GH4996)
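The ‘mostly immutable’ containers and the setter methods described in the Index and MultiIndex changes can be sketched as follows. This assumes a current pandas release, where the same pattern still applies to levels and names; the index contents are illustrative only.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["outer", "inner"])

# Item assignment on the levels container is rejected
try:
    mi.levels[0] = ["x", "y"]
    assigned = True
except TypeError:
    assigned = False

# The setter methods return a new MultiIndex and leave the original untouched
renamed = mi.set_names(["first", "second"])
relabelled = mi.set_levels(["x", "y"], level=0)

# Index.is_ is an identity-style check: a view matches, a copy does not
idx = pd.Index([1, 2, 3])
same_as_view = idx.is_(idx.view())
same_as_copy = idx.is_(idx.copy())

print(assigned, list(renamed.names), list(mi.names), same_as_view, same_as_copy)
```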
Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)
__nonzero__ for all NDFrame objects, will now raise a ValueError, this reverts back to (GH1073, GH4633) behavior. Add .bool() method to NDFrame objects to facilitate evaluating of single-element boolean Series
DataFrame.update() no longer raises a DataConflictError, it now will raise a ValueError instead (if necessary) (GH4732)
Series.isin() and DataFrame.isin() now raise a TypeError when passed a string (GH4763). Pass a list of one element (containing the string) instead.
Remove undocumented/unused kind keyword argument from read_excel, and ExcelFile. (GH4713, GH4712)
The method argument of NDFrame.replace() is valid again, so that a list can be passed to to_replace (GH4743).
provide automatic dtype conversions on _reduce operations (GH3371)
exclude non-numerics if mixed types with datelike in _reduce operations (GH3371)
default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604)
moved timedeltas support to pandas.tseries.timedeltas.py; add timedeltas string parsing, add top-level to_timedelta function
NDFrame now is compatible with Python’s toplevel abs() function (GH4821).
raise a TypeError on invalid comparison ops on Series/DataFrame (e.g. integer/datetime) (GH4968)
Added a new index type, Float64Index. This will be automatically created when passing floating values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, loc for scalar indexing and slicing work exactly the same. Indexing on other index types are preserved (and positional fallback for [], ix), with the exception, that floating point slicing on indexes on non Float64Index will raise a TypeError, e.g. Series(range(5))[3.5:4.5] (GH263, GH5375)
Make Categorical repr nicer (GH4368)
Remove deprecated Factor (GH3650)
Remove deprecated set_printoptions/reset_printoptions (GH3046)
Remove deprecated _verbose_info (GH3215)
Begin removing methods that don’t make sense on GroupBy objects (GH4887).
Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717)
All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.), now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars. (GH3765)
Arithmetic func factories are now passed real names (suitable for using with super) (GH5240)
Provide numpy compatibility with 1.7 for a calling convention like np.prod(pandas_object) as numpy call with additional keyword args (GH4435)
Provide __dir__ method (and local context) for tab completion / remove ipython completers code (GH4501)
Support non-unique axes in a Panel via indexing operations (GH4960)
.truncate will raise a ValueError if invalid before and after dates are given (GH5242)
Timestamp now supports now/today/utcnow class methods (GH5339)
default for display.max_seq_len is now 100 rather than None. This activates truncated display (“...”) of long sequences in various places. (GH3391)
All division with NDFrame - likes is now truedivision, regardless of the future import. You can use // and floordiv to do integer division.

In [3]: arr = np.array([1, 2, 3, 4])

In [4]: arr2 = np.array([5, 3, 2, 1])

In [5]: arr / arr2
Out[5]: array([0, 0, 1, 4])

In [6]: pd.Series(arr) / pd.Series(arr2)  # no future import required
Out[6]:
0    0.200000
1    0.666667
2    1.500000
3    4.000000
dtype: float64

raise/warn SettingWithCopyError/Warning exception/warning when setting of a copy thru chained assignment is detected, settable via option mode.chained_assignment
test the list of NA values in the csv parser. add N/A, #NA as independent default na values (GH5521)
The refactoring involving Series deriving from NDFrame breaks rpy2<=2.3.8. An issue has been opened against rpy2 and a workaround is detailed in GH5698. Thanks @JanSchulz.
Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element (GH6214)
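A few of the API changes above (label-returning idxmin, the built-in abs(), ValueError on truth-testing, and TypeError from isin() on a bare string) can be observed directly. The snippet below is a sketch, assuming a current pandas release; it deliberately uses idxmin rather than argmin, and all names are illustrative.

```python
import pandas as pd

s = pd.Series([3, -1, 2], index=["a", "b", "c"])

# idxmin returns the index label of the minimum, not its position
min_label = s.idxmin()

# The built-in abs() works on pandas objects
magnitudes = abs(s)

# Truth-testing a multi-element Series is ambiguous and raises ValueError
try:
    bool(s > 0)
    truth_error = False
except ValueError:
    truth_error = True

# isin() rejects a bare string; wrap it in a list instead
letters = pd.Series(["x", "y"])
try:
    letters.isin("x")
    isin_error = False
except TypeError:
    isin_error = True
matches = letters.isin(["x"])

print(min_label, magnitudes.tolist(), truth_error, isin_error, matches.tolist())
```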

Internal Refactoring¶

In 0.13.0 there is a major refactor primarily to subclass Series from NDFrame, which is the base class currently for DataFrame and Panel, to unify methods and behaviors. Series formerly subclassed directly from ndarray. (GH4080, GH3862, GH816) See Internal Refactoring

Refactor of series.py/frame.py/panel.py to move common code to generic.py

added _setup_axes to create generic NDFrame structures
moved methods
from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes, transpose, pop
__iter__, keys, __contains__, __len__, __neg__, __invert__
convert_objects, as_blocks, as_matrix, values
__getstate__, __setstate__ (compat remains in frame/panel)
__getattr__, __setattr__
_indexed_same, reindex_like, align, where, mask
fillna, replace (Series replace is now consistent with DataFrame)
filter (also added axis argument to selectively filter on a different axis)
reindex, reindex_axis, take
truncate (moved to become part of NDFrame)
isnull/notnull now available on NDFrame objects

These are API changes which make Panel more consistent with DataFrame

swapaxes on a Panel with the same axes specified now return a copy
support attribute access for setting
filter supports same API as original DataFrame filter
fillna refactored to core/generic.py, while > 3ndim is NotImplemented

Series now inherits from NDFrame rather than directly from ndarray. There are several minor changes that affect the API.

numpy functions that do not support the array interface will now return ndarrays rather than series, e.g. np.diff, np.ones_like, np.where
Series(0.5) would previously return the scalar 0.5, this is no longer supported
TimeSeries is now an alias for Series. The property is_time_series can be used to distinguish (if desired)
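The practical effect of Series no longer being an ndarray subclass shows up with functions such as np.diff. The following is a minimal sketch, assuming current pandas and NumPy; the values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9, 16])

# Series is no longer an ndarray subclass
is_ndarray = isinstance(s, np.ndarray)

# np.diff hands back a plain ndarray, so the index is dropped
steps = np.diff(s)

# Series.diff is the label-preserving alternative
labelled_steps = s.diff()

print(is_ndarray, type(steps).__name__, type(labelled_steps).__name__)
```

Code that relied on getting a Series back from such NumPy functions can usually switch to the equivalent pandas method.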

Refactor of Sparse objects to use BlockManager

Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable. SparseSeries and SparseDataFrame now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)
Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)
Operations on sparse structures within DataFrames should preserve sparseness, merging type operations will convert to dense (and back to sparse), so might be somewhat inefficient
enable setitem on SparseSeries for boolean/integer/slices
SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)

added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)
All NDFrame objects now have a _prop_attributes, which can be used to indicate various values to propagate to a new object from an existing (e.g. name in Series will follow more automatically now)
Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner
Bug in Series update where the parent frame is not updating its cache based on changes (GH4080, GH5216) or types (GH3217), fillna (GH3386)
Indexing with dtype conversions fixed (GH4463, GH4204)
Refactor Series.reindex to core/generic.py (GH4604, GH4618), allow method= in reindexing on a Series to work
Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy
Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel
Series (for index) / Panel (for items) now have attribute access to their elements (GH1903)
Refactor clip methods to core/generic.py (GH4798)
Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality
Refactor of Series arithmetic with time-like objects (datetime/timedelta/time etc.) into a separate, cleaned up wrapper class. (GH4613)
Complex compat for Series with ndarray. (GH4819)
Removed unnecessary rwproperty from codebase in favor of builtin property. (GH4843)
Refactor object level numeric methods (mean/sum/min/max...) from object level modules to core/generic.py (GH4435).
Refactor cum objects to core/generic.py (GH4435), note that these have a more numpy-like function signature.
read_html() now uses TextParser to parse HTML data from bs4/lxml (GH4770).
Removed the keep_internal keyword parameter in pandas/core/groupby.py because it wasn’t being used (GH5102).
Base DateOffsets are no longer all instantiated on importing pandas, instead they are generated and cached on the fly. The internal representation and handling of DateOffsets has also been clarified. (GH5189, related GH5004)
MultiIndex constructor now validates that passed levels and labels are compatible. (GH5213, GH5214)
Unify dropna for Series/DataFrame signature (GH5250), tests from GH5234, courtesy of @rockg
Rewrite assert_almost_equal() in cython for performance (GH4398)
Added an internal _update_inplace method to facilitate updating NDFrame wrappers on inplace ops (only is for convenience of caller, doesn’t actually prevent copies). (GH5247)

Bug Fixes¶

HDFStore
- raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)
- read_hdf was not respecting a passed mode (GH4504)
- appending a 0-len table will work correctly (GH4273)
- to_hdf was raising when passing both arguments append and table (GH4584)
- reading from a store with duplicate columns across dtypes would raise (GH4767)
- Fixed a bug where ValueError wasn’t correctly raised when column names weren’t strings (GH4956)
- A zero length series written in Fixed format not deserializing properly. (GH4708)
- Fixed decoding perf issue on py3 (GH5441)
- Validate levels in a multi-index before storing (GH5527)
- Correctly handle data_columns with a Panel (GH5717)
Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)
The by argument now works correctly with the layout argument (GH4102, GH4014) in *.hist plotting methods
Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)
Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)
Fix running of stata IO tests. Now uses temporary files to write (GH4353)
Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)
read_html tests now work with Python 2.6 (GH4351)
Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)
In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)
In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).
as_index is no longer ignored when doing groupby apply (GH4648, GH3417)
JSON NaT handling fixed, NaTs are now serialized to null (GH4498)
Fixed JSON handling of escapable characters in JSON object keys (GH4593)
Fixed passing keep_default_na=False when na_values=None (GH4318)
Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)
Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)
Fixed JSON bug where locales with decimal separators other than ‘.’ threw exceptions when encoding / decoding certain values. (GH4918)
Fix .iat indexing with a PeriodIndex (GH4390)
Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types
Fixed a bug with all the dtypes being converted to object when using the CSV cparser with the usecols parameter (GH3192)
Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)
Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)
Fixed a bug where calling Series.astype(str) would truncate the string (GH4405, GH4437)
Fixed a py3 compat issue where bytes were being repr’d as tuples (GH4455)
Fixed Panel attribute naming conflict if item is named ‘a’ (GH3440)
Fixed an issue where duplicate indexes were raising when plotting (GH4486)
Fixed an issue where cumsum and cumprod didn’t work with bool dtypes (GH4170, GH4440)
Fixed Panel slicing issue in xs that was returning an incorrect dimmed object (GH4016)
Fix resampling bug where custom reduce function not used if only one group (GH3849, GH4494)
Fixed Panel assignment with a transposed frame (GH3830)
Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)
frozenset objects now raise in the Series constructor (GH4482, GH4480)
Fixed issue with sorting a duplicate multi-index that has multiple dtypes (GH4516)
Fixed bug in DataFrame.set_values which was causing name attributes to be lost when expanding the index. (GH3742, GH4039)
Fixed issue where individual names, levels and labels could be set on MultiIndex without validation (GH3714, GH4039)
Fixed (GH3334) in pivot_table. Margins did not compute if values is the index.
Fix bug in having a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes (GH4532)
Fix arithmetic with series/datetimeindex and np.timedelta64 not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135)
Fix bug in pd.read_clipboard on windows with PY3 (GH4561); not decoding properly
tslib.get_period_field() and tslib.get_period_field_arr() now raise if code argument out of range (GH4519, GH4520)
Fix boolean indexing on an empty series loses index names (GH4235), infer_dtype works with empty arrays.
Fix reindexing with multiple axes; if an axes match was not replacing the current axes, leading to a possible lazy frequency inference issue (GH3317)
Fixed issue where DataFrame.apply was reraising exceptions incorrectly (causing the original stack trace to be truncated).
Fix selection with ix/loc and non_unique selectors (GH4619)
Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702); have internal setitem_with_indexer in core/indexing use Block.setitem
Fixed bug where thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)
Fix an issue with CacheableOffset not properly being used by many DateOffset; this prevented the DateOffset from being cached (GH4609)
Fix boolean comparison with a DataFrame on the lhs, and a list/tuple on the rhs (GH4576)
Fix error/dtype conversion with setitem of None on Series/DataFrame (GH4667)
Fix decoding based on a passed in non-default encoding in pd.read_stata (GH4626)
Fix DataFrame.from_records with a plain-vanilla ndarray. (GH4727)
Fix some inconsistencies with Index.rename and MultiIndex.rename, etc. (GH4718, GH4628)
Bug in using iloc/loc with a cross-sectional and duplicate indices (GH4726)
Bug with using QUOTE_NONE with to_csv causing Exception. (GH4328)
Bug with Series indexing not raising an error when the right-hand-side has an incorrect length (GH2702)
Bug in multi-indexing with a partial string selection as one part of a MultiIndex (GH4758)
Bug with reindexing on the index with a non-unique index will now raise ValueError (GH4746)
Bug in setting with loc/ix a single indexer with a multi-index axis and a numpy array, related to (GH3777)
Bug in concatenation with duplicate columns across dtypes not merging with axis=0 (GH4771, GH4975)
Bug in iloc with a slice index failing (GH4771)
Incorrect error message with no colspecs or width in read_fwf. (GH4774)
Fix bugs in indexing in a Series with a duplicate index (GH4548, GH4550)
Fixed bug with reading compressed files with read_fwf in Python 3. (GH3963)
Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)
Fixed bug with reading compressed files in as bytes rather than str in Python 3. Simplifies bytes-producing file-handling in Python 3 (GH3963, GH4785).
Fixed an issue related to ticklocs/ticklabels with log scale bar plots across different versions of matplotlib (GH4789)
Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)
Fixed an issue with a duplicate index and duplicate selector with .loc (GH4825)
Fixed an issue with DataFrame.sort_index where, when sorting by a single column and passing a list for ascending, the argument for ascending was being interpreted as True (GH4839, GH4846)
Fixed Panel.tshift not working. Added freq support to Panel.shift (GH4853)
Fix an issue in TextFileReader w/ Python engine (i.e. PythonParser) with thousands != “,” (GH4596)
Bug in getitem with a duplicate index when using where (GH4879)
Fix Type inference code coerces float column into datetime (GH4601)
Fixed _ensure_numeric does not check for complex numbers (GH4902)
Fixed a bug in Series.hist where two figures were being created when the by argument was passed (GH4112, GH4113).
Fixed a bug in convert_objects for > 2 ndims (GH4937)
Fixed a bug in DataFrame/Panel cache insertion and subsequent indexing (GH4939, GH5424)
Fixed string methods for FrozenNDArray and FrozenList (GH4929)
Fixed a bug with setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)
Tests for fillna on empty Series (GH4346), thanks @immerrr
Fixed copy() to shallow copy axes/indices as well and thereby keep separate metadata. (GH4202, GH4830)
Fixed skiprows option in Python parser for read_csv (GH4382)
Fixed bug preventing cut from working with np.inf levels without explicitly passing labels (GH3415)
Fixed wrong check for overlapping in DatetimeIndex.union (GH4564)
Fixed conflict between thousands separator and date parser in csv_parser (GH4678)
Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)
Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)
Fixed wrong index name during read_csv if using usecols. Applies to cparser only. (GH4201)
Timestamp objects can now appear in the left hand side of a comparison operation with a Series or DataFrame object (GH4982).
Fix a bug when indexing with np.nan via iloc/loc (GH5016)
Fixed a bug where low memory c parser could create different types in different chunks of the same file. Now coerces to numerical type or raises warning. (GH3866)
Fix a bug where reshaping a Series to its own shape raised TypeError (GH4554) and other reshaping issues.
Bug in setting with ix/loc and a mixed int/string index (GH4544)
Make sure series-series boolean comparisons are label based (GH4947)
Bug in multi-level indexing with a Timestamp partial indexer (GH4294)
Tests/fix for multi-index construction of an all-nan frame (GH4078)
Fixed a bug where read_html() wasn’t correctly inferring values of tables with commas (GH5029)
Fixed a bug where read_html() wasn’t providing a stable ordering of returned tables (GH4770, GH5029).
Fixed a bug where read_html() was incorrectly parsing when passed index_col=0 (GH5066).
Fixed a bug where read_html() was incorrectly inferring the type of headers (GH5048).
Fixed a bug where DatetimeIndex joins with PeriodIndex caused a stack overflow (GH3899).
Fixed a bug where groupby objects didn’t allow plots (GH5102).
Fixed a bug where groupby objects weren’t tab-completing column names (GH5102).
Fixed a bug where groupby.plot() and friends were duplicating figures multiple times (GH5102).
Provide automatic conversion of object dtypes on fillna, related (GH5103)
Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).
Treat a list/ndarray identically for iloc indexing with list-like (GH5006)
Fix MultiIndex.get_level_values() with missing values (GH5074)
Fix bound checking for Timestamp() with datetime64 input (GH4065)
Fix a bug where TestReadHtml wasn’t calling the correct read_html() function (GH5150).
Fix a bug with NDFrame.replace() which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).
Fix better error message for to_datetime (GH4928)
Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.
Fixed segfault on isnull(MultiIndex) (now raises an error instead) (GH5123, GH5125)
Allow duplicate indices when performing operations that align (GH5185, GH5639)
Compound dtypes in a constructor raise NotImplementedError (GH5191)
Bug in comparing duplicate frames (GH4421) related
Bug in describe on duplicate frames
Bug in to_datetime with a format and coerce=True not raising (GH5195)
Bug in loc setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206)
Fixed bug where inplace setting of levels or labels on MultiIndex would not clear cached values property and therefore return wrong values. (GH5215)
Fixed bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).
Fixed Period with a business date freq to always roll-forward if on a non-business date. (GH5203)
Fixed bug in Excel writers where frames with duplicate column names weren’t written correctly. (GH5235)
Fixed issue with drop and a non-unique index on Series (GH5248)
Fixed seg fault in C parser caused by passing more names than columns in the file. (GH5156)
Fix Series.isin with date/time-like dtypes (GH5021)
C and Python Parser can now handle the more common multi-index column format which doesn’t have a row for index names (GH4702)
Bug when trying to use an out-of-bounds date as an object dtype (GH5312)
Bug when trying to display an embedded PandasObject (GH5324)
Allows operating of Timestamps to return a datetime if the result is out-of-bounds, related (GH5312)
Fix return value/type signature of initObjToJSON() to be compatible with numpy’s import_array() (GH5334, GH5326)
Bug when renaming then set_index on a DataFrame (GH5344)
Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)
Fixed html tests on win32. (GH4580)
Make sure that head/tail are iloc based (GH5370)
Fixed bug for PeriodIndex string representation if there are 1 or 2 elements. (GH5372)
The GroupBy methods transform and filter can be used on Series and DataFrames that have repeated (non-unique) indices. (GH4620)
Fix empty series not printing name in repr (GH4651)
Make tests create temp files in temp directory by default. (GH5419)
pd.to_timedelta of a scalar returns a scalar (GH5410)
pd.to_timedelta accepts NaN and NaT, returning NaT instead of raising (GH5437)
Performance improvements in isnull on larger size pandas objects
Fixed various setitem with 1d ndarray that does not have a matching length to the indexer (GH5508)
Bug in getitem with a multi-index and iloc (GH5528)
Bug in delitem on a Series (GH5542)
Bug fix in apply when using custom function and objects are not mutated (GH5545)
Bug in selecting from a non-unique index with loc (GH5553)
Bug in groupby returning non-consistent types when user function returns a None (GH5592)
Work around regression in numpy 1.7.0 which erroneously raises IndexError from ndarray.item (GH5666)
Bug in repeated indexing of object with resultant non-unique index (GH5678)
Bug in fillna with Series and a passed series/dict (GH5703)
Bug in groupby transform with a datetime-like grouper (GH5712)
Bug in multi-index selection in PY3 when using certain keys (GH5725)
Row-wise concat of differing dtypes failing in certain cases (GH5754)
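
As an illustration of the pd.to_timedelta entries above (GH5410, GH5437), the following is a minimal sketch written against a recent pandas release; the variable names are illustrative only, and details of the 0.13-era behavior may differ.

```python
import numpy as np
import pandas as pd

# A scalar input yields a scalar Timedelta rather than a length-1 container.
one_day = pd.to_timedelta("1 days 02:00:00")

# Missing inputs are converted to NaT instead of raising.
from_nan = pd.to_timedelta(np.nan)
from_nat = pd.to_timedelta(pd.NaT)

print(type(one_day).__name__)
print(pd.isnull(from_nan), pd.isnull(from_nat))
```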

pandas 0.12.0

Release date: 2013-07-24

New Features

pd.read_html() can now parse HTML strings, files or urls and returns a list of DataFrames, courtesy of @cpcloud. (GH3477, GH3605, GH3606)
Support for reading Amazon S3 files. (GH3504)
Added module for reading and writing JSON strings/files: pandas.io.json includes to_json DataFrame/Series method, and a read_json top-level reader, various issues (GH1226, GH3804, GH3876, GH3867, GH1305)
Added module for reading and writing Stata files: pandas.io.stata (GH1512) includes to_stata DataFrame method, and a read_stata top-level reader
Added support for writing in to_csv and reading in read_csv, multi-index columns. The header option in read_csv now accepts a list of the rows from which to read the index. Added the option, tupleize_cols to provide compatibility for the pre 0.12 behavior of writing and reading multi-index columns via a list of tuples. The default in 0.12 is to write lists of tuples and not interpret list of tuples as a multi-index column. Note: The default value will change in 0.12 to make the default to write and read multi-index columns in the new format. (GH3571, GH1651, GH3141)
Add iterator to Series.str (GH3638)
pd.set_option() now allows N option, value pairs (GH3667).
Added keyword parameters for different types of scatter_matrix subplots
A filter method on grouped Series or DataFrames returns a subset of the original (GH3680, GH919)
Access to historical Google Finance data in pandas.io.data (GH3814)
DataFrame plotting methods can sample column colors from a Matplotlib colormap via the colormap keyword. (GH3860)
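
The filter method on grouped objects mentioned above (GH3680, GH919) can be sketched as follows. This is a small illustration against a recent pandas release; the frame and the group-size threshold are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "b", "c", "c", "c"], "val": [1, 2, 3, 4, 5, 6]}
)

# Keep only the rows belonging to groups with at least two members.
# The result is a subset of the original frame, in the original row order.
kept = df.groupby("key").filter(lambda g: len(g) >= 2)

print(kept)
```

Group "b" has a single row, so it is dropped while the rows for "a" and "c" keep their original index labels.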

Improvements to existing features

Fixed various issues with internal pprinting code, the repr() for various objects including Timestamp and Index now produces valid python code strings and can be used to recreate the object (GH3038, GH3379, GH3251, GH3460)
convert_objects now accepts a copy parameter (defaults to True)
HDFStore
- will retain index attributes (freq, tz, name) on recreation (GH3499, GH4098)
- will warn with an AttributeConflictWarning if you are attempting to append an index with a different frequency than the existing, or attempting to append an index with a different name than the existing
- support datelike columns with a timezone as data_columns (GH2852)
- table writing performance improvements.
- support python3 (via PyTables 3.0.0) (GH3750)
Add modulo operator to Series, DataFrame
Add date method to DatetimeIndex
Add dropna argument to pivot_table (GH3820)
Simplified the API and added a describe method to Categorical
melt now accepts the optional parameters var_name and value_name to specify custom column names of the returned DataFrame (GH3649), thanks @hoechenberger. If var_name is not specified and dataframe.columns.name is not None, then this will be used as the var_name (GH4144). Also support for MultiIndex columns.
clipboard functions use pyperclip (no dependencies on Windows, alternative dependencies offered for Linux) (GH3837).
Plotting functions now raise a TypeError before trying to plot anything if the associated objects have a dtype of object (GH1818, GH3572, GH3911, GH3912), but they will try to convert object arrays to numeric arrays if possible so that you can still plot, for example, an object array with floats. This happens before any drawing takes place which eliminates any spurious plots from showing up.
Added FAQ section on repr display options, to help users customize their setup.
where operations that result in block splitting are much faster (GH3733)
Series and DataFrame hist methods now take a figsize argument (GH3834)
DatetimeIndexes no longer try to convert mixed-integer indexes during join operations (GH3877)
Add unit keyword to Timestamp and to_datetime to enable passing of integers or floats that are in an epoch unit of D, s, ms, us, ns, thanks @mtkini (GH3969) (e.g. unix timestamps or epochs, with fractional seconds allowed) (GH3540)
DataFrame corr method (spearman) is now cythonized.
Improved network test decorator to catch IOError (and therefore URLError as well). Added with_connectivity_check decorator to allow explicitly checking a website as a proxy for seeing if there is network connectivity. Plus, new optional_args decorator factory for decorators. (GH3910, GH3914)
read_csv will now throw a more informative error message when a file contains no columns, e.g., all newline characters
Added layout keyword to DataFrame.hist() for more customizable layout (GH4050)
Timestamp.min and Timestamp.max now represent valid Timestamp instances instead of the default datetime.min and datetime.max (respectively), thanks @SleepingPills
read_html now raises when no tables are found and BeautifulSoup==4.2.0 is detected (GH4214)
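
To illustrate the unit keyword on Timestamp and to_datetime described above (GH3969, GH3540), here is a short sketch against a recent pandas release. The epoch values are chosen for the example; both correspond to the 0.12.0 release date.

```python
import pandas as pd

# Seconds since the Unix epoch, interpreted via unit="s".
epoch_seconds = 1374624000
from_seconds = pd.to_datetime(epoch_seconds, unit="s")

# The same instant expressed as whole days since the epoch, via unit="D".
from_days = pd.Timestamp(15910, unit="D")

print(from_seconds, from_days)
```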

API Changes

HDFStore
- When removing an object, remove(key) raises KeyError if the key is not a valid store object.
- raise a TypeError on passing where or columns to select with a Storer; these are invalid parameters at this time (GH4189)
- can now specify an encoding option to append/put to enable alternate encodings (GH3750)
- enable support for iterator/chunksize with read_hdf
The repr() for (Multi)Index now obeys display.max_seq_items rather than numpy threshold print options. (GH3426, GH3466)
Added mangle_dupe_cols option to read_table/csv, allowing users to control legacy behaviour re dupe cols (A, A.1, A.2 vs A, A) (GH3468). Note: The default value will change in 0.12 to the “no mangle” behaviour. If your code relies on this behaviour, explicitly specify mangle_dupe_cols=True in your calls.
Do not allow astypes on datetime64[ns] except to object, and timedelta64[ns] to object/int (GH3425)
The behavior of datetime64 dtypes has changed with respect to certain so-called reduction operations (GH3726). The following operations now raise a TypeError when performed on a Series and return an empty Series when performed on a DataFrame, similar to performing these operations on, for example, a DataFrame of slice objects: sum, prod, mean, std, var, skew, kurt, corr, and cov
Do not allow datetimelike/timedeltalike creation except with valid types (e.g. cannot pass datetime64[ms]) (GH3423)
Add squeeze keyword to groupby to allow reduction from DataFrame -> Series if groups are unique. Regression from 0.10.1, partial revert on (GH2893) with (GH3596)
Raise on iloc when boolean indexing with a label based indexer mask, e.g. a boolean Series, even with integer labels, will raise. Since iloc is purely positional based, the labels on the Series are not alignable (GH3631)
The raise_on_error option to plotting methods is obviated by GH3572, so it is removed. Plots now always raise when data cannot be plotted or the object being plotted has a dtype of object.
DataFrame.interpolate() is now deprecated. Please use DataFrame.fillna() and DataFrame.replace() instead (GH3582, GH3675, GH3676).
the method and axis arguments of DataFrame.replace() are deprecated
DataFrame.replace’s infer_types parameter is removed and now performs conversion by default. (GH3907)
Deprecated display.height; display.width is now only a formatting option and does not control triggering of summary, similar to < 0.11.0.
Add the keyword allow_duplicates to DataFrame.insert to allow a duplicate column to be inserted if True, default is False (same as prior to 0.12) (GH3679)
io API changes
- added pandas.io.api for i/o imports
- moved Excel support to pandas.io.excel
- added top-level pd.read_sql and to_sql DataFrame methods
- moved clipboard support to pandas.io.clipboard
- replace top-level and instance methods save and load with top-level read_pickle and to_pickle instance method, save and load will give deprecation warning.
set FutureWarning to require data_source, and to replace year/month with expiry date in pandas.io options. This is in preparation to add options data from Google (GH3822)
Implement __nonzero__ for NDFrame objects (GH3691, GH3696)
as_matrix with mixed signed and unsigned dtypes will result in 2 x the lcd of the unsigned as an int, maxing with int64, to avoid precision issues (GH3733)
na_values in a list provided to read_csv/read_excel will match string and numeric versions, e.g. na_values=['99'] will match 99 whether the column ends up being int, float, or string (GH3611)
read_html now defaults to None when reading, and falls back on bs4 + html5lib when lxml fails to parse. A list of parsers to try until success is also valid
more consistency in the to_datetime return types (given string/array of string inputs) (GH3888)
The internal pandas class hierarchy has changed (slightly). The previous PandasObject now is called PandasContainer and a new PandasObject has become the baseclass for PandasContainer as well as Index, Categorical, GroupBy, SparseList, and SparseArray (+ their base classes). Currently, PandasObject provides string methods (from StringMixin). (GH4090, GH4092)
New StringMixin that, given a __unicode__ method, gets python 2 and python 3 compatible string methods (__str__, __bytes__, and __repr__). Plus string safety throughout. Now employed in many places throughout the pandas library. (GH4090, GH4092)
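
The allow_duplicates keyword of DataFrame.insert listed above (GH3679) can be illustrated with a short sketch. It is written against a recent pandas release, and the column names and values are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# By default, inserting a column whose name already exists is rejected.
try:
    df.insert(0, "A", [10, 20])
    raised = False
except ValueError:
    raised = True

# Passing allow_duplicates=True permits the duplicate column name.
df.insert(0, "A", [10, 20], allow_duplicates=True)
columns = list(df.columns)

print(raised, columns)
```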

Experimental Features

Added experimental CustomBusinessDay class to support DateOffsets with custom holiday calendars and custom weekmasks. (GH2301)
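
A minimal sketch of CustomBusinessDay, using the class as it exists in recent pandas releases (the experimental 0.12 version may differ in detail). The holiday date and weekmask are arbitrary choices for the example.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Custom holiday calendar: treat Thursday 2013-07-04 as a non-business day.
with_holiday = CustomBusinessDay(holidays=["2013-07-04"])
next_day = pd.Timestamp("2013-07-03") + with_holiday

# Custom weekmask: a Sunday-to-Thursday working week.
sun_thu = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")
after_thursday = pd.Timestamp("2013-07-04") + sun_thu

print(next_day.date(), after_thursday.date())
```

With the holiday calendar, Wednesday 2013-07-03 rolls to Friday 2013-07-05; with the Sunday-to-Thursday weekmask, Thursday 2013-07-04 rolls to Sunday 2013-07-07.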

Bug Fixes

Fixed an esoteric excel reading bug, xlrd >= 0.9.0 now required for excel support. Should provide python3 support (for reading) which has been lacking. (GH3164)
Disallow Series constructor called with MultiIndex which caused segfault (GH4187)
Allow unioning of date ranges sharing a timezone (GH3491)
Fix to_csv issue when having a large number of rows and NaT in some columns (GH3437)
.loc was not raising when passed an integer list (GH3449)
Unordered time series selection was misbehaving when using label slicing (GH3448)
Fix sorting in a frame with a list of columns which contains datetime64[ns] dtypes (GH3461)
DataFrames fetched via FRED now handle ‘.’ as a NaN. (GH3469)
Fix regression in a DataFrame apply with axis=1, objects were not being converted back to base dtypes correctly (GH3480)
Fix issue when storing uint dtypes in an HDFStore. (GH3493)
Non-unique index support clarified (GH3468)
- Addressed handling of dupe columns in df.to_csv new and old (GH3454,GH3457)
- Fix assigning a new index to a duplicate index in a DataFrame would fail (GH3468)
- Fix construction of a DataFrame with a duplicate index
- ref_locs support to allow duplicative indices across dtypes, allows iget support to always find the index (even across dtypes) (GH2194)
- applymap on a DataFrame with a non-unique index now works (removed warning) (GH2786), and fix (GH3230)
- Fix to_csv to handle non-unique columns (GH3495)
- Duplicate indexes with getitem will return items in the correct order (GH3455, GH3457) and handle missing elements like unique indices (GH3561)
- Duplicate indexes with an empty DataFrame.from_records will return a correct frame (GH3562)
- Concat to produce a non-unique columns when duplicates are across dtypes is fixed (GH3602)
- Non-unique indexing with a slice via loc and friends fixed (GH3659)
- Allow insert/delete to non-unique columns (GH3679)
- Extend reindex to correctly deal with non-unique indices (GH3679)
- DataFrame.itertuples() now works with frames with duplicate column names (GH3873)
- Bug in non-unique indexing via iloc (GH4017); added takeable argument to reindex for location-based taking
- Allow non-unique indexing in series via .ix/.loc and __getitem__ (GH4246)
- Fixed non-unique indexing memory allocation issue with .ix/.loc (GH4280)
Fixed bug in groupby with empty series referencing a variable before assignment. (GH3510)
Allow index name to be used in groupby for non MultiIndex (GH4014)
Fixed bug in mixed-frame assignment with aligned series (GH3492)
Fixed bug in selecting month/quarter/year from a series would not select the time element on the last day (GH3546)
Fixed a couple of MultiIndex rendering bugs in df.to_html() (GH3547, GH3553)
Properly convert np.datetime64 objects in a Series (GH3416)
Raise a TypeError on invalid datetime/timedelta operations, e.g. add datetimes, multiple timedelta x datetime
Fix .diff on datelike and timedelta operations (GH3100)
combine_first not returning the same dtype in cases where it can (GH3552)
Fixed bug with Panel.transpose argument aliases (GH3556)
Fixed platform bug in PeriodIndex.take (GH3579)
Fixed bug in incorrect conversion of datetime64[ns] in combine_first (GH3593)
Fixed bug in reset_index with NaN in a multi-index (GH3586)
fillna methods now raise a TypeError when the value parameter is a list or tuple.
Fixed bug where a time-series was being selected in preference to an actual column name in a frame (GH3594)
Make secondary_y work properly for bar plots (GH3598)
Fix modulo and integer division on Series, DataFrames to act similarly to float dtypes to return np.nan or np.inf as appropriate (GH3590)
Fix incorrect dtype on groupby with as_index=False (GH3610)
Fix read_csv/read_excel to correctly encode identical na_values, e.g. na_values=[-999.0,-999] was failing (GH3611)
Disable HTML output in qtconsole again. (GH3657)
Reworked the new repr display logic, which users found confusing. (GH3663)
Fix indexing issue in ndim >= 3 with iloc (GH3617)
Correctly parse date columns with embedded (nan/NaT) into datetime64[ns] dtype in read_csv when parse_dates is specified (GH3062)
Fix not consolidating before to_csv (GH3624)
Fix alignment issue when setitem in a DataFrame with a piece of a DataFrame (GH3626) or a mixed DataFrame and a Series (GH3668)
Fix plotting of unordered DatetimeIndex (GH3601)
sql.write_frame failing when writing a single column to sqlite (GH3628), thanks to @stonebig
Fix pivoting with nan in the index (GH3558)
Fix running of bs4 tests when it is not installed (GH3605)
Fix parsing of html table (GH3606)
read_html() now only allows a single backend:html5lib (GH3616)
convert_objects withconvert_dates='coerce' was parsing some single-letter strings into today’s date
DataFrame.from_records did not accept empty recarrays (GH3682)
DataFrame.to_csv will succeed with the deprecated option nanRep, @tdsmith
DataFrame.to_html and DataFrame.to_latex now accept a path for their first argument (GH3702)
Fix file tokenization error with \r delimiter and quoted fields (GH3453)
Groupby transform with item-by-item not upcasting correctly (GH3740)
Incorrectly read a HDFStore multi-index Frame with a column specification (GH3748)
read_html now correctly skips tests (GH3741)
PandasObjects raise TypeError when trying to hash (GH3882)
Fix incorrect arguments passed to concat that are not list-like (e.g. concat(df1,df2)) (GH3481)
Correctly parse when passed the dtype=str (or other variable-len string dtypes) in read_csv (GH3795)
Fix index name not propagating when using loc/ix (GH3880)
Fix groupby when applying a custom function resulting in a returned DataFrame was not converting dtypes (GH3911)
Fixed a bug where DataFrame.replace with a compiled regular expression in the to_replace argument wasn’t working (GH3907)
Fixed __truediv__ in Python 2.7 with numexpr installed to actually do true division when dividing two integer arrays with at least 10000 cells total (GH3764)
Indexing with a string with seconds resolution not selecting from a time index (GH3925)
csv parsers would loop infinitely if iterator=True but no chunksize was specified (GH3967), python parser failing with chunksize=1
Fix index name not propagating when using shift
Fixed dropna=False being ignored with multi-index stack (GH3997)
Fixed flattening of columns when renaming MultiIndex columns DataFrame (GH4004)
Fix Series.clip for datetime series. NA/NaN threshold values will now throw ValueError (GH3996)
Fixed insertion issue into DataFrame, after rename (GH4032)
Fixed testing issue where too many sockets were open thus leading to a connection reset issue (GH3982, GH3985, GH4028, GH4054)
Fixed failing tests in test_yahoo, test_google where symbols were not retrieved but were being accessed (GH3982, GH3985, GH4028, GH4054)
Series.hist will now take the figure from the current environment if one is not passed
Fixed bug where a 1xN DataFrame would barf on a 1xN mask (GH4071)
Fixed running of tox under python3 where the pickle import was getting rewritten in an incompatible way (GH4062, GH4063)
Fixed bug where sharex and sharey were not being passed to grouped_hist (GH4089)
Fix bug where HDFStore will fail to append because of a different block ordering on-disk (GH4096)
Better error messages on inserting incompatible columns to a frame (GH4107)
Fixed bug in DataFrame.replace where a nested dict wasn’t being iterated over when regex=False (GH4115)
Fixed bug in convert_objects(convert_numeric=True) where a mixed numeric and object Series/Frame was not converting properly (GH4119)
Fixed bugs in multi-index selection with column multi-index and duplicates (GH4145, GH4146)
Fixed bug in the parsing of microseconds when using the format argument in to_datetime (GH4152)
Fixed bug in PandasAutoDateLocator where invert_xaxis triggered incorrectly MilliSecondLocator (GH3990)
Fixed bug in Series.where where broadcasting a single element input vector to the length of the series resulted in multiplying the value inside the input (GH4192)
Fixed bug in plotting that wasn’t raising on invalid colormap for matplotlib 1.1.1 (GH4215)
Fixed the legend displaying in DataFrame.plot(kind='kde') (GH4216)
Fixed bug where Index slices weren’t carrying the name attribute (GH4226)
Fixed bug in initializing DatetimeIndex with an array of strings in a certain time zone (GH4229)
Fixed bug where html5lib wasn’t being properly skipped (GH4265)
Fixed bug where get_data_famafrench wasn’t using the correct file edges (GH4281)
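
The modulo and integer-division fix listed above (GH3590) can be illustrated as follows. This sketch reflects how a recent pandas release treats division by zero on integer data; the sample values are arbitrary.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 0, -1])

# Integer floor division by zero behaves like the float case:
# positive / 0 -> inf, 0 / 0 -> nan, negative / 0 -> -inf.
floordiv = ints // 0

# Modulo by zero yields nan for every element.
mod = ints % 0

print(floordiv.tolist(), mod.tolist())
```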

pandas 0.11.0

Release date: 2013-04-22

New Features

New documentation section, 10 Minutes to Pandas
New documentation section, Cookbook
Allow mixed dtypes (e.g. float32/float64/int32/int16/int8) to coexist in DataFrames and propagate in operations
Add function to pandas.io.data for retrieving stock index components from Yahoo! finance (GH2795)
Support slicing with time objects (GH2681)
Added .iloc attribute, to support strict integer based indexing, analogous to .ix (GH2922)
Added .loc attribute, to support strict label based indexing, analogous to .ix (GH3053)
Added .iat attribute, to support fast scalar access via integers (replaces iget_value/iset_value)
Added .at attribute, to support fast scalar access via labels (replaces get_value/set_value)
Moved functionality from irow, icol, iget_value/iset_value to .iloc indexer (via _ixs methods in each object)
Added support for expression evaluation using the numexpr library
Added convert=boolean to take routines to translate negative indices to positive, defaults to True
Added to_series() method to indices, to facilitate the creation of indexers (GH3275)
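
The .iloc, .loc, .iat and .at indexers introduced above can be contrasted in a short sketch. It runs against a recent pandas release; the frame contents and labels are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]}, index=["a", "b", "c"]
)

# Strict integer-position indexing: the end of the slice is exclusive.
by_position = df.iloc[0:2]

# Strict label indexing: the end of the slice is inclusive.
by_label = df.loc["a":"b"]

# Fast scalar access by integer position and by label.
scalar_by_position = df.iat[2, 0]
scalar_by_label = df.at["c", "x"]

print(by_position.equals(by_label), scalar_by_position, scalar_by_label)
```

Both slices select rows "a" and "b", which shows the difference between exclusive positional endpoints and inclusive label endpoints.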

Improvements to existing features

Improved performance of df.to_csv() by up to 10x in some cases. (GH3059)
added blocks attribute to DataFrames, to return a dict of dtypes to homogeneously dtyped DataFrames
added keyword convert_numeric to convert_objects() to try to convert object dtypes to numeric types (default is False)
convert_dates in convert_objects can now be coerce which will return a datetime64[ns] dtype with non-convertibles set as NaT; will preserve an all-nan object (e.g. strings), default is True (to perform soft-conversion)
Series print output now includes the dtype by default
Optimize internal reindexing routines (GH2819,GH2867)
describe_option() now reports the default and current value of options.
Add format option to pandas.to_datetime with faster conversion of strings that can be parsed with datetime.strptime
Add axes property to Series for compatibility
Add xs function to Series for compatibility
Allow setitem in a frame where only mixed numerics are present (e.g. int and float), (GH3037)
HDFStore
- Provide dotted attribute access to get from stores (e.g. store.df == store['df'])
- New keywords iterator=boolean, and chunksize=number_in_a_chunk are provided to support iteration on select and select_as_multiple (GH3076)
- support read_hdf/to_hdf API similar to read_csv/to_csv (GH3222)

Add squeeze method to possibly remove length 1 dimensions from an object.

In [1]: p = pd.Panel(np.random.randn(3, 4, 4), items=['ItemA', 'ItemB', 'ItemC'],
   ...:              major_axis=pd.date_range('20010102', periods=4),
   ...:              minor_axis=['A', 'B', 'C', 'D'])
   ...:

In [2]: p
Out[2]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 4 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2001-01-02 00:00:00 to 2001-01-05 00:00:00
Minor_axis axis: A to D

In [3]: p.reindex(items=['ItemA']).squeeze()
Out[3]:
                   A         B         C         D
2001-01-02  0.469112 -0.282863 -1.509059 -1.135632
2001-01-03  1.212112 -0.173215  0.119209 -1.044236
2001-01-04 -0.861849 -2.104569 -0.494929  1.071804
2001-01-05  0.721555 -0.706771 -1.039575  0.271860

In [4]: p.reindex(items=['ItemA'], minor=['B']).squeeze()
Out[4]:
2001-01-02   -0.282863
2001-01-03   -0.173215
2001-01-04   -2.104569
2001-01-05   -0.706771
Freq: D, Name: B, dtype: float64

Improvement to Yahoo API access in pd.io.data.Options (GH2758)
added option display.max_seq_items to control the number of elements printed per sequence pprinting it. (GH2979)
added option display.chop_threshold to control display of small numerical values. (GH2739)
added option display.max_info_rows to prevent verbose_info from being calculated for frames above 1M rows (configurable). (GH2807, GH2918)
value_counts() now accepts a “normalize” argument, for normalized histograms. (GH2710).
DataFrame.from_records now accepts not only dicts but any instance of the collections.Mapping ABC.
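
The normalize argument of value_counts() mentioned above (GH2710) can be sketched as follows, using a recent pandas release and a made-up Series.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "a"])

# Raw counts per distinct value.
counts = s.value_counts()

# With normalize=True the counts are divided by the total,
# giving relative frequencies that sum to 1.
shares = s.value_counts(normalize=True)

print(counts.to_dict(), shares.to_dict())
```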

Allow selection semantics via a string with a datelike index to work in both Series and DataFrames (GH3070)

In [5]: idx = pd.date_range("2001-10-1", periods=5, freq='M')

In [6]: ts = pd.Series(np.random.rand(len(idx)), index=idx)

In [7]: ts['2001']
Out[7]:
2001-10-31    0.838796
2001-11-30    0.897333
2001-12-31    0.732592
Freq: M, dtype: float64

In [8]: df = pd.DataFrame(dict(A=ts))

In [9]: df['2001']
Out[9]:
                   A
2001-10-31  0.838796
2001-11-30  0.897333
2001-12-31  0.732592

added option display.mpl_style providing a sleeker visual style for plots. Based on https://gist.github.com/huyng/816622 (GH3075).
Improved performance across several core functions by taking memory ordering of arrays into account. Courtesy of @stephenwlin (GH3130)
Improved performance of groupby transform method (GH2121)
Handle “ragged” CSV files missing trailing delimiters in rows with missing fields when also providing explicit list of column names (so the parser knows how many columns to expect in the result) (GH2981)
On a mixed DataFrame, allow setting with indexers with ndarray/DataFrame on rhs (GH3216)
Treat boolean values as integers (values 1 and 0) for numeric operations. (GH2641)
Add time method to DatetimeIndex (GH3180)
Return NA when using Series.str[...] for values that are not long enough (GH3223)
Display cursor coordinate information in time-series plots (GH1670)
to_html() now accepts an optional “escape” argument to control reserved HTML character escaping (enabled by default) and escapes &, in addition to < and >. (GH2919)
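
A brief sketch of the escape argument to to_html() (GH2919), written against a recent pandas release. The cell content is an arbitrary string chosen because it contains all three reserved characters.

```python
import pandas as pd

df = pd.DataFrame({"markup": ["<b>R&D</b>"]})

# Default behaviour: <, > and & are escaped in the generated HTML.
escaped = df.to_html()

# escape=False writes the cell content through unchanged.
raw = df.to_html(escape=False)

print("&lt;b&gt;R&amp;D&lt;/b&gt;" in escaped, "<b>R&D</b>" in raw)
```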

API Changes

Do not automatically upcast numeric specified dtypes to int64 or float64 (GH622 and GH797)
DataFrame construction of lists and scalars, with no dtype present, will result in casting to int64 or float64, regardless of platform. This is not an apparent change in the API, but noting it.
Guarantee that convert_objects() for Series/DataFrame always returns a copy
groupby operations will respect dtypes for numeric float operations (float32/float64); other types will be operated on, and will try to cast back to the input dtype (e.g. if an int is passed, as long as the output doesn’t have nans, then an int will be returned)
backfill/pad/take/diff/ohlc will now support float32/int16/int8 operations
Block types will upcast as needed in where/masking operations (GH2793)
Series now automatically will try to set the correct dtype based on passed datetimelike objects (datetime/Timestamp)
- timedelta64 are returned in appropriate cases (e.g. Series - Series, when both are datetime64)
- mixed datetimes and objects (GH2751) in a constructor will be cast correctly
- astype on datetimes to object are now handled (as well as NaT conversions to np.nan)
- all timedelta like objects will be correctly assigned to timedelta64 with mixed NaN and/or NaT allowed
arguments to DataFrame.clip were inconsistent to numpy and Series clipping (GH2747)
util.testing.assert_frame_equal now checks the column and index names (GH2964)
Constructors will now return a more informative ValueError on failures when invalid shapes are passed
Don’t suppress TypeError in GroupBy.agg (GH3238)
Methods return None when inplace=True (GH1893)
HDFStore
- added the method select_column to select a single column from a table as a Series.
- deprecated the unique method, can be replicated by select_column(key,column).unique()
- min_itemsize parameter will now automatically create data_columns for passed keys
Downcast on pivot if possible (GH3283), adds argument downcast to fillna
Introduced options display.height/width for explicitly specifying terminal height/width in characters. Deprecated display.line_width, now replaced by display.width. These defaults are in effect for scripts as well, so unless disabled, previously very wide output will now be output as “expand_repr” style wrapped output.
Various defaults for options (including display.max_rows) have been revised, after a brief survey concluded they were wrong for everyone. Now at w=80,h=60.
HTML repr output in IPython qtconsole is once again controlled by the option display.notebook_repr_html, and on by default.

Bug Fixes¶

Fix seg fault on empty data frame when fillna with pad or backfill (GH2778)
Single element ndarrays of datetimelike objects are handled (e.g. np.array(datetime(2001,1,1,0,0))), w/o dtype being passed
0-dim ndarrays with a passed dtype are handled correctly (e.g. np.array(0.,dtype='float32'))
Fix some boolean indexing inconsistencies in Series.__getitem__/__setitem__ (GH2776)
Fix issues with DataFrame and Series constructor with integers that overflow int64 and some mixed-type lists (GH2845)
HDFStore
- Fix weird PyTables error when using too many selectors in a where; also correctly filter on any number of values in a Term expression (so not using numexpr filtering, but isin filtering)
- Internally, change all variables to be private-like (now have leading underscore)
- Fixes for query parsing to correctly interpret boolean and != (GH2849, GH2973)
- Fixes for pathological case on SparseSeries with 0-len array and compression (GH2931)
- Fixes bug with writing rows if part of a block was all-nan (GH3012)
- Exceptions are now ValueError or TypeError as needed
- A table will now raise if min_itemsize contains fields which are not queryables
Bug showing up in applymap where some object type columns are converted (GH2909); had an incorrect default in convert_objects
TimeDeltas
- Series ops with a Timestamp on the rhs was throwing an exception (GH2898); added tests for Series ops with datetimes, timedeltas, Timestamps, and datelike Series on both lhs and rhs
- Fixed subtle timedelta64 inference issue on py3 & numpy 1.7.0 (GH3094)
- Fixed some formatting issues on timedelta when negative
- Support null checking on timedelta64, representing (and formatting) with NaT
- Support setitem with np.nan value, converts to NaT
- Support min/max ops in a DataFrame (abs not working, nor do we error on non-supported ops)
- Support idxmin/idxmax/abs/max/min in a Series (GH2989, GH2982)
Bug on in-place putmasking on an integer series that needs to be converted to float (GH2746)
Bug in argsort of datetime64[ns] Series with NaT (GH2967)
Bug in value_counts of datetime64[ns] Series (GH3002)
Fixed printing of NaT in an index
Bug in idxmin/idxmax of datetime64[ns] Series with NaT (GH2982)
Bug in icol, take with negative indices was producing incorrect return values (see GH2922, GH2892), also check for out-of-bounds indices (GH3029)
Bug in DataFrame column insertion when the column creation fails, existing frame is left in an irrecoverable state (GH3010)
Bug in DataFrame update, combine_first where non-specified values could cause dtype changes (GH3016, GH3041)
Bug in groupby with first/last where dtypes could change (GH3041, GH2763)
Formatting of an index that has nan was inconsistent or wrong (would fill from other values), (GH2850)
Unstack of a frame with no nans would always cause dtype upcasting (GH2929)
Fix scalar datetime.datetime parsing bug in read_csv (GH3071)
Fixed slow printing of large DataFrames, due to inefficient dtype reporting (GH2807)
Fixed a segfault when using a function as grouper in groupby (GH3035)
Fix pretty-printing of infinite data structures (closes GH2978)
Fixed exception when plotting timeseries bearing a timezone (closes GH2877)
str.contains ignored na argument (GH2806)
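As a minimal sketch of the na argument that this fix concerns, the snippet below compares the default result with na=False. The sample Series is made up, and object dtype is requested explicitly as an assumption so that the missing value stays missing in the default case.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing entry; object dtype is an explicit choice here
s = pd.Series(["apple", np.nan, "cherry"], dtype=object)

# Without na, the missing entry stays missing in the result
default = s.str.contains("a")

# With na=False, the missing entry is reported as False instead
filled = s.str.contains("a", na=False)
print(filled.tolist())
```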
Substitute warning for segfault when grouping with categorical grouper of mismatched length (GH3011)
Fix exception in SparseSeries.density (GH2083)
Fix upsampling bug with closed='left' and daily to daily data (GH3020)
Fixed missing tick bars on scatter_matrix plot (GH3063)
Fixed bug in Timestamp(d,tz=foo) when d is date() rather than datetime() (GH2993)
series.plot(kind='bar') now respects pylab color scheme (GH3115)
Fixed bug in reshape if not passed correct input, now raises TypeError (GH2719)
Fixed a bug where Series ctor did not respect ordering if OrderedDict passed in (GH3282)
Fix NameError issue on RESO_US (GH2787)
Allow selection in an unordered timeseries to work similarly to an ordered timeseries (GH2437).
Fix implemented .xs when called with axes=1 and a level parameter (GH2903)
Timestamp now supports the class method fromordinal similar to datetimes (GH3042)
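A short sketch of the fromordinal class method, compared against the standard-library datetime equivalent. The date used is arbitrary.

```python
import datetime

import pandas as pd

# Proleptic Gregorian ordinal for an arbitrary date
ordinal = datetime.date(2013, 4, 22).toordinal()

# Timestamp mirrors datetime.datetime.fromordinal
ts = pd.Timestamp.fromordinal(ordinal)
dt = datetime.datetime.fromordinal(ordinal)
print(ts, dt)
```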
Fix issue with indexing a series with a boolean key and specifying a 1-len list on the rhs (GH2745) or a list on the rhs (GH3235)
Fixed bug in groupby apply when kernel generates a list of arrays having unequal len (GH1738)
Fixed handling of rolling_corr with center=True which could produce corr>1 (GH3155)
Fixed issues where indices can be passed as ‘index/column’ in addition to 0/1 for the axis parameter
PeriodIndex.tolist now boxes to Period (GH3178)
PeriodIndex.get_loc KeyError now reports Period instead of ordinal (GH3179)
df.to_records bug when handling MultiIndex (GH3189)
Fix Series.__getitem__ segfault when index less than -length (GH3168)
Fix bug when using Timestamp as a date parser (GH2932)
Fix bug creating date range from Timestamp with time zone and passing same time zone (GH2926)
Add comparison operators to Period object (GH2781)
Fix bug when concatenating two Series into a DataFrame when they have the same name (GH2797)
Fix automatic color cycling when plotting consecutive timeseries without color arguments (GH2816)
Fixed bug in the pickling of PeriodIndex (GH2891)
Upcast/split blocks when needed in a mixed DataFrame when setitem with an indexer (GH3216)
Invoking df.applymap on a dataframe with dupe cols now raises a ValueError (GH2786)
Apply with invalid returned indices raise correct Exception (GH2808)
Fixed a bug in plotting log-scale bar plots (GH3247)
df.plot() grid on/off now obeys the mpl default style, just like series.plot(). (GH3233)
Fixed a bug in the legend of plotting.andrews_curves() (GH3278)
Produce a series on apply if we only generate a singular series and have a simple index (GH2893)
Fix Python ASCII file parsing when integer falls outside of floating point spacing (GH3258)
Fixed pretty printing of sets (GH3294)
Panel() and Panel.from_dict() now respects ordering when given an OrderedDict (GH3303)
DataFrame where with a datetimelike incorrectly selecting (GH3311)
Ensure index casts work even in Int64Index
Fix set_index segfault when passing MultiIndex (GH3308)
Ensure pickles created in py2 can be read in py3
Insert ellipsis in MultiIndex summary repr (GH3348)
Groupby will handle mutation among an input groups columns (and fallback to non-fast apply) (GH3380)
Eliminated unicode errors on FreeBSD when using MPL GTK backend (GH3360)
Period.strftime should return unicode strings always (GH3363)
Respect passed read_* chunksize in get_chunk function (GH3406)

pandas 0.10.1¶

Release date: 2013-01-22

New Features¶

Add data interface to World Bank WDI pandas.io.wb (GH2592)

API Changes¶

Restored inplace=True behavior returning self (same object) with deprecation warning until 0.11 (GH1893)
HDFStore
- refactored HDFStore to deal with non-table stores as objects, will allow future enhancements
- removed keyword compression from put (replaced by keyword complib to be consistent across library)
- warn PerformanceWarning if you are attempting to store types that will be pickled by PyTables

Improvements to existing features¶

HDFStore
- enables storing of multi-index dataframes (closes GH1277)
- support data column indexing and selection, via data_columns keyword in append
- support write chunking to reduce memory footprint, via chunksize keyword to append
- support automagic indexing via index keyword to append
- support expectedrows keyword in append to inform PyTables about the expected table size
- support start and stop keywords in select to limit the row selection space
- added get_store context manager to automatically import with pandas
- added column filtering via columns keyword in select
- added methods append_to_multiple/select_as_multiple/select_as_coordinates to do multiple-table append/selection
- added support for datetime64 in columns
- added method unique to select the unique values in an indexable or data column
- added method copy to copy an existing store (and possibly upgrade)
- show the shape of the data on disk for non-table stores when printing the store
- added ability to read PyTables flavor tables (allows compatibility to other HDF5 systems)
Add logx option to DataFrame/Series.plot (GH2327, GH2565)
Support reading gzipped data from file-like object
pivot_table aggfunc can be anything used in GroupBy.aggregate (GH2643)
Implement DataFrame merges in case where set cardinalities might overflow 64-bit integer (GH2690)
Raise exception in C file parser if integer dtype specified and have NA values. (GH2631)
Attempt to parse ISO8601 format dates when parse_dates=True in read_csv for major performance boost in such cases (GH2698)
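For illustration, the snippet below reads a small in-memory CSV whose first column holds ISO8601 dates with parse_dates=True. The data is invented, and nothing here measures the performance gain; it only shows the call shape and the resulting index type.

```python
import io

import pandas as pd

# Illustrative CSV content with ISO8601 dates in the first column
raw = "date,value\n2013-01-22,1\n2013-01-23,2\n"

# parse_dates=True asks read_csv to parse the index column as dates
df = pd.read_csv(io.StringIO(raw), index_col=0, parse_dates=True)
print(type(df.index).__name__)
```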
Add methods neg and inv to Series
Implement kind option in ExcelFile to indicate whether it’s an XLS or XLSX file (GH2613)
Documented a fast-path in pd.read_csv when parsing iso8601 datetime strings yielding as much as a 20x speedup. (GH5993)

Bug Fixes¶

Fix read_csv/read_table multithreading issues (GH2608)
HDFStore
- correctly handle nan elements in string columns; serialize via the nan_rep keyword to append
- raise correctly on non-implemented column types (unicode/date)
- handle correctly Term passed types (e.g. index<1000, when index is Int64), (closes GH512)
- handle Timestamp correctly in data_columns (closes GH2637)
- contains correctly matches on non-natural names
- correctly store float32 dtypes in tables (if not other float types in the same table)
Fix DataFrame.info bug with UTF8-encoded columns. (GH2576)
Fix DatetimeIndex handling of FixedOffset tz (GH2604)
More robust detection of being in IPython session for wide DataFrame console formatting (GH2585)
Fix platform issues with file:/// in unit test (GH2564)
Fix bug and possible segfault when grouping by hierarchical level that contains NA values (GH2616)
Ensure that MultiIndex tuples can be constructed with NAs (GH2616)
Fix int64 overflow issue when unstacking MultiIndex with many levels (GH2616)
Exclude non-numeric data from DataFrame.quantile by default (GH2625)
Fix a Cython C int64 boxing issue causing read_csv to return incorrect results (GH2599)
Fix groupby summing performance issue on boolean data (GH2692)
Don’t bork Series containing datetime64 values with to_datetime (GH2699)
Fix DataFrame.from_records corner case when passed columns, index column, but empty record list (GH2633)
Fix C parser-tokenizer bug with trailing fields. (GH2668)
Don’t exclude non-numeric data from GroupBy.max/min (GH2700)
Don’t lose time zone when calling DatetimeIndex.drop (GH2621)
Fix setitem on a Series with a boolean key and a non-scalar as value (GH2686)
Box datetime64 values in Series.apply/map (GH2627, GH2689)
Upconvert datetime + datetime64 values when concatenating frames (GH2624)
Raise a more helpful error message in merge operations when one DataFrame has duplicate columns (GH2649)
Fix partial date parsing issue occurring only when code is run at EOM (GH2618)
Prevent MemoryError when using counting sort in sortlevel with high-cardinality MultiIndex objects (GH2684)
Fix Period resampling bug when all values fall into a single bin (GH2070)
Fix buggy interaction with usecols argument in read_csv when there is an implicit first index column (GH2654)
Fix bug in Index.summary() where string format methods were being called incorrectly. (GH3869)

pandas 0.10.0¶

Release date: 2012-12-17

New Features¶

Brand new high-performance delimited file parsing engine written in C and Cython. 50% or better performance in many standard use cases with a fraction as much memory usage. (GH407, GH821)
Many new file parser (read_csv, read_table) features:
- Support for on-the-fly gzip or bz2 decompression (compression option)
- Ability to get back numpy.recarray instead of DataFrame (as_recarray=True)
- dtype option: explicit column dtypes
- usecols option: specify list of columns to be read from a file. Good for reading very wide files with many irrelevant columns (GH1216, GH926, GH2465)
- Enhanced unicode decoding support via encoding option
- skipinitialspace dialect option
- Can specify strings to be recognized as True (true_values) or False (false_values)
- High-performance delim_whitespace option for whitespace-delimited files; a preferred alternative to the '\s+' regular expression delimiter
- Option to skip “bad” lines (wrong number of fields) that would otherwise have caused an error in the past (error_bad_lines and warn_bad_lines options)
- Substantially improved performance in the parsing of integers with thousands markers and lines with comments
- Easy parsing of European (and other) decimal formats (decimal option) (GH584, GH2466)
- Custom line terminators (e.g. lineterminator='~') (GH2457)
- Handling of no trailing commas in CSV files (GH2333)
- Ability to handle fractional seconds in date_converters (GH2209)
- read_csv allow scalar arg to na_values (GH1944)
- Explicit column dtype specification in read_* functions (GH1858)
- Easier CSV dialect specification (GH1743)
- Improve parser performance when handling special characters (GH1204)
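Several of the parser options listed above can be combined in a single call. The following is a rough sketch on invented, semicolon-delimited data; it uses only the usecols, dtype, decimal, true_values and false_values options and assumes a pandas version in which they are all available.

```python
import io

import pandas as pd

# Invented sample: semicolon-delimited, comma as decimal mark, yes/no flags
raw = "id;price;active;note\n1;3,50;yes;a\n2;4,25;no;b\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    decimal=",",                           # European-style decimal mark
    usecols=["id", "price", "active"],     # skip the irrelevant 'note' column
    dtype={"id": "int64"},                 # explicit column dtype
    true_values=["yes"],                   # strings to read as True
    false_values=["no"],                   # strings to read as False
)
print(df)
```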
Google Analytics API integration with easy oauth2 workflow (GH2283)
Add error handling to Series.str.encode/decode (GH2276)
Add where and mask to Series (GH2337)
Grouped histogram via by keyword in Series/DataFrame.hist (GH2186)
Support optional min_periods keyword in corr and cov for both Series and DataFrame (GH2002)
Add duplicated and drop_duplicates functions to Series (GH1923)
Add docs for HDFStore table format
‘density’ property in SparseSeries (GH2384)
Add ffill and bfill convenience functions for forward- and backfilling time series data (GH2284)
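A small sketch of the ffill and bfill convenience methods on a Series with gaps; the values are arbitrary.

```python
import numpy as np
import pandas as pd

# Arbitrary series with two consecutive missing values
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()   # carry the last valid value forward
backward = s.bfill()  # pull the next valid value backward

print(forward.tolist())
print(backward.tolist())
```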
New option configuration system and functions set_option, get_option, describe_option, and reset_option. Deprecate set_printoptions and reset_printoptions (GH2393). You can also access options as attributes via pandas.options.X
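The option functions can be exercised roughly as follows. display.max_rows is used only as a convenient example option, and the values 20 and 30 are arbitrary.

```python
import pandas as pd

# Remember the starting value so the round trip can be checked
original = pd.get_option("display.max_rows")

# Function-style access
pd.set_option("display.max_rows", 20)
via_function = pd.get_option("display.max_rows")

# Attribute-style access through pandas.options
pd.options.display.max_rows = 30
via_attribute = pd.get_option("display.max_rows")

# reset_option restores the default value
pd.reset_option("display.max_rows")
restored = pd.get_option("display.max_rows")
print(via_function, via_attribute, restored)
```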
Wide DataFrames can be viewed more easily in the console with new expand_frame_repr and line_width configuration options. This is on by default now (GH2436)
Scikits.timeseries-like moving window functions via rolling_window (GH1270)

Experimental Features¶

Add support for Panel4D, a named 4 Dimensional structure
Add support for ndpanel factory functions, to create custom, domain-specific N-Dimensional containers

API Changes¶

The default binning/labeling behavior for resample has been changed to closed='left', label='left' for daily and lower frequencies. This had been a large source of confusion for users. See “what’s new” page for more on this. (GH2410)
Methods with inplace option now return None instead of the calling (modified) object (GH1893)
The special case DataFrame - TimeSeries doing column-by-column broadcasting has been deprecated. Users should explicitly do e.g. df.sub(ts, axis=0) instead. This is a legacy hack and can lead to subtle bugs.
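The explicit form recommended above might look like this. The frame, the series, and their values are made up for illustration.

```python
import pandas as pd

# Made-up daily data sharing one DatetimeIndex
index = pd.date_range("2013-01-01", periods=3, freq="D")
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}, index=index)
ts = pd.Series([1.0, 2.0, 3.0], index=index)

# Subtract the series from every column, aligning on the row index
result = df.sub(ts, axis=0)
print(result)
```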
inf/-inf are no longer considered as NA by isnull/notnull. To be clear, this is legacy cruft from early pandas. This behavior can be globally re-enabled using the new option mode.use_inf_as_null (GH2050, GH1919)
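A quick check of the behavior described above, assuming default option settings: infinite values are not flagged as missing, while NaN still is.

```python
import numpy as np
import pandas as pd

# Mix of a finite value, both infinities, and a true missing value
s = pd.Series([1.0, np.inf, -np.inf, np.nan])

# Only the NaN entry is reported as null
mask = s.isnull()
print(mask.tolist())
```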
pandas.merge will now default to sort=False. For many use cases sorting the join keys is not necessary, and doing it by default is wasteful
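The effect of the default can be seen on a small inner join. The frames below are invented; with the default, the result follows the key order of the left frame, and sort=True must be requested to get sorted keys.

```python
import pandas as pd

# Invented frames whose keys are deliberately out of order on the left
left = pd.DataFrame({"key": ["c", "a", "b"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "c"], "y": [10, 20, 30]})

# Default: join keys are not sorted
unsorted_result = pd.merge(left, right, on="key")

# Opt in to sorting the join keys
sorted_result = pd.merge(left, right, on="key", sort=True)

print(unsorted_result["key"].tolist())
print(sorted_result["key"].tolist())
```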
Specify header=0 explicitly to replace existing column names in file in read_* functions.
Default column names for header-less parsed files (yielded by read_csv, etc.) are now the integers 0, 1, .... A new argument prefix has been added; to get the v0.9.x behavior specify prefix='X' (GH2034). This API change was made to make the default column names more consistent with the DataFrame constructor’s default column names when none are specified.
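The integer default names can be observed with header=None. Because the prefix argument may not be available in the pandas version you have installed, the sketch below uses DataFrame.add_prefix as a stand-in to produce X0, X1, ... style names; that substitution is an assumption of this example, not something the release notes prescribe.

```python
import io

import pandas as pd

# Header-less CSV content (illustrative)
raw = "1,2,3\n4,5,6\n"

# With header=None the columns are named 0, 1, 2
df = pd.read_csv(io.StringIO(raw), header=None)
print(list(df.columns))

# Stand-in for prefix='X': rename the columns after parsing
named = df.add_prefix("X")
print(list(named.columns))
```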
DataFrame selection using a boolean frame now preserves input shape
If function passed to Series.apply yields a Series, result will be a DataFrame (GH2316)
Values like YES/NO/yes/no will not be considered as boolean by default any longer in the file parsers. This can be customized using the new true_values and false_values options (GH2360)
obj.fillna() is no longer valid; make method='pad' no longer the default option, to be more explicit about what kind of filling to perform. Add ffill/bfill convenience functions per above (GH2284)
HDFStore.keys() now returns an absolute path-name for each key
to_string() now always returns a unicode string. (GH2224)
File parsers will not handle NA sentinel values arising from passed converter functions

Improvements to existing features¶

Add nrows option to DataFrame.from_records for iterators (GH1794)
Unstack/reshape algorithm rewrite to avoid high memory use in cases where the number of observed key-tuples is much smaller than the total possible number that could occur (GH2278). Also improves performance in most cases.
Support duplicate columns in DataFrame.from_records (GH2179)
Add normalize option to Series/DataFrame.asfreq (GH2137)
SparseSeries and SparseDataFrame construction from empty and scalar values now no longer create dense ndarrays unnecessarily (GH2322)
HDFStore now supports hierarchical keys (GH2397)
Support multiple query selection formats for HDFStore tables (GH1996)
Support del store['df'] syntax to delete HDFStores
Add multi-dtype support for HDFStore tables
min_itemsize parameter can be specified in HDFStore table creation
Indexing support in HDFStore tables (GH698)
Add line_terminator option to DataFrame.to_csv (GH2383)
Added implementation of str(x)/unicode(x)/bytes(x) to major pandas data structures, which should do the right thing on both py2.x and py3.x. (GH2224)
Reduce groupby.apply overhead substantially by low-level manipulation of internal NumPy arrays in DataFrames (GH535)
Implement value_vars in melt and add melt to pandas namespace (GH2412)
Added boolean comparison operators to Panel
Enable Series.str.strip/lstrip/rstrip methods to take an argument (GH2411)
The DataFrame ctor now respects column ordering when given an OrderedDict (GH2455)
Assigning DatetimeIndex to Series changes the class to TimeSeries (GH2139)
Improve performance of .value_counts method on non-integer data (GH2480)
get_level_values method for MultiIndex return Index instead of ndarray (GH2449)
convert_to_r_dataframe conversion for datetime values (GH2351)
Allow DataFrame.to_csv to represent inf and nan differently (GH2026)
Add min_i argument to nancorr to specify minimum required observations (GH2002)
Add inplace option to sortlevel / sort functions on DataFrame (GH1873)
Enable DataFrame to accept scalar constructor values like Series (GH1856)
DataFrame.from_records now takes optional size parameter (GH1794)
Include iris dataset (GH1709)
No datetime64 DataFrame column conversion of datetime.datetime with tzinfo (GH1581)
Micro-optimizations in DataFrame for tracking state of internal consolidation (GH217)
Format parameter in DataFrame.to_csv (GH1525)
Partial string slicing for DatetimeIndex for daily and higher frequencies (GH2306)
Implement col_space parameter in to_html and to_string in DataFrame (GH1000)
Override Series.tolist and box datetime64 types (GH2447)
Optimize unstack memory usage by compressing indices (GH2278)
Fix HTML repr in IPython qtconsole if opening window is small (GH2275)
Escape more special characters in console output (GH2492)
df.select now invokes bool on the result of crit(x) (GH2487)

Bug Fixes¶

Fix major performance regression in DataFrame.iteritems (GH2273)
Fixes bug when negative period passed to Series/DataFrame.diff (GH2266)
Escape tabs in console output to avoid alignment issues (GH2038)
Properly box datetime64 values when retrieving cross-section from mixed-dtype DataFrame (GH2272)
Fix concatenation bug leading to GH2057, GH2257
Fix regression in Index console formatting (GH2319)
Box Period data when assigning PeriodIndex to frame column (GH2243, GH2281)
Raise exception on calling reset_index on Series with inplace=True (GH2277)
Enable setting multiple columns in DataFrame with hierarchical columns (GH2295)
Respect dtype=object in DataFrame constructor (GH2291)
Fix DatetimeIndex.join bug with tz-aware indexes and how='outer' (GH2317)
pop(...) and del works with DataFrame with duplicate columns (GH2349)
Treat empty strings as NA in date parsing (rather than let dateutil do something weird) (GH2263)
Prevent uint64 -> int64 overflows (GH2355)
Enable joins between MultiIndex and regular Index (GH2024)
Fix time zone metadata issue when unioning non-overlapping DatetimeIndex objects (GH2367)
Raise/handle int64 overflows in parsers (GH2247)
Deleting of consecutive rows in HDFStore tables is much faster than before
Appending on a HDFStore would fail if the table was not first created via put
Use col_space argument as minimum column width in DataFrame.to_html (GH2328)
Fix tz-aware DatetimeIndex.to_period (GH2232)
Fix DataFrame row indexing case with MultiIndex (GH2314)
Fix to_excel exporting issues with Timestamp objects in index (GH2294)
Fixes assigning scalars and array to hierarchical column chunk (GH1803)
Fixed a UnicodeDecodeError with series tidy_repr (GH2225)
Fixed issue with duplicate keys in an index (GH2347, GH2380)
Fixed issues re: Hash randomization, default on starting w/ py3.3 (GH2331)
Fixed issue with missing attributes after loading a pickled dataframe (GH2431)
Fix Timestamp formatting with tzoffset time zone in dateutil 2.1 (GH2443)
Fix GroupBy.apply issue when using BinGrouper to do ts binning (GH2300)
Fix issues resulting from datetime.datetime columns being converted to datetime64 when calling DataFrame.apply. (GH2374)
Raise exception when calling to_panel on non uniquely-indexed frame (GH2441)
Improved detection of console encoding on IPython zmq frontends (GH2458)
Preserve time zone when .append-ing two time series (GH2260)
Box timestamps when calling reset_index on time-zone-aware index rather than creating a tz-less datetime64 column (GH2262)
Enable searching non-string columns in DataFrame.filter(like=...) (GH2467)
Fixed issue with losing nanosecond precision upon conversion to DatetimeIndex (GH2252)
Handle timezones in Datetime.normalize (GH2338)
Fix test case where dtype specification with endianness causes failures on big endian machines (GH2318)
Fix plotting bug where upsampling causes data to appear shifted in time (GH2448)
Fix read_csv failure for UTF-16 with BOM and skiprows (GH2298)
read_csv with names arg not implicitly setting header=None (GH2459)
Unrecognized compression mode causes segfault in read_csv (GH2474)
In read_csv, header=0 and passed names should discard first row (GH2269)
Correctly route to stdout/stderr in read_table (GH2071)
Fix exception when Timestamp.to_datetime is called on a Timestamp with tzoffset (GH2471)
Fixed unintentional conversion of datetime64 to long in groupby.first() (GH2133)
Union of empty DataFrames now return empty with concatenated index (GH2307)
DataFrame.sort_index raises more helpful exception if sorting by column with duplicates (GH2488)
DataFrame.to_string formatters can be list, too (GH2520)
DataFrame.combine_first will always result in the union of the index and columns, even if one DataFrame is length-zero (GH2525)
Fix several DataFrame.icol/irow with duplicate indices issues (GH2228, GH2259)
Use Series names for column names when using concat with axis=1 (GH2489)
Raise Exception if start, end, periods all passed to date_range (GH2538)
Fix Panel resampling issue (GH2537)

pandas 0.9.1¶

Release date: 2012-11-14

New Features¶

Can specify multiple sort orders in DataFrame/Series.sort/sort_index (GH928)
New top and bottom options for handling NAs in rank (GH1508, GH2159)
Add where and mask functions to DataFrame (GH2109, GH2151)
Add at_time and between_time functions to DataFrame (GH2149)
Add flexible pow and rpow methods to DataFrame (GH2190)
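As an illustration of the where and mask functions listed above, the sketch below applies both to a small invented frame. where keeps values where the condition holds, and mask hides them.

```python
import pandas as pd

# Small invented frame with positive and negative entries
df = pd.DataFrame({"a": [-1, 2], "b": [3, -4]})

# Keep positive values; everything else becomes missing
kept = df.where(df > 0)

# The inverse: hide positive values
hidden = df.mask(df > 0)

# An optional second argument supplies the replacement value
replaced = df.where(df > 0, 0)
print(replaced)
```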

API Changes¶

Upsampling period index “spans” intervals. Example: annual periods upsampled to monthly will span all months in each year
Period.end_time will yield timestamp at last nanosecond in the interval (GH2124, GH2125, GH1764)
File parsers no longer coerce to float or bool for columns that have custom converters specified (GH2184)

Improvements to existing features¶

Time rule inference for week-of-month (e.g. WOM-2FRI) rules (GH2140)
Improve performance of datetime + business day offset with large number of offset periods
Improve HTML display of DataFrame objects with hierarchical columns
Enable referencing of Excel columns by their column names (GH1936)
DataFrame.dot can accept ndarrays (GH2042)
Support negative periods in Panel.shift (GH2164)
Make .drop(...) work with non-unique indexes (GH2101)
Improve performance of Series/DataFrame.diff (re: GH2087)
Support unary ~ (__invert__) in DataFrame (GH2110)
Turn off pandas-style tick locators and formatters (GH2205)
DataFrame[DataFrame] uses DataFrame.where to compute masked frame (GH2230)

Bug Fixes¶

Fix some duplicate-column DataFrame constructor issues (GH2079)
Fix bar plot color cycle issues (GH2082)
Fix off-center grid for stacked bar plots (GH2157)
Fix plotting bug if inferred frequency is offset with N > 1 (GH2126)
Implement comparisons on date offsets with fixed delta (GH2078)
Handle inf/-inf correctly in read_* parser functions (GH2041)
Fix matplotlib unicode interaction bug
Make WLS r-squared match statsmodels 0.5.0 fixed value
Fix zero-trimming DataFrame formatting bug
Correctly compute/box datetime64 min/max values from Series.min/max (GH2083)
Fix unstacking edge case with unrepresented groups (GH2100)
Fix Series.str failures when using pipe pattern ‘|’ (GH2119)
Fix pretty-printing of dict entries in Series, DataFrame (GH2144)
Cast other datetime64 values to nanoseconds in DataFrame ctor (GH2095)
Alias Timestamp.astimezone to tz_convert, so will yield Timestamp (GH2060)
Fix timedelta64 formatting from Series (GH2165, GH2146)
Handle None values gracefully in dict passed to Panel constructor (GH2075)
Box datetime64 values as Timestamp objects in Series/DataFrame.iget (GH2148)
Fix Timestamp indexing bug in DatetimeIndex.insert (GH2155)
Use index name(s) (if any) in DataFrame.to_records (GH2161)
Don’t lose index names in Panel.to_frame/DataFrame.to_panel (GH2163)
Work around length-0 boolean indexing NumPy bug (GH2096)
Fix partial integer indexing bug in DataFrame.xs (GH2107)
Fix variety of cut/qcut string-bin formatting bugs (GH1978, GH1979)
Raise Exception when xs view not possible of MultiIndex’d DataFrame (GH2117)
Fix groupby(...).first() issue with datetime64 (GH2133)
Better floating point error robustness in some rolling_* functions (GH2114, GH2527)
Fix ewma NA handling in the middle of Series (GH2128)
Fix numerical precision issues in diff with integer data (GH2087)
Fix bug in MultiIndex.__getitem__ with NA values (GH2008)
Fix DataFrame.from_records dict-arg bug when passing columns (GH2179)
Fix Series and DataFrame.diff for integer dtypes (GH2087, GH2174)
Fix bug when taking intersection of DatetimeIndex with empty index (GH2129)
Pass through timezone information when calling DataFrame.align (GH2127)
Properly sort when joining on datetime64 values (GH2196)
Fix indexing bug in which False/True were being coerced to 0/1 (GH2199)
Many unicode formatting fixes (GH2201)
Fix improper MultiIndex conversion issue when assigning e.g. DataFrame.index (GH2200)
Fix conversion of mixed-type DataFrame to ndarray with dup columns (GH2236)
Fix duplicate columns issue (GH2218, GH2219)
Fix SparseSeries.__pow__ issue with NA input (GH2220)
Fix icol with integer sequence failure (GH2228)
Fixed resampling tz-aware time series issue (GH2245)
SparseDataFrame.icol was not returning SparseSeries (GH2227, GH2229)
Enable ExcelWriter to handle PeriodIndex (GH2240)
Fix issue constructing DataFrame from empty Series with name (GH2234)
Use console-width detection in interactive sessions only (GH1610)
Fix parallel_coordinates legend bug with mpl 1.2.0 (GH2237)
Make tz_localize work in corner case of empty Series (GH2248)

pandas 0.9.0¶

Release date: 2012-10-07

New Features¶

Add str.encode and str.decode to Series (GH1706)
Add to_latex method to DataFrame (GH1735)
Add convenient expanding window equivalents of all rolling_* ops (GH1785)
Add Options class to pandas.io.data for fetching options data from Yahoo! Finance (GH1748, GH1739)
Recognize and convert more boolean values in file parsing (Yes, No, TRUE, FALSE, variants thereof) (GH1691, GH1295)
Add Panel.update method, analogous to DataFrame.update (GH1999, GH1988)

Improvements to existing features¶

Proper handling of NA values in merge operations (GH1990)
Add flags option for re.compile in some Series.str methods (GH1659)
Parsing of UTC date strings in read_* functions (GH1693)
Handle generator input to Series (GH1679)
Add na_action='ignore' to Series.map to quietly propagate NAs (GH1661)
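A minimal sketch of na_action='ignore': the mapping function is never called on the missing entry, which is passed through unchanged. The data is illustrative, and object dtype is chosen explicitly as an assumption of this example.

```python
import numpy as np
import pandas as pd

# Illustrative strings with one missing entry
s = pd.Series(["a", np.nan, "c"], dtype=object)

# str.upper would fail on a float NaN; na_action='ignore' skips it
result = s.map(str.upper, na_action="ignore")
print(result.tolist())
```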
Add args/kwds options to Series.apply (GH1829)
Add inplace option to Series/DataFrame.reset_index (GH1797)
Add level parameter to Series.reset_index
Add quoting option for DataFrame.to_csv (GH1902)
Indicate long column value truncation in DataFrame output with ... (GH1854)
DataFrame.dot will not do data alignment, and also work with Series (GH1915)
Add na option for missing data handling in some vectorized string methods (GH1689)
If index_label=False in DataFrame.to_csv, do not print fields/commas in the text output. Results in easier importing into R (GH1583)
Can pass tuple/list of axes to DataFrame.dropna to simplify repeated calls (dropping both columns and rows) (GH924)
Improve DataFrame.to_html output for hierarchically-indexed rows (do not repeat levels) (GH1929)
TimeSeries.between_time can now select times across midnight (GH1871)
Enable skip_footer parameter in ExcelFile.parse (GH1843)

API Changes¶

Change default header names in read_* functions to more Pythonic X0, X1, etc. instead of X.1, X.2. (GH2000)
Deprecated day_of_year API removed from PeriodIndex, use dayofyear (GH1723)
Don’t modify NumPy suppress printoption at import time
The internal HDF5 data arrangement for DataFrames has been transposed. Legacy files will still be readable by HDFStore (GH1834, GH1824)
Legacy cruft removed: pandas.stats.misc.quantileTS
Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776)
Empty DataFrame columns are now created as object dtype. This will prevent a class of TypeErrors that was occurring in code where the dtype of a column would depend on the presence of data or not (e.g. a SQL query having results) (GH1783)
Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame (GH1630)
first and last methods in GroupBy no longer drop non-numeric columns (GH1809)
Resolved inconsistencies in specifying custom NA values in text parser. na_values of type dict no longer override default NAs unless keep_default_na is set to false explicitly (GH1657)
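The interaction between a dict-valued na_values and keep_default_na can be sketched as below. The CSV content and the "missing" sentinel are invented for the example.

```python
import io

import pandas as pd

# Column 'a' contains a custom sentinel ("missing") and a default one ("NA")
raw = "a,b\nmissing,1\nNA,2\nx,3\n"

# Default NA strings stay active alongside the per-column custom values
with_defaults = pd.read_csv(io.StringIO(raw), na_values={"a": ["missing"]})

# Only the custom values are treated as NA when defaults are switched off
custom_only = pd.read_csv(
    io.StringIO(raw), na_values={"a": ["missing"]}, keep_default_na=False
)

print(with_defaults["a"].isnull().tolist())
print(custom_only["a"].isnull().tolist())
```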
Enable skipfooter parameter in text parsers as an alias for skip_footer

Bug Fixes¶

Perform arithmetic column-by-column in mixed-type DataFrame to avoid type upcasting issues. Caused downstream DataFrame.diff bug (GH1896)
Fix matplotlib auto-color assignment when no custom spectrum passed. Also respect passed color keyword argument (GH1711)
Fix resampling logical error with closed='left' (GH1726)
Fix critical DatetimeIndex.union bugs (GH1730, GH1719, GH1745, GH1702, GH1753)
Fix critical DatetimeIndex.intersection bug with unanchored offsets (GH1708)
Fix MM-YYYY time series indexing case (GH1672)
Fix case where Categorical group key was not being passed into index in GroupBy result (GH1701)
Handle Ellipsis in Series.__getitem__/__setitem__ (GH1721)
Fix some bugs with handling datetime64 scalars of other units in NumPy 1.6 and 1.7 (GH1717)
Fix performance issue in MultiIndex.format (GH1746)
Fixed GroupBy bugs interacting with DatetimeIndex asof / map methods (GH1677)
Handle factors with NAs in pandas.rpy (GH1615)
Fix statsmodels import in pandas.stats.var (GH1734)
Fix DataFrame repr/info summary with non-unique columns (GH1700)
Fix Series.iget_value for non-unique indexes (GH1694)
Don’t lose tzinfo when passing DatetimeIndex as DataFrame column (GH1682)
Fix tz conversion with time zones that haven’t had any DST transitions since first date in the array (GH1673)
Fix field access with UTC->local conversion on unsorted arrays (GH1756)
Fix isnull handling of array-like (list) inputs (GH1755)
Fix regression in handling of Series in Series constructor (GH1671)
Fix comparison of Int64Index with DatetimeIndex (GH1681)
Fix min_periods handling in new rolling_max/min at array start (GH1695)
Fix errors with how=’median’ and generic NumPy resampling in some cases caused by SeriesBinGrouper (GH1648,GH1688)
When grouping by level, exclude unobserved levels (GH1697)
Don’t lose tzinfo in DatetimeIndex when shifting by different offset (GH1683)
Hack to support storing data with a zero-length axis in HDFStore (GH1707)
Fix DatetimeIndex tz-aware range generation issue (GH1674)
Fix method=’time’ interpolation with intraday data (GH1698)
Don’t plot all-NA DataFrame columns as zeros (GH1696)
Fix bug in scatter_plot with by option (GH1716)
Fix performance problem in infer_freq with lots of non-unique stamps (GH1686)
Fix handling of PeriodIndex as argument to create MultiIndex (GH1705)
Fix re: unicode MultiIndex level names in Series/DataFrame repr (GH1736)
Handle PeriodIndex in to_datetime instance method (GH1703)
Support StaticTzInfo in DatetimeIndex infrastructure (GH1692)
Allow MultiIndex setops with length-0 other type indexes (GH1727)
Fix handling of DatetimeIndex in DataFrame.to_records (GH1720)
Fix handling of general objects in isnull on which bool(...) fails (GH1749)
Fix .ix indexing with MultiIndex ambiguity (GH1678)
Fix .ix setting logic error with non-unique MultiIndex (GH1750)
Basic indexing now works on MultiIndex with > 1000000 elements, regression from earlier version of pandas (GH1757)
Handle non-float64 dtypes in fast DataFrame.corr/cov code paths (GH1761)
Fix DatetimeIndex.isin to function properly (GH1763)
Fix conversion of array of tz-aware datetime.datetime to DatetimeIndex with right time zone (GH1777)
Fix DST issues with generating anchored date ranges (GH1778)
Fix issue calling sort on result of Series.unique (GH1807)
Fix numerical issue leading to square root of negative number in rolling_std (GH1840)
Let Series.str.split accept no arguments (like str.split) (GH1859)
Allow user to have dateutil 2.1 installed on a Python 2 system (GH1851)
Catch ImportError less aggressively in pandas/__init__.py (GH1845)
Fix pip source installation bug when installing from GitHub (GH1805)
Fix error when window size > array size in rolling_apply (GH1850)
Fix pip source installation issues via SSH from GitHub
Fix OLS.summary when column is a tuple (GH1837)
Fix bug in __doc__ patching when -OO passed to interpreter (GH1792 GH1741 GH1774)
Fix unicode console encoding issue in IPython notebook (GH1782,GH1768)
Fix unicode formatting issue with Series.name (GH1782)
Fix bug in DataFrame.duplicated with datetime64 columns (GH1833)
Fix bug in Panel internals resulting in error when doing fillna after truncate not changing size of panel (GH1823)
Prevent segfault due to MultiIndex not being supported in HDFStore table format (GH1848)
Fix UnboundLocalError in Panel.__setitem__ and add better error (GH1826)
Fix to_csv issues with list of string entries. Isnull works on list of strings now too (GH1791)
Fix Timestamp comparisons with datetime values outside the nanosecond range (1677-2262)
Revert to prior behavior of normalize_date with datetime.date objects (return datetime)
Fix broken interaction between np.nansum and Series.any/all
Fix bug with multiple column date parsers (GH1866)
DatetimeIndex.union(Int64Index) was broken
Make plot x vs y interface consistent with integer indexing (GH1842)
set_index inplace modified data even if unique check fails (GH1831)
Only use Q-OCT/NOV/DEC in quarterly frequency inference (GH1789)
Upcast to dtype=object when unstacking boolean DataFrame (GH1820)
Fix float64/float32 merging bug (GH1849)
Fixes to Period.start_time for non-daily frequencies (GH1857)
Fix failure when converter used on index_col in read_csv (GH1835)
Implement PeriodIndex.append so that pandas.concat works correctly (GH1815)
Avoid Cython out-of-bounds access causing segfault sometimes in pad_2d, backfill_2d
Fix resampling error with intraday times and anchored target time (like AS-DEC) (GH1772)
Fix .ix indexing bugs with mixed-integer indexes (GH1799)
Respect passed color keyword argument in Series.plot (GH1890)
Fix rolling_min/max when the window is larger than the size of the input array. Check other malformed inputs (GH1899,GH1897)
Rolling variance / standard deviation with only a single observation in window (GH1884)
Fix unicode sheet name failure in to_excel (GH1828)
Override DatetimeIndex.min/max to return Timestamp objects (GH1895)
Fix column name formatting issue in length-truncated column (GH1906)
Fix broken handling of copying Index metadata to new instances created by view(...) calls inside the NumPy infrastructure
Support datetime.date again in DateOffset.rollback/rollforward
Raise Exception if set passed to Series constructor (GH1913)
Add TypeError when appending HDFStore table w/ wrong index type (GH1881)
Don’t raise exception on empty inputs in EW functions (e.g. ewma) (GH1900)
Make asof work correctly with PeriodIndex (GH1883)
Fix extlinks in doc build
Fill boolean DataFrame with NaN when calling shift (GH1814)
Fix setuptools bug causing pip not to Cythonize .pyx files sometimes
Fix negative integer indexing regression in .ix from 0.7.x (GH1888)
Fix error while retrieving timezone and utc offset from subclasses of datetime.tzinfo without .zone and ._utcoffset attributes (GH1922)
Fix DataFrame formatting of small, non-zero FP numbers (GH1911)
Various fixes by upcasting of date -> datetime (GH1395)
Raise better exception when passing multiple functions with the same name, such as lambdas, to GroupBy.aggregate
Fix DataFrame.apply with axis=1 on a non-unique index (GH1878)
Proper handling of Index subclasses in pandas.unique (GH1759)
Set index names in DataFrame.from_records (GH1744)
Fix time series indexing error with duplicates, under and over hash table size cutoff (GH1821)
Handle list keys in addition to tuples in DataFrame.xs when partial-indexing a hierarchically-indexed DataFrame (GH1796)
Support multiple column selection in DataFrame.__getitem__ with duplicate columns (GH1943)
Fix time zone localization bug causing improper fields (e.g. hours) in time zones that have not had a UTC transition in a long time (GH1946)
Fix errors when parsing and working with fixed offset timezones (GH1922,GH1928)
Fix text parser bug when handling UTC datetime objects generated by dateutil (GH1693)
Fix plotting bug when ‘B’ is the inferred frequency but index actually contains weekends (GH1668,GH1669)
Fix plot styling bugs (GH1666,GH1665,GH1658)
Fix plotting bug with index/columns with unicode (GH1685)
Fix DataFrame constructor bug when passed Series with datetime64 dtype in a dict (GH1680)
Fixed regression in generating DatetimeIndex using timezone aware datetime.datetime (GH1676)
Fix DataFrame bug when printing concatenated DataFrames with duplicated columns (GH1675)
Fixed bug when plotting time series with multiple intraday frequencies (GH1732)
Fix bug in DataFrame.duplicated to enable iterables other than list-types as input argument (GH1773)
Fix resample bug when passed list of lambdas as how argument (GH1808)
Repr fix for MultiIndex level with all NAs (GH1971)
Fix PeriodIndex slicing bug when slice start/end are out-of-bounds (GH1977)
Fix read_table bug when parsing unicode (GH1975)
Fix BlockManager.iget bug when dealing with non-unique MultiIndex as columns (GH1970)
Fix reset_index bug if both drop and level are specified (GH1957)
Work around unsafe NumPy object->int casting with Cython function (GH1987)
Fix datetime64 formatting bug in DataFrame.to_csv (GH1993)
Default start date in pandas.io.data to 1/1/2000 as the docs say (GH2011)
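
As one concrete case from the list above, the Series.str.split entry (GH1859) describes calling the method with no arguments, mirroring str.split. A minimal sketch, assuming a recent pandas release and made-up sample strings:

```python
import pandas as pd

# Sample strings with mixed whitespace (spaces and a tab).
s = pd.Series(["a b  c", "d\te"])

# With no arguments, each element is split on runs of whitespace,
# the same way the built-in str.split() behaves.
parts = s.str.split()

print(parts.tolist())
```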

pandas 0.8.1¶

Release date: July 22, 2012

New Features¶

Add vectorized, NA-friendly string methods to Series (GH1621,GH620)
Can pass dict of per-column line styles to DataFrame.plot (GH1559)
Selective plotting to secondary y-axis on same subplot (GH1640)
Add new bootstrap_plot plot function
Add new parallel_coordinates plot function (GH1488)
Add radviz plot function (GH1566)
Add multi_sparse option to set_printoptions to modify display of hierarchical indexes (GH1538)
Add dropna method to Panel (GH171)
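
The vectorized, NA-friendly string methods mentioned at the top of this list can be sketched as follows. The example assumes a recent pandas release and uses made-up data; the point is only that a missing entry passes through the .str accessor instead of raising.

```python
import numpy as np
import pandas as pd

# One entry is missing on purpose.
s = pd.Series(["apple", np.nan, "Banana"])

# Element-wise string operations; the missing entry stays missing.
upper = s.str.upper()

# Element-wise substring test on the same data.
has_an = s.str.contains("an")

print(upper)
print(has_an)
```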

Improvements to existing features¶

Use moving min/max algorithms from Bottleneck in rolling_min/rolling_max for > 100x speedup. (GH1504,GH50)
Add Cython group median method for >15x speedup (GH1358)
Drastically improve to_datetime performance on ISO8601 datetime strings (with no time zones) (GH1571)
Improve single-key groupby performance on large data sets, accelerate use of groupby with a Categorical variable
Add ability to append hierarchical index levels with set_index and to drop single levels with reset_index (GH1569,GH1577)
Always apply passed functions in resample, even if upsampling (GH1596)
Avoid unnecessary copies in DataFrame constructor with explicit dtype (GH1572)
Cleaner DatetimeIndex string representation with 1 or 2 elements (GH1611)
Improve performance of array-of-Period to PeriodIndex, convert such arrays to PeriodIndex inside Index (GH1215)
More informative string representation for weekly Period objects (GH1503)
Accelerate 3-axis multi data selection from homogeneous Panel (GH979)
Add adjust option to ewma to disable adjustment factor (GH1584)
Add new matplotlib converters for high frequency time series plotting (GH1599)
Handling of tz-aware datetime.datetime objects in to_datetime; raise Exception unless utc=True given (GH1581)
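
The set_index / reset_index entry above (GH1569, GH1577) can be sketched like this. The frame and its column names are hypothetical, and the keyword spellings (append=True, level=...) are those of recent pandas releases, which are assumed here.

```python
import pandas as pd

# Hypothetical frame, indexed by "year" to start with.
df = pd.DataFrame(
    {"year": [2011, 2011, 2012], "city": ["A", "B", "A"], "value": [1.0, 2.0, 3.0]}
).set_index("year")

# Append "city" as a second index level instead of replacing "year".
two_level = df.set_index("city", append=True)

# Remove only the "city" level, turning it back into a column;
# "year" remains the index.
one_level = two_level.reset_index(level="city")

print(two_level.index.names)
print(one_level)
```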

Bug Fixes¶

Fix NA handling in DataFrame.to_panel (GH1582)
Handle TypeError issues inside PyObject_RichCompareBool calls in khash (GH1318)
Fix resampling bug to lower case daily frequency (GH1588)
Fix kendall/spearman DataFrame.corr bug with no overlap (GH1595)
Fix bug in DataFrame.set_index (GH1592)
Don’t ignore axes in boxplot if by specified (GH1565)
Fix Panel .ix indexing with integers bug (GH1603)
Fix Partial indexing bugs (years, months, ...) with PeriodIndex (GH1601)
Fix MultiIndex console formatting issue (GH1606)
Unordered index with duplicates doesn’t yield scalar location for single entry (GH1586)
Fix resampling of tz-aware time series with “anchored” freq (GH1591)
Fix DataFrame.rank error on integer data (GH1589)
Selection of multiple SparseDataFrame columns by list in __getitem__ (GH1585)
Override Index.tolist for compatibility with MultiIndex (GH1576)
Fix hierarchical summing bug with MultiIndex of length 1 (GH1568)
Work around numpy.concatenate use/bug in Series.set_value (GH1561)
Ensure Series/DataFrame are sorted before resampling (GH1580)
Fix unhandled IndexError when indexing very large time series (GH1562)
Fix DatetimeIndex intersection logic error with irregular indexes (GH1551)
Fix unit test errors on Python 3 (GH1550)
Fix .ix indexing bugs in duplicate DataFrame index (GH1201)
Better handle errors with non-existing objects in HDFStore (GH1254)
Don’t copy int64 array data in DatetimeIndex when copy=False (GH1624)
Fix resampling of conforming periods quarterly to annual (GH1622)
Don’t lose index name on resampling (GH1631)
Support python-dateutil version 2.1 (GH1637)
Fix broken scatter_matrix axis labeling, esp. with time series (GH1625)
Fix cases where extra keywords weren’t being passed on to matplotlib from Series.plot (GH1636)
Fix BusinessMonthBegin logic for dates before 1st bday of month (GH1645)
Ensure string alias converted (valid in DatetimeIndex.get_loc) in DataFrame.xs / __getitem__ (GH1644)
Fix use of string alias timestamps with tz-aware time series (GH1647)
Fix Series.max/min and Series.describe on len-0 series (GH1650)
Handle None values in dict passed to concat (GH1649)
Fix Series.interpolate with method=’values’ and DatetimeIndex (GH1646)
Fix IndexError in left merges on a DataFrame with 0-length (GH1628)
Fix DataFrame column width display with UTF-8 encoded characters (GH1620)
Handle case in pandas.io.data.get_data_yahoo where Yahoo! returns duplicate dates for most recent business day
Avoid downsampling when plotting mixed frequencies on the same subplot (GH1619)
Fix read_csv bug when reading a single line (GH1553)
Fix bug in C code causing monthly periods prior to December 1969 to be off (GH1570)

pandas 0.8.0¶

Release date: June 29, 2012

New Features¶

New unified DatetimeIndex class for nanosecond-level timestamp data
New Timestamp datetime.datetime subclass with easy time zone conversions, and support for nanoseconds
New PeriodIndex class for timespans, calendar logic, and Period scalar object
High performance resampling of timestamp and period data. New resample method of all pandas data structures
New frequency names plus shortcut string aliases like ‘15h’, ‘1h30min’
Time series string indexing shorthand (GH222)
Add week, dayofyear array and other timestamp array-valued field accessor functions to DatetimeIndex
Add GroupBy.prod optimized aggregation function and ‘prod’ fast time series conversion method (GH1018)
Implement robust frequency inference function and inferred_freq attribute on DatetimeIndex (GH391)
New tz_convert and tz_localize methods in Series / DataFrame
Convert DatetimeIndexes to UTC if time zones are different in join/setops (GH864)
Add limit argument for forward/backward filling to reindex, fillna, etc. (GH825 and others)
Add support for indexes (dates or otherwise) with duplicates and common sense indexing/selection functionality
Series/DataFrame.update methods, in-place variant of combine_first (GH961)
Add match function to API (GH502)
Add Cython-optimized first, last, min, max, prod functions to GroupBy (GH994,GH1043)
Dates can be split across multiple columns (GH1227,GH1186)
Add experimental support for converting pandas DataFrame to R data.frame via rpy2 (GH350,GH1212)
Can pass list of (name, function) to GroupBy.aggregate to get aggregates in a particular order (GH610)
Can pass dicts with lists of functions or dicts to GroupBy aggregate to do much more flexible multiple function aggregation (GH642,GH610)
New ordered_merge functions for merging DataFrames with ordered data. Also supports group-wise merging for panel data (GH813)
Add keys() method to DataFrame
Add flexible replace method for replacing values in Series and DataFrame (GH929,GH1241)
Add ‘kde’ plot kind for Series/DataFrame.plot (GH1059)
More flexible multiple function aggregation with GroupBy
Add pct_change function to Series/DataFrame
Add option to interpolate by Index values in Series.interpolate (GH1206)
Add max_colwidth option for DataFrame, defaulting to 50
Conversion of DataFrame through rpy2 to R data.frame (GH1282, )
Add keys() method on DataFrame (GH1240)
Add new match function to API (similar to R) (GH502)
Add dayfirst option to parsers (GH854)
Add method argument to align method for forward/backward filling (GH216)
Add Panel.transpose method for rearranging axes (GH695)
Add new cut function (patterned after R) for discretizing data into equal range-length bins or arbitrary breaks of your choosing (GH415)
Add new qcut for cutting with quantiles (GH1378)
Add value_counts top level array method (GH1392)
Added Andrews curves plot type (GH1325)
Add lag plot (GH1440)
Add autocorrelation_plot (GH1425)
Add support for tox and Travis CI (GH1382)
Add support for Categorical use in GroupBy (GH292)
Add any and all methods to DataFrame (GH1416)
Add secondary_y option to Series.plot
Add experimental lreshape function for reshaping wide to long
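
The cut and qcut entries above (GH415, GH1378) describe three ways of discretizing data: equal-width bins, explicit breaks, and quantiles. A minimal sketch, assuming a recent pandas release; the ages and the bin labels are made-up illustration values.

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 40, 68, 90])

# Equal range-length bins: the span of the data is split into 3 intervals
# of equal width.
equal_width = pd.cut(ages, bins=3)

# Arbitrary breaks of your choosing, here with hypothetical labels.
# Intervals are closed on the right by default: (0, 18], (18, 65], (65, 120].
by_breaks = pd.cut(ages, bins=[0, 18, 65, 120], labels=["minor", "adult", "senior"])

# Quantile-based bins: each bin receives roughly the same number of values.
by_quantile = pd.qcut(ages, q=3)

print(equal_width.value_counts())
print(by_breaks.tolist())
print(by_quantile.value_counts())
```

The difference between the first and last call is the one the release notes draw: cut fixes the bin edges, while qcut fixes the share of observations per bin.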

Improvements to existing features¶

Switch to klib/khash-based hash tables in Index classes for better performance in many cases and lower memory footprint
Shipping some functions from scipy.stats to reduce dependency, e.g. Series.describe and DataFrame.describe (GH1092)
Can create MultiIndex by passing list of lists or list of arrays to Series, DataFrame constructor, etc. (GH831)
Can pass arrays in addition to column names to DataFrame.set_index (GH402)
Improve the speed of “square” reindexing of homogeneous DataFrame objects by significant margin (GH836)
Handle more dtypes when passed MaskedArrays in DataFrame constructor (GH406)
Improved performance of join operations on integer keys (GH682)
Can pass multiple columns to GroupBy object, e.g. grouped[[col1, col2]] to only aggregate a subset of the value columns (GH383)
Add histogram / kde plot options for scatter_matrix diagonals (GH1237)
Add inplace option to Series/DataFrame.rename and sort_index, DataFrame.drop_duplicates (GH805,GH207)
More helpful error message when nothing passed to Series.reindex (GH1267)
Can mix array and scalars as dict-value inputs to DataFrame ctor (GH1329)
Use DataFrame columns’ name for legend title in plots
Preserve frequency in DatetimeIndex when possible in boolean indexing operations
Promote datetime.date values in data alignment operations (GH867)
Add order method to Index classes (GH1028)
Avoid hash table creation in large monotonic hash table indexes (GH1160)
Store time zones in HDFStore (GH1232)
Enable storage of sparse data structures in HDFStore (GH85)
Enable Series.asof to work with arrays of timestamp inputs
Cython implementation of DataFrame.corr speeds up by > 100x (GH1349,GH1354)
Exclude “nuisance” columns automatically in GroupBy.transform (GH1364)
Support functions-as-strings in GroupBy.transform (GH1362)
Use index name as xlabel/ylabel in plots (GH1415)
Add convert_dtype option to Series.apply to be able to leave data as dtype=object (GH1414)
Can specify all index level names in concat (GH1419)
Add dialect keyword to parsers for quoting conventions (GH1363)
Enable DataFrame[bool_DataFrame] += value (GH1366)
Add retries argument to get_data_yahoo to try to prevent Yahoo! API 404s (GH826)
Improve performance of reshaping by using O(N) categorical sorting
Series names will be used for index of DataFrame if no index passed (GH1494)
Header argument in DataFrame.to_csv can accept a list of column names to use instead of the object’s columns (GH921)
Add raise_conflict argument to DataFrame.update (GH1526)
Support file-like objects in ExcelFile (GH1529)
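
The grouped[[col1, col2]] entry above (GH383) can be sketched as follows, assuming a recent pandas release. The frame and its column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b"],
        "col1": [1, 2, 3, 4],
        "col2": [10.0, 20.0, 30.0, 40.0],
        "col3": [100, 200, 300, 400],
    }
)

grouped = df.groupby("key")

# Selecting a list of columns on the GroupBy object restricts the
# aggregation to those value columns; "col3" is left out of the result.
subset_sum = grouped[["col1", "col2"]].sum()

print(subset_sum)
```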

API Changes¶

Rename pandas._tseries to pandas.lib
Rename Factor to Categorical and add improvements. Numerous Categorical bug fixes
Frequency name overhaul: WEEKDAY/EOM and rules with @ are deprecated. get_legacy_offset_name backwards compatibility function added
Raise ValueError in DataFrame.__nonzero__, so “if df” no longer works (GH1073)
Change BDay (business day) to not normalize dates by default (GH506)
Remove deprecated DataMatrix name
Default merge suffixes for overlap now have underscores instead of periods to facilitate tab completion, etc. (GH1239)
Deprecation of offset, time_rule, timeRule parameters throughout codebase
Series.append and DataFrame.append no longer check for duplicate indexes by default, add verify_integrity parameter (GH1394)
Refactor Factor class, old constructor moved to Factor.from_array
Modified internals of MultiIndex to use less memory (no longer represented as array of tuples) internally, speed up construction time and many methods which construct intermediate hierarchical indexes (GH1467)
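
Two of the API changes above are easy to check directly: the ValueError raised by "if df" (GH1073) and the underscore merge suffixes (GH1239). The sketch below assumes a recent pandas release, where both behaviors still hold; the frame is made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 2], "value": [10, 20]})

# The truth value of a DataFrame is ambiguous, so "if df" raises ValueError.
truth_error = None
try:
    if df:
        pass
except ValueError as exc:
    truth_error = str(exc)

# An explicit test states what is actually meant.
is_empty = df.empty

# Overlapping non-key columns get underscore suffixes by default.
merged = pd.merge(df, df, on="key")

print(truth_error)
print(list(merged.columns))
```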

Bug Fixes¶

Fix OverflowError from storing pre-1970 dates in HDFStore by switching to datetime64 (GH179)
Fix logical error with February leap year end in YearEnd offset
Series([False, nan]) was getting cast to float64 (GH1074)
Fix binary operations between boolean Series and object Series with booleans and NAs (GH1074,GH1079)
Couldn’t assign whole array to column in mixed-type DataFrame via .ix (GH1142)
Fix label slicing issues with float index values (GH1167)
Fix segfault caused by empty groups passed to groupby (GH1048)
Fix occasionally misbehaved reindexing in the presence of NaN labels (GH522)
Fix imprecise logic causing weird Series results from .apply (GH1183)
Unstack multiple levels in one shot, avoiding empty columns in some cases. Fix pivot table bug (GH1181)
Fix formatting of MultiIndex on Series/DataFrame when index name coincides with label (GH1217)
Handle Excel 2003 #N/A as NaN from xlrd (GH1213,GH1225)
Fix timestamp locale-related deserialization issues with HDFStore by moving to datetime64 representation (GH1081,GH809)
Fix DataFrame.duplicated/drop_duplicates NA value handling (GH557)
Actually raise exceptions in fast reducer (GH1243)
Fix various timezone-handling bugs from 0.7.3 (GH969)
GroupBy on level=0 discarded index name (GH1313)
Better error message with unmergeable DataFrames (GH1307)
Series.__repr__ alignment fix with unicode index values (GH1279)
Better error message if nothing passed to reindex (GH1267)
More robust NA handling in DataFrame.drop_duplicates (GH557)
Resolve locale-based and pre-epoch HDF5 timestamp deserialization issues (GH973,GH1081,GH179)
Implement Series.repeat (GH1229)
Fix indexing with namedtuple and other tuple subclasses (GH1026)
Fix float64 slicing bug (GH1167)
Parsing integers with commas (GH796)
Fix groupby improper data type when group consists of one value (GH1065)
Fix negative variance possibility in nanvar resulting from floating point error (GH1090)
Consistently set name on groupby pieces (GH184)
Treat dict return values as Series in GroupBy.apply (GH823)
Respect column selection for DataFrame in GroupBy.transform (GH1365)
Fix MultiIndex partial indexing bug (GH1352)
Enable assignment of rows in mixed-type DataFrame via .ix (GH1432)
Reset index mapping when grouping Series in Cython (GH1423)
Fix outer/inner DataFrame.join with non-unique indexes (GH1421)
Fix MultiIndex groupby bugs with empty lower levels (GH1401)
Calling fillna with a Series will have same behavior as with dict (GH1486)
SparseSeries reduction bug (GH1375)
Fix unicode serialization issue in HDFStore (GH1361)
Pass keywords to pyplot.boxplot in DataFrame.boxplot (GH1493)
Bug fixes in MonthBegin (GH1483)
Preserve MultiIndex names in drop (GH1513)
Fix Panel DataFrame slice-assignment bug (GH1533)
Don’t use locals() in read_* functions (GH1547)

pandas 0.7.3¶

Release date: April 12, 2012

New Features¶

Support for non-unique indexes: indexing and selection, many-to-one and many-to-many joins (GH1306)
Added fixed-width file reader, read_fwf (GH952)
Add group_keys argument to groupby to not add group names to MultiIndex in result of apply (GH938)
DataFrame can now accept non-integer label slicing (GH946). Previously only DataFrame.ix was able to do so.
DataFrame.apply now retains name attributes on Series objects (GH983)
Numeric DataFrame comparisons with non-numeric values now raise proper TypeError (GH943). Previously raised “PandasError: DataFrame constructor not properly called!”
Add kurt methods to Series and DataFrame (GH964)
Can pass dict of column -> list/set NA values for text parsers (GH754)
Allow user-specified NA values in text parsers (GH754)
Parsers check for openpyxl dependency and raise ImportError if not found (GH1007)
New factory function to create HDFStore objects that can be used in a with statement so users do not have to explicitly call HDFStore.close (GH1005)
pivot_table is now more flexible with same parameters as groupby (GH941)
Added stacked bar plots (GH987)
scatter_matrix method in pandas/tools/plotting.py (GH935)
DataFrame.boxplot returns plot results for ex-post styling (GH985)
Short version number accessible as pandas.version.short_version (GH930)
Additional documentation in panel.to_frame (GH942)
More informative Series.apply docstring regarding element-wise apply (GH977)
Notes on rpy2 installation (GH1006)
Add rotation and font size options to hist method (GH1012)
Use exogenous / X variable index in result of OLS.y_predict. Add OLS.predict method (GH1027,GH1008)
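
The fixed-width reader read_fwf listed above (GH952) can be sketched as follows. The example assumes a recent pandas release; the sample table and the column widths are made up, and the widths keyword is the spelling used by current pandas rather than something these notes state.

```python
import io

import pandas as pd

# Build a small fixed-width table: 4, 6 and 5 characters per column.
rows = [("id", "name", "score"), ("x1", "alice", "9.5"), ("x2", "bob", "7.0")]
text = "\n".join("{:<4}{:<6}{:>5}".format(*row) for row in rows) + "\n"

# Each field is cut out by character position, not by a delimiter;
# the first line supplies the column names.
df = pd.read_fwf(io.StringIO(text), widths=[4, 6, 5])

print(df)
```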

API Changes¶

Calling apply on grouped Series, e.g. describe(), will no longer yield DataFrame by default. Will have to call unstack() to get prior behavior
NA handling in non-numeric comparisons has been tightened up (GH933,GH953)
No longer assign dummy names key_0, key_1, etc. to groupby index (GH1291)

Bug Fixes¶

Fix logic error when selecting part of a row in a DataFrame with a MultiIndex index (GH1013)
Series comparison with Series of differing length causes crash (GH1016).
Fix bug in indexing when selecting section of hierarchically-indexed row (GH1013)
DataFrame.plot(logy=True) has no effect (GH1011).
Broken arithmetic operations between SparsePanel-Panel (GH1015)
Unicode repr issues in MultiIndex with non-ASCII characters (GH1010)
DataFrame.lookup() returns inconsistent results if exact match not present (GH1001)
DataFrame arithmetic operations not treating None as NA (GH992)
DataFrameGroupBy.apply returns incorrect result (GH991)
Series.reshape returns incorrect result for multiple dimensions (GH989)
Series.std and Series.var ignores ddof parameter (GH934)
DataFrame.append loses index names (GH980)
DataFrame.plot(kind=’bar’) ignores color argument (GH958)
Inconsistent Index comparison results (GH948)
Improper int dtype DataFrame construction from data with NaN (GH846)
Removes default ‘result’ name in groupby results (GH995)
DataFrame.from_records no longer mutates input columns (GH975)
Use Index name when grouping by it (GH1313)

pandas 0.7.2¶

Release date: March 16, 2012

New Features¶

Add additional tie-breaking methods in DataFrame.rank (GH874)
Add ascending parameter to rank in Series, DataFrame (GH875)
Add sort_columns parameter to allow unsorted plots (GH918)
IPython tab completion on GroupBy objects
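
The rank entries above (GH874, GH875) concern tie-breaking and sort direction. A minimal sketch on a Series, assuming a recent pandas release; the scores are made-up data and the same keywords apply to DataFrame.rank.

```python
import pandas as pd

scores = pd.Series([70, 85, 85, 90], index=["a", "b", "c", "d"])

# Default: tied values share the mean of the ranks they span.
average_rank = scores.rank()

# Tied values all take the lowest rank in the tie.
min_rank = scores.rank(method="min")

# Ties are broken by order of appearance.
first_rank = scores.rank(method="first")

# Rank from largest to smallest.
descending = scores.rank(method="min", ascending=False)

print(average_rank.tolist())
print(min_rank.tolist())
print(first_rank.tolist())
print(descending.tolist())
```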

API Changes¶

Series.sum returns 0 instead of NA when called on an empty series. Analogously for a DataFrame whose rows or columns are length 0 (GH844)
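
The empty-sum behavior described above can be checked with a short sketch. It assumes a recent pandas release, where the same convention applies; the explicit float64 dtype is only there to avoid dtype-inference warnings on empty input.

```python
import pandas as pd

# Sum of an empty Series is 0, not NA.
empty = pd.Series([], dtype="float64")
total = empty.sum()

# The same holds column-wise for a DataFrame with zero rows.
frame = pd.DataFrame(
    {"a": pd.Series([], dtype="float64"), "b": pd.Series([], dtype="float64")}
)
column_totals = frame.sum()

print(total)
print(column_totals)
```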

Improvements to existing features¶

Don’t use groups dict in Grouper.size (GH860)
Use khash for Series.value_counts, add raw function to algorithms.py (GH861)
Enable column access via attributes on GroupBy (GH882)
Enable setting existing columns (only) via attributes on DataFrame, Panel (GH883)
Intercept __builtin__.sum in groupby (GH885)
Can pass dict to DataFrame.fillna to use different values per column (GH661)
Can select multiple hierarchical groups by passing list of values in .ix (GH134)
Add level keyword to drop for dropping values from a level (GH159)
Add coerce_float option on DataFrame.from_records (GH893)
Raise exception if passed date_parser fails in read_csv
Add axis option to DataFrame.fillna (GH174)
Fixes to Panel to make it easier to subclass (GH888)
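
Passing a dict to DataFrame.fillna (GH661, listed above) fills each column with its own value. A minimal sketch, assuming a recent pandas release and a made-up two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [1.5, np.nan], "qty": [np.nan, 4.0]})

# Keys are column names; each column gets its own replacement value.
filled = df.fillna({"price": 0.0, "qty": 1.0})

print(filled)
```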

Bug Fixes¶

Fix overflow-related bugs in groupby (GH850,GH851)
Fix unhelpful error message in parsers (GH856)
Better err msg for failed boolean slicing of dataframe (GH859)
Series.count cannot accept a string (level name) in the level argument (GH869)
Group index platform int check (GH870)
concat on axis=1 and ignore_index=True raises TypeError (GH871)
Further unicode handling issues resolved (GH795)
Fix failure in multiindex-based access in Panel (GH880)
Fix DataFrame boolean slice assignment failure (GH881)
Fix combineAdd NotImplementedError for SparseDataFrame (GH887)
Fix DataFrame.to_html encoding and columns (GH890,GH891,GH909)
Fix na-filling handling in mixed-type DataFrame (GH910)
Fix to DataFrame.set_value with non-existent row/col (GH911)
Fix malformed block in groupby when excluding nuisance columns (GH916)
Fix inconsistent NA handling in dtype=object arrays (GH925)
Fix missing center-of-mass computation in ewmcov (GH862)
Don’t raise exception when opening read-only HDF5 file (GH847)
Fix possible out-of-bounds memory access in 0-length Series (GH917)

pandas 0.7.1¶

Release date: February 29, 2012

New Features¶

Add to_clipboard function to pandas namespace for writing objects to the system clipboard (GH774)
Add itertuples method to DataFrame for iterating through the rows of a dataframe as tuples (GH818)
Add ability to pass fill_value and method to DataFrame and Series align method (GH806,GH807)
Add fill_value option to reindex, align methods (GH784)
Enable concat to produce DataFrame from Series (GH787)
Add between method to Series (GH802)
Add HTML representation hook to DataFrame for the IPython HTML notebook (GH773)
Support for reading Excel 2007 XML documents using openpyxl
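
Two of the additions above, itertuples (GH818) and Series.between (GH802), can be sketched together. The example assumes a recent pandas release, where itertuples yields named tuples with the index value first; the frame is made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 9], "y": ["a", "b", "c"]}, index=[10, 20, 30])

# Each row comes back as a tuple: index value first, then the column values.
rows = [tuple(row) for row in df.itertuples()]

# Boolean mask for values within the given bounds (both ends included
# by default).
in_range = df["x"].between(2, 9)

print(rows)
print(in_range.tolist())
```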

Improvements to existing features¶

Improve performance and memory usage of fillna on DataFrame
Can concatenate a list of Series along axis=1 to obtain a DataFrame (GH787)
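
Concatenating Series along axis=1 (GH787, above) aligns them on their index and uses the Series names as column labels. A minimal sketch, assuming a recent pandas release; the Series names and values are made up.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"], name="left")
s2 = pd.Series([10, 30], index=["a", "c"], name="right")

# Each Series becomes one column; labels missing from a Series yield NaN.
wide = pd.concat([s1, s2], axis=1)

print(wide)
```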

Bug Fixes¶

Fix memory leak when inserting large number of columns into a single DataFrame (GH790)
Appending length-0 DataFrame with new columns would not result in those new columns being part of the resulting concatenated DataFrame (GH782)
Fixed groupby corner case when passing dictionary grouper and as_index is False (GH819)
Fixed bug whereby bool array sometimes had object dtype (GH820)
Fix exception thrown on np.diff (GH816)
Fix to_records where columns are non-strings (GH822)
Fix Index.intersection where indices have incomparable types (GH811)
Fix ExcelFile throwing an exception for two-line file (GH837)
Add clearer error message in csv parser (GH835)
Fix loss of fractional seconds in HDFStore (GH513)
Fix DataFrame join where columns have datetimes (GH787)
Work around numpy performance issue in take (GH817)
Improve comparison operations for NA-friendliness (GH801)
Fix indexing operation for floating point values (GH780,GH798)
Fix groupby case resulting in malformed dataframe (GH814)
Fix behavior of reindex of Series dropping name (GH812)
Improve on redundant groupby computation (GH775)
Catch possible NA assignment to int/bool series with exception (GH839)

pandas 0.7.0¶

Release date: February 9, 2012

New Features¶

Newmerge function for efficiently performing full gamut of database /relational-algebra operations. Refactored existing join methods to use thenew infrastructure, resulting in substantial performance gains (GH220,GH249,GH267)
Newconcat function for concatenating DataFrame or Panel objects alongan axis. Can form union or intersection of the other axes. Improvesperformance ofDataFrame.append (GH468,GH479,GH273)
Handle differently-indexed output values inDataFrame.apply (GH498)
Can pass list of dicts (e.g., a list of shallow JSON objects) to DataFrameconstructor (GH526)
Addreorder_levels method to Series and DataFrame (GH534)
Add dict-likeget function to DataFrame and Panel (GH521)
DataFrame.iterrows method for efficiently iterating through the rows ofa DataFrame
AddedDataFrame.to_panel with code adapted fromLongPanel.to_long
reindex_axis method added to DataFrame
Addlevel option to binary arithmetic functions onDataFrame andSeries
Addlevel option to thereindex andalign methods on Series andDataFrame for broadcasting values across a level (GH542,GH552, others)
Add attribute-based item access toPanel and add IPython completion (PRGH554)
Addlogy option toSeries.plot for log-scaling on the Y axis
Addindex,header, andjustify options toDataFrame.to_string. Add option to (GH570,GH571)
Can pass multiple DataFrames toDataFrame.join to join on index (GH115)
Can pass multiple Panels toPanel.join (GH115)
Can pass multiple DataFrames toDataFrame.append to concatenate (stack)and multiple Series toSeries.append too
Addedjustify argument toDataFrame.to_string to allow differentalignment of column headers
Addsort option to GroupBy to allow disabling sorting of the group keysfor potential speedups (GH595)
Can pass MaskedArray to Series constructor (GH563)
Add Panel item access via attributes and IPython completion (GH554)
Implement DataFrame.lookup, fancy-indexing analogue for retrieving values given a sequence of row and column labels (GH338)
Add verbose option to read_csv and read_table to show number of NA values inserted in non-numeric columns (GH614)
Can pass a list of dicts or Series to DataFrame.append to concatenate multiple rows (GH464)
Add level argument to DataFrame.xs for selecting data from other MultiIndex levels. Can take one or more levels with potentially a tuple of keys for flexible retrieval of data (GH371, GH629)
New crosstab function for easily computing frequency tables (GH170)
Can pass a list of functions to aggregate with groupby on a DataFrame, yielding an aggregated result with hierarchical columns (GH166)
Add integer-indexing functions iget in Series and irow / iget in DataFrame (GH628)
Add new Series.unique function, significantly faster than numpy.unique (GH658)
Add new cummin and cummax instance methods to Series and DataFrame (GH647)
Add new value_range function to return min/max of a dataframe (GH288)
Add drop parameter to reset_index method of DataFrame and added method to Series as well (GH699)
Add isin method to Index objects, works just like Series.isin (GH657)
Implement array interface on Panel so that ufuncs work (re:GH740)
Add sort option to DataFrame.join (GH731)
Improved handling of NAs (propagation) in binary operations with dtype=object arrays (GH737)
Add abs method to Pandas objects
Added algorithms module to start collecting central algos
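
A few of the additions listed above (crosstab, the GroupBy sort option, cummax, the level argument to xs, and the drop parameter of reset_index) can be sketched as follows. This is only an illustration written against a current pandas installation, not code from the 0.7.0 release; the column names and data are made up for the example.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
        "kind": ["b", "a", "a", "a", "b"],
        "value": [3, 1, 4, 1, 5],
    }
)

# crosstab: frequency table of city versus kind
freq = pd.crosstab(df["city"], df["kind"])

# sort=False keeps the group keys in order of first appearance
unsorted_keys = list(df.groupby("kind", sort=False)["value"].sum().index)

# cummax: running maximum along a Series
running_max = df["value"].cummax().tolist()

# xs with level=: select on an inner MultiIndex level
indexed = df.set_index(["city", "kind"])
only_a = indexed.xs("a", level="kind")

# reset_index(drop=True) discards the old index instead of keeping it as a column
renumbered = df[df["value"] > 1].reset_index(drop=True)
```

Passing sort=False only changes the order of the result; the aggregated values themselves are the same.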

API Changes¶

Label-indexing with integer indexes now raises KeyError if a label is not found instead of falling back on location-based indexing (GH700)
Label-based slicing via ix or [] on Series will now only work if exact matches for the labels are found or if the index is monotonic (for range selections)
Label-based slicing and sequences of labels can be passed to [] on a Series for both getting and setting (GH86)
[] operator (__getitem__ and __setitem__) will raise KeyError with integer indexes when an index is not contained in the index. The prior behavior would fall back on position-based indexing if a key was not found in the index which would lead to subtle bugs. This is now consistent with the behavior of .ix on DataFrame and friends (GH328)
Rename DataFrame.delevel to DataFrame.reset_index and add deprecation warning
Series.sort (an in-place operation) called on a Series which is a view on a larger array (e.g. a column in a DataFrame) will generate an Exception to prevent accidentally modifying the data source (GH316)
Refactor to remove deprecated LongPanel class (GH552)
Deprecated Panel.to_long, renamed to to_frame
Deprecated colSpace argument in DataFrame.to_string, renamed to col_space
Rename precision to accuracy in engineering float formatter (GH395)
The default delimiter for read_csv is comma rather than letting csv.Sniffer infer it
Rename col_or_columns argument in DataFrame.drop_duplicates (GH734)
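
The integer-index change described above can be illustrated with a short sketch. It assumes a current pandas installation, where the same KeyError behavior still applies; the .iloc accessor used for positional access is the modern spelling and did not exist in 0.7.0.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=[10, 20, 30])

# 20 is a label in the index, so this is a label lookup
found = s[20]

# 0 is not a label; with an integer index there is no positional fallback
try:
    s[0]
    raised = False
except KeyError:
    raised = True

# Position-based access has to be requested explicitly (modern accessor)
first = s.iloc[0]
```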

Improvements to existing features¶

Better error message in DataFrame constructor when passed column labels don’t match data (GH497)
Substantially improve performance of multi-GroupBy aggregation when a Python function is passed, reuse ndarray object in Cython (GH496)
Can store objects indexed by tuples and floats in HDFStore (GH492)
Don’t print length by default in Series.to_string, add length option (GH489)
Improve Cython code for multi-groupby to aggregate without having to sort the data (GH93)
Improve MultiIndex reindexing speed by storing tuples in the MultiIndex, test for backwards unpickling compatibility
Improve column reindexing performance by using specialized Cython take function
Further performance tweaking of Series.__getitem__ for standard use cases
Avoid Index dict creation in some cases (i.e. when getting slices, etc.), regression from prior versions
Friendlier error message in setup.py if NumPy not installed
Use common set of NA-handling operations (sum, mean, etc.) in Panel class also (GH536)
Default name assignment when calling reset_index on DataFrame with a regular (non-hierarchical) index (GH476)
Use Cythonized groupers when possible in Series/DataFrame stat ops with level parameter passed (GH545)
Ported skiplist data structure to C to speed up rolling_median by about 5-10x in most typical use cases (GH374)
Some performance enhancements in constructing a Panel from a dict of DataFrame objects
Made Index._get_duplicates a public method by removing the underscore
Prettier printing of floats, and column spacing fix (GH395, GH571)
Add bold_rows option to DataFrame.to_html (GH586)
Improve the performance of DataFrame.sort_index by up to 5x or more when sorting by multiple columns
Substantially improve performance of DataFrame and Series constructors when passed a nested dict or dict, respectively (GH540, GH621)
Modified setup.py so that pip / setuptools will install dependencies (GH507, various pull requests)
Unstack called on DataFrame with non-MultiIndex will return Series (GH477)
Improve DataFrame.to_string and console formatting to be more consistent in the number of displayed digits (GH395)
Use bottleneck if available for performing NaN-friendly statistical operations that it implemented (GH91)
Monkey-patch context to traceback in DataFrame.apply to indicate which row/column the function application failed on (GH614)
Improved ability of read_table and read_clipboard to parse console-formatted DataFrames (can read the row of index names, etc.)
Can pass list of group labels (without having to convert to an ndarray yourself) to groupby in some cases (GH659)
Use kind argument to Series.order for selecting different sort kinds (GH668)
Add option to Series.to_csv to omit the index (GH684)
Add delimiter as an alternative to sep in read_csv and other parsing functions
Substantially improved performance of groupby on DataFrames with many columns by aggregating blocks of columns all at once (GH745)
Can pass a file handle or StringIO to Series/DataFrame.to_csv (GH765)
Can pass sequence of integers to DataFrame.irow(icol) and Series.iget (GH654)
Prototypes for some vectorized string functions
Add float64 hash table to solve the Series.unique problem with NAs (GH714)
Memoize objects when reading from file to reduce memory footprint
Can get and set a column of a DataFrame with hierarchical columns containing “empty” (‘’) lower levels without passing the empty levels (PR GH768)
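
Two of the I/O items above, writing CSV to a StringIO buffer and using delimiter as an alternative to sep in read_csv, might look like this in practice. The sketch assumes a current pandas installation and uses made-up data; index=False is shown on DataFrame.to_csv, while the note above mentions the equivalent option for Series.to_csv.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# Write to an in-memory buffer instead of a file path, omitting the index
buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)
text = buf.getvalue()

# delimiter is accepted as an alternative spelling of sep when reading
roundtrip = pd.read_csv(io.StringIO(text), delimiter=";")
```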

Bug Fixes¶

Raise exception in out-of-bounds indexing of Series instead of seg-faulting, regression from earlier releases (GH495)
Fix error when joining DataFrames of different dtypes within the same typeclass (e.g. float32 and float64) (GH486)
Fix bug in Series.min/Series.max on objects like datetime.datetime (GH487)
Preserve index names in Index.union (GH501)
Fix bug in Index joining causing subclass information (like DateRange type) to be lost in some cases (GH500)
Accept empty list as input to DataFrame constructor, regression from 0.6.0 (GH491)
Can output DataFrame and Series with ndarray objects in a dtype=object array (GH490)
Return empty string from Series.to_string when called on empty Series (GH488)
Fix exception passing empty list to DataFrame.from_records
Fix Index.format bug (excluding name field) with datetimes with time info
Fix scalar value access in Series to always return NumPy scalars, regression from prior versions (GH510)
Handle rows skipped at beginning of file in read_* functions (GH505)
Handle improper dtype casting in set_value methods
Unary ‘-’ / __neg__ operator on DataFrame was returning integer values
Unbox 0-dim ndarrays from certain operators like all, any in Series
Fix handling of missing columns (was combine_first-specific) in DataFrame.combine for general case (GH529)
Fix type inference logic with boolean lists and arrays in DataFrame indexing
Use centered sum of squares in R-square computation if entity_effects=True in panel regression
Handle all NA case in Series.{corr, cov}, was raising exception (GH548)
Aggregating by multiple levels with level argument to DataFrame, Series stat method, was broken (GH545)
Fix Cython bug when converter passed to read_csv produced a numeric array (buffer dtype mismatch when passed to Cython type inference function) (GH546)
Fix exception when setting scalar value using .ix on a DataFrame with a MultiIndex (GH551)
Fix outer join between two DateRanges with different offsets that returned an invalid DateRange
Cleanup DataFrame.from_records failure where index argument is an integer
Fix DataFrame.from_records failure when passed a dictionary
Fix NA handling in {Series, DataFrame}.rank with non-floating point dtypes
Fix bug related to integer type-checking in .ix-based indexing
Handle non-string index name passed to DataFrame.from_records
DataFrame.insert caused the columns name(s) field to be discarded (GH527)
Fix error in monotonic many-to-one left joins
Fix DataFrame.to_string to remove extra column white space (GH571)
Format floats to default to same number of digits (GH395)
Added decorator to copy docstring from one function to another (GH449)
Fix __eq__ comparison between DateOffsets with different relativedelta keywords passed
Fix exception caused by parser converter returning strings (GH583)
Fix MultiIndex formatting bug with integer names (GH601)
Fix bug in handling of non-numeric aggregates in Series.groupby (GH612)
Fix TypeError with tuple subclasses (e.g. namedtuple) in DataFrame.from_records (GH611)
Catch misreported console size when running IPython within Emacs
Fix minor bug in pivot table margins, loss of index names and length-1 ‘All’ tuple in row labels
Add support for legacy WidePanel objects to be read from HDFStore
Fix out-of-bounds segfault in pad_object and backfill_object methods when either source or target array are empty
Could not create a new column in a DataFrame from a list of tuples
Fix bugs preventing SparseDataFrame and SparseSeries working with groupby (GH666)
Use sort kind in Series.sort / argsort (GH668)
Fix DataFrame operations on non-scalar, non-pandas objects (GH672)
Don’t convert DataFrame column to integer type when passing integer to __setitem__ (GH669)
Fix downstream bug in pivot_table caused by integer level names in MultiIndex (GH678)
Fix SparseSeries.combine_first when passed a dense Series (GH687)
Fix performance regression in HDFStore loading when DataFrame or Panel stored in table format with datetimes
Raise Exception in DateRange when offset with n=0 is passed (GH683)
Fix get/set inconsistency with .ix property and integer location but non-integer index (GH707)
Use right dropna function for SparseSeries. Return dense Series for NA fill value (GH730)
Fix Index.format bug causing incorrectly string-formatted Series with datetime indexes (GH726, GH758)
Fix errors caused by object dtype arrays passed to ols (GH759)
Fix error where column names lost when passing list of labels to DataFrame.__getitem__ (GH662)
Fix error whereby top-level week iterator overwrote week instance
Fix circular reference causing memory leak in sparse array / series / frame (GH663)
Fix integer-slicing from integers-as-floats (GH670)
Fix zero division errors in nanops from object dtype arrays in all NA case (GH676)
Fix csv encoding when using unicode (GH705, GH717, GH738)
Fix assumption that each object contains every unique block type in concat (GH708)
Fix sortedness check of multiindex in to_panel (GH719, GH720)
Fix that None was not treated as NA in PyObjectHashtable
Fix hashing dtype because of endianness confusion (GH747, GH748)
Fix SparseSeries.dropna to return dense Series in case of NA fill value (GH730)
Use map_infer instead of np.vectorize. Handle NA sentinels if converter yields numeric array (GH753)
Fixes and improvements to DataFrame.rank (GH742)
Fix catching AttributeError instead of NameError for bottleneck
Try to cast non-MultiIndex to better dtype when calling reset_index (GH726, GH440)
Fix ‘#1.QNAN0’ float bug on 2.6/win64
Allow subclasses of dicts in DataFrame constructor, with tests
Fix problem whereby set_index destroys column multiindex (GH764)
Hack around bug in generating DateRange from naive DateOffset (GH770)
Fix bug in DateRange.intersection causing incorrect results with some overlapping ranges (GH771)

Thanks¶

Craig Austin
Chris Billington
Marius Cobzarenco
Mario Gamboa-Cavazos
Hans-Martin Gaudecker
Arthur Gerigk
Yaroslav Halchenko
Jeff Hammerbacher
Matt Harrison
Andreas Hilboll
Luc Kesters
Adam Klein
Gregg Lind
Solomon Negusse
Wouter Overmeire
Christian Prinoth
Jeff Reback
Sam Reckoner
Craig Reeson
Jan Schulz
Skipper Seabold
Ted Square
Graham Taylor
Aman Thakral
Chris Uga
Dieter Vandenbussche
Texas P.
Pinxing Ye
... and everyone I forgot

pandas 0.6.1¶

Release date: 12/13/2011

API Changes¶

Rename names argument in DataFrame.from_records to columns. Add deprecation warning
Boolean get/set operations on Series with boolean Series will reindex instead of requiring that the indexes be exactly equal (GH429)
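
The boolean-indexing change above can be sketched as follows, assuming a current pandas installation in which a boolean Series key is still aligned to the target's index before selection. The labels and values are made up for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Same labels as s, but in a different order
mask = pd.Series([False, True, True], index=["c", "b", "a"])

# The mask is aligned on labels first, so "a" and "b" are selected;
# a purely positional mask would have picked "b" and "c" instead
selected = s[mask]
```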

New Features¶

Can pass Series to DataFrame.append with ignore_index=True for appending a single row (GH430)
Add Spearman and Kendall correlation options to Series.corr and DataFrame.corr (GH428)
Add new get_value and set_value methods to Series, DataFrame, and Panel for very low-overhead access to scalar elements. df.get_value(row, column) is about 3x faster than df[column][row] by handling fewer cases (GH437, GH438). Add similar methods to sparse data structures for compatibility
Add Qt table widget to sandbox (GH435)
DataFrame.align can accept Series arguments, add axis keyword (GH461)
Implement new SparseList and SparseArray data structures. SparseSeries now derives from SparseArray (GH463)
max_columns / max_rows options in set_printoptions (GH453)
Implement Series.rank and DataFrame.rank, fast versions of scipy.stats.rankdata (GH428)
Implement DataFrame.from_items alternate constructor (GH444)
DataFrame.convert_objects method for inferring better dtypes for object columns (GH302)
Add rolling_corr_pairwise function for computing Panel of correlation matrices (GH189)
Add margins option to pivot_table for computing subgroup aggregates (GH114)
Add Series.from_csv function (GH482)
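
As a rough illustration of two items in the list above, the sketch below ranks a Series and builds a pivot table with margins. It assumes a current pandas installation, where pivot_table takes index and columns keywords (the 0.6.1 keyword names may have differed); the data is made up.

```python
import pandas as pd

# rank: tied values share the average of their ranks by default
scores = pd.Series([10, 30, 20, 20])
ranks = scores.rank()

# pivot_table with margins=True adds an "All" row and column of subtotals
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "product": ["x", "y", "x", "y"],
        "units": [1, 2, 3, 4],
    }
)
table = pd.pivot_table(
    sales,
    values="units",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
)
```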

Improvements to existing features¶

Improve memory usage of DataFrame.describe (do not copy data unnecessarily) (GH425)
Use same formatting function for outputting floating point Series to console as in DataFrame (GH420)
DataFrame.delevel will try to infer better dtype for new columns (GH440)
Exclude non-numeric types in DataFrame.{corr, cov}
Override Index.astype to enable dtype casting (GH412)
Use same float formatting function for Series.__repr__ (GH420)
Use available console width to output DataFrame columns (GH453)
Accept ndarrays when setting items in Panel (GH452)
Infer console width when printing __repr__ of DataFrame to console (PR GH453)
Optimize scalar value lookups in the general case by 25% or more in Series and DataFrame
Can pass DataFrame/DataFrame and DataFrame/Series to rolling_corr/rolling_cov (GH462)
Fix performance regression in cross-sectional count in DataFrame, affecting DataFrame.dropna speed
Column deletion in DataFrame copies no data (computes views on blocks) (GH158)
MultiIndex.get_level_values can take the level name
More helpful error message when DataFrame.plot fails on one of the columns (GH478)
Improve performance of DataFrame.{index, columns} attribute lookup

Bug Fixes¶

Fix O(K^2) memory leak caused by inserting many columns without consolidating, had been present since 0.4.0 (GH467)
DataFrame.count should return Series with zero instead of NA with length-0 axis (GH423)
Fix Yahoo! Finance API usage in pandas.io.data (GH419, GH427)
Fix upstream bug causing failure in Series.align with empty Series (GH434)
Function passed to DataFrame.apply can return a list, as long as it’s the right length. Regression from 0.4 (GH432)
Don’t “accidentally” upcast scalar values when indexing using .ix (GH431)
Fix groupby exception raised with as_index=False and single column selected (GH421)
Implement DateOffset.__ne__ causing downstream bug (GH456)
Fix __doc__-related issue when converting py -> pyo with py2exe
Bug fix in left join Cython code with duplicate monotonic labels
Fix bug when unstacking multiple levels described in GH451
Exclude NA values in dtype=object arrays, regression from 0.5.0 (GH469)
Use Cython map_infer function in DataFrame.applymap to properly infer output type, handle tuple return values and other things that were breaking (GH465)
Handle floating point index values in HDFStore (GH454)
Fixed stale column reference bug (cached Series object) caused by type change / item deletion in DataFrame (GH473)
Index.get_loc should always raise Exception when there are duplicates
Handle differently-indexed Series input to DataFrame constructor (GH475)
Omit nuisance columns in multi-groupby with Python function
Buglet in handling of single grouping in general apply
Handle type inference properly when passing list of lists or tuples to DataFrame constructor (GH484)
Preserve Index / MultiIndex names in GroupBy.apply concatenation step (GH481)

Thanks¶

Ralph Bean
Luca Beltrame
Marius Cobzarenco
Andreas Hilboll
Jev Kuznetsov
Adam Lichtenstein
Wouter Overmeire
Fernando Perez
Nathan Pinger
Christian Prinoth
Alex Reyfman
Joon Ro
Chang She
Ted Square
Chris Uga
Dieter Vandenbussche

pandas 0.6.0¶

Release date: 11/25/2011

API Changes¶

Arithmetic methods like sum will attempt to sum dtype=object values by default instead of excluding them (GH382)

New Features¶

Add melt function to pandas.core.reshape
Add level parameter to group by level in Series and DataFrame descriptive statistics (GH313)
Add head and tail methods to Series, analogous to DataFrame (PR GH296)
Add Series.isin function which checks if each value is contained in a passed sequence (GH289)
Add float_format option to Series.to_string
Add skip_footer (GH291) and converters (GH343) options to read_csv and read_table
Add proper, tested weighted least squares to standard and panel OLS (GH303)
Add drop_duplicates and duplicated functions for removing duplicate DataFrame rows and checking for duplicate rows, respectively (GH319)
Implement logical (boolean) operators &, |, ^ on DataFrame (GH347)
Add Series.mad, mean absolute deviation, matching DataFrame
Add QuarterEnd DateOffset (GH321)
Add matrix multiplication function dot to DataFrame (GH65)
Add orient option to Panel.from_dict to ease creation of mixed-type Panels (GH359, GH301)
Add DataFrame.from_dict with similar orient option
Can now pass list of tuples or list of lists to DataFrame.from_records for fast conversion to DataFrame (GH357)
Can pass multiple levels to groupby, e.g. df.groupby(level=[0, 1]) (GH103)
Can sort by multiple columns in DataFrame.sort_index (GH92, GH362)
Add fast get_value and put_value methods to DataFrame and micro-performance tweaks (GH360)
Add cov instance methods to Series and DataFrame (GH194, GH362)
Add bar plot option to DataFrame.plot (GH348)
Add idxmin and idxmax functions to Series and DataFrame for computing index labels achieving maximum and minimum values (GH286)
Add read_clipboard function for parsing DataFrame from OS clipboard, should work across platforms (GH300)
Add nunique function to Series for counting unique elements (GH297)
DataFrame constructor will use Series name if no columns passed (GH373)
Support regular expressions and longer delimiters in read_table/read_csv, but does not handle quoted strings yet (GH364)
Add DataFrame.to_html for formatting DataFrame to HTML (GH387)
MaskedArray can be passed to DataFrame constructor and masked values will be converted to NaN (GH396)
Add DataFrame.boxplot function (GH368, others)
Can pass extra args, kwds to DataFrame.apply (GH376)
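
Several of the functions introduced in the list above are still available and can be sketched together. This example assumes a current pandas installation, where melt is exposed as pd.melt (the note above places it in pandas.core.reshape); all data is made up.

```python
import pandas as pd

df = pd.DataFrame({"key": ["k1", "k2", "k2"], "x": [1, 2, 2], "y": [9, 7, 7]})

# duplicated flags repeated rows; drop_duplicates removes them
dup_flags = df.duplicated().tolist()
deduped = df.drop_duplicates()

# melt reshapes wide columns into (variable, value) pairs
long_form = pd.melt(deduped, id_vars="key", value_vars=["x", "y"])

# isin checks each value for membership in a passed sequence
in_set = df["x"].isin([2, 5]).tolist()

# idxmax returns the index label of the (first) maximum value
best_row = df["y"].idxmax()

# dot performs matrix multiplication between DataFrames
m = pd.DataFrame([[1, 2], [3, 4]])
product = m.dot(m)

# groupby can take several index levels at once
idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 1), ("b", 2)])
s = pd.Series([1, 2, 3], index=idx)
by_levels = s.groupby(level=[0, 1]).sum()
```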

Improvements to existing features¶

Raise more helpful exception if date parsing fails in DateRange (GH298)
Vastly improved performance of GroupBy on axes with a MultiIndex (GH299)
Print level names in hierarchical index in Series repr (GH305)
Return DataFrame when performing GroupBy on selected column and as_index=False (GH308)
Can pass vector to on argument in DataFrame.join (GH312)
Don’t show Series name if it’s None in the repr, also omit length for short Series (GH317)
Show legend by default in DataFrame.plot, add legend boolean flag (GH324)
Significantly improved performance of Series.order, which also makes np.unique called on a Series faster (GH327)
Faster cythonized count by level in Series and DataFrame (GH341)
Raise exception if dateutil 2.0 installed on Python 2.x runtime (GH346)
Significant GroupBy performance enhancement with multiple keys with many “empty” combinations
New Cython vectorized function map_infer speeds up Series.apply and Series.map significantly when passed elementwise Python function, motivated by GH355
Cythonized cache_readonly, resulting in substantial micro-performance enhancements throughout the codebase (GH361)
Special Cython matrix iterator for applying arbitrary reduction operations with 3-5x better performance than np.apply_along_axis (GH309)
Add raw option to DataFrame.apply for getting better performance when the passed function only requires an ndarray (GH309)
Improve performance of MultiIndex.from_tuples
Can pass multiple levels to stack and unstack (GH370)
Can pass multiple values columns to pivot_table (GH381)
Can call DataFrame.delevel with standard Index with name set (GH393)
Use Series name in GroupBy for result index (GH363)
Refactor Series/DataFrame stat methods to use common set of NaN-friendly function
Handle NumPy scalar integers at C level in Cython conversion routines

Bug Fixes¶

Fix bug in DataFrame.to_csv when writing a DataFrame with an index name (GH290)
DataFrame should clear its Series caches on consolidation, was causing “stale” Series to be returned in some corner cases (GH304)
DataFrame constructor failed if a column had a list of tuples (GH293)
Ensure that Series.apply always returns a Series and implement Series.round (GH314)
Support boolean columns in Cythonized groupby functions (GH315)
DataFrame.describe should not fail if there are no numeric columns, instead return categorical describe (GH323)
Fixed bug which could cause columns to be printed in wrong order in DataFrame.to_string if specific list of columns passed (GH325)
Fix legend plotting failure if DataFrame columns are integers (GH326)
Shift start date back by one month for Yahoo! Finance API in pandas.io.data (GH329)
Fix DataFrame.join failure on unconsolidated inputs (GH331)
DataFrame.min/max will no longer fail on mixed-type DataFrame (GH337)
Fix read_csv / read_table failure when passing list to index_col that is not in ascending order (GH349)
Fix failure passing Int64Index to Index.union when both are monotonic
Fix error when passing SparseSeries to (dense) DataFrame constructor
Added missing bang at top of setup.py (GH352)
Change is_monotonic on MultiIndex so it properly compares the tuples
Fix MultiIndex outer join logic (GH351)
Set index name attribute with single-key groupby (GH358)
Bug fix in reflexive binary addition in Series and DataFrame for non-commutative operations (like string concatenation) (GH353)
setupegg.py will invoke Cython (GH192)
Fix block consolidation bug after inserting column into MultiIndex (GH366)
Fix bug in join operations between Index and Int64Index (GH367)
Handle min_periods=0 case in moving window functions (GH365)
Fixed corner cases in DataFrame.apply/pivot with empty DataFrame (GH378)
Fixed repr exception when Series name is a tuple
Always return DateRange from asfreq (GH390)
Pass level names to swaplevel (GH379)
Don’t lose index names in MultiIndex.droplevel (GH394)
Infer more proper return type in DataFrame.apply when no columns or rows depending on whether the passed function is a reduction (GH389)
Always return NA/NaN from Series.min/max and DataFrame.min/max when all of a row/column/values are NA (GH384)
Enable partial setting with .ix / advanced indexing (GH397)
Handle mixed-type DataFrames correctly in unstack, do not lose type information (GH403)
Fix integer name formatting bug in Index.format and in Series.__repr__
Handle label types other than string passed to groupby (GH405)
Fix bug in .ix-based indexing with partial retrieval when a label is not contained in a level
Index name was not being pickled (GH408)
Level name should be passed to result index in GroupBy.apply (GH416)

Thanks¶

Craig Austin
Marius Cobzarenco
Joel Cross
Jeff Hammerbacher
Adam Klein
Thomas Kluyver
Jev Kuznetsov
Kieran O’Mahony
Wouter Overmeire
Nathan Pinger
Christian Prinoth
Skipper Seabold
Chang She
Ted Square
Aman Thakral
Chris Uga
Dieter Vandenbussche
carljv
rsamson

pandas 0.5.0¶

Release date: 10/24/2011

This release of pandas includes a number of API changes (see below) and cleanup of deprecated APIs from pre-0.4.0 releases. There are also bug fixes, new features, numerous significant performance enhancements, and a new IPython completer hook to enable tab completion of DataFrame column accesses and attributes (a new feature).

In addition to the changes listed here from 0.4.3 to 0.5.0, the minor releases 0.4.1, 0.4.2, and 0.4.3 brought some significant new functionality and performance improvements that are worth taking a look at.

Thanks to all for bug reports, contributed patches and generally providing feedback on the library.

API Changes¶

read_table, read_csv, and ExcelFile.parse default argument for index_col is now None. To use one or more of the columns as the resulting DataFrame’s index, these must be explicitly specified now
Parsing functions like read_csv no longer parse dates by default (GH225)
Removed weights option in panel regression which was not doing anything principled (GH155)
Changed buffer argument name in Series.to_string to buf
Series.to_string and DataFrame.to_string now return strings by default instead of printing to sys.stdout
Deprecated nanRep argument in various to_string and to_csv functions in favor of na_rep. Will be removed in 0.6 (GH275)
Renamed delimiter to sep in DataFrame.from_csv for consistency
Changed order of Series.clip arguments to match those of numpy.clip and added (unimplemented) out argument so numpy.clip can be called on a Series (GH272)
Series functions renamed (and thus deprecated) in 0.4 series have been removed:
- asOf, use asof
- toDict, use to_dict
- toString, use to_string
- toCSV, use to_csv
- merge, use map
- applymap, use apply
- combineFirst, use combine_first
- _firstTimeWithValue, use first_valid_index
- _lastTimeWithValue, use last_valid_index
DataFrame functions renamed / deprecated in 0.4 series have been removed:
- asMatrix method, use as_matrix or values attribute
- combineFirst, use combine_first
- getXS, use xs
- merge, use join
- fromRecords, use from_records
- fromcsv, use from_csv
- toRecords, use to_records
- toDict, use to_dict
- toString, use to_string
- toCSV, use to_csv
- _firstTimeWithValue, use first_valid_index
- _lastTimeWithValue, use last_valid_index
- toDataMatrix is no longer needed
- rows() method, use index attribute
- cols() method, use columns attribute
- dropEmptyRows(), use dropna(how='all')
- dropIncompleteRows(), use dropna()
- tapply(f), use apply(f, axis=1)
- tgroupby(keyfunc, aggfunc), use groupby with axis=1
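
For readers migrating old code, some of the replacement spellings listed above can be sketched as follows. The example assumes a current pandas installation and covers only replacements that still exist today; the frame and its labels are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [2.0, 5.0, np.nan]},
    index=["r0", "r1", "r2"],
)

no_empty_rows = df.dropna(how="all")   # formerly dropEmptyRows()
complete_rows = df.dropna()            # formerly dropIncompleteRows()
row_sums = df.apply(lambda row: row.sum(), axis=1)  # formerly tapply(f)
as_dict = df["b"].to_dict()            # formerly toDict()
last_label = df["a"].last_valid_index()  # formerly _lastTimeWithValue
row_labels = list(df.index)            # formerly rows()
column_labels = list(df.columns)       # formerly cols()
```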

Deprecations Removed¶

indexField argument in DataFrame.from_records
missingAtEnd argument in Series.order. Use na_last instead
Series.fromValue classmethod, use regular Series constructor instead
The parseCSV, parseText, and parseExcel functions in pandas.io.parsers have been removed
Index.asOfDate function
Panel.getMinorXS (use minor_xs) and Panel.getMajorXS (use major_xs)
Panel.toWide, use Panel.to_wide instead

New Features¶

Added DataFrame.align method with standard join options
Added parse_dates option to read_csv and read_table methods to optionally try to parse dates in the index columns
Add nrows, chunksize, and iterator arguments to read_csv and read_table. The last two return a new TextParser class capable of lazily iterating through chunks of a flat file (GH242)
Added ability to join on multiple columns in DataFrame.join (GH214)
Added private _get_duplicates function to Index for identifying duplicate values more easily
Added column attribute access to DataFrame, e.g. df.A equivalent to df['A'] if 'A' is a column in the DataFrame (GH213)
Added IPython tab completion hook for DataFrame columns. (GH233, GH230)
Implement Series.describe for Series containing objects (GH241)
Add inner join option to DataFrame.join when joining on key(s) (GH248)
Can select set of DataFrame columns by passing a list to __getitem__ (GH253)
Can use & and | to intersect / union Index objects, respectively (GH261)
Added pivot_table convenience function to pandas namespace (GH234)
Implemented Panel.rename_axis function (GH243)
DataFrame will show index level names in console output
Implemented Panel.take
Add set_eng_float_format function for setting alternate DataFrame floating point string formatting
Add convenience set_index function for creating a DataFrame index from its existing columns
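
The parsing options and DataFrame conveniences in the list above can be sketched with an in-memory CSV. This assumes a current pandas installation, where the object returned for chunked reads may be named differently from the TextParser class mentioned above; the CSV text is made up.

```python
import io

import pandas as pd

csv_text = "date,A,B\n2011-10-01,1,10\n2011-10-02,2,20\n2011-10-03,3,30\n"

# nrows limits how many data rows are read
head = pd.read_csv(io.StringIO(csv_text), nrows=2)

# chunksize returns a lazy reader that yields DataFrames piece by piece
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
chunk_lengths = [len(chunk) for chunk in chunks]

# index_col must be given explicitly; parse_dates=True parses the index
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Attribute access and list-based column selection
col_a = df.A
subset = df[["B", "A"]]

# set_index builds the index from an existing column
flat = pd.read_csv(io.StringIO(csv_text))
indexed = flat.set_index("date")
```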

Improvements to existing features¶

Major performance improvements in file parsing functions read_csv and read_table
Added Cython function for converting tuples to ndarray very fast. Speeds up many MultiIndex-related operations
File parsing functions like read_csv and read_table will explicitly check if a parsed index has duplicates and raise a more helpful exception rather than deferring the check until later
Refactored merging / joining code into a tidy class and disabled unnecessary computations in the float/object case, thus getting about 10% better performance (GH211)
Improved speed of DataFrame.xs on mixed-type DataFrame objects by about 5x, regression from 0.3.0 (GH215)
With new DataFrame.align method, sped up binary operations between differently-indexed DataFrame objects by 10-25%.
Significantly sped up conversion of nested dict into DataFrame (GH212)
Can pass hierarchical index level name to groupby instead of the level number if desired (GH223)
Add support for different delimiters in DataFrame.to_csv (GH244)
Add more helpful error message when importing pandas post-installation from the source directory (GH250)
Significantly speed up DataFrame __repr__ and count on large mixed-type DataFrame objects
Better handling of pyx file dependencies in Cython module build (GH271)

Bug Fixes¶

read_csv / read_table fixes
- Be less aggressive about converting float->int in cases of floating point representations of integers like 1.0, 2.0, etc.
- “True”/“False” will now get correctly converted to boolean
- Index name attribute will get set when specifying an index column
- Passing column names should force header=None (GH257)
- Don’t modify passed column names when index_col is not None (GH258)
- Can sniff CSV separator in zip file (since seek is not supported, was failing before)
Worked around matplotlib “bug” in which series[:, np.newaxis] fails. Should be reported upstream to matplotlib (GH224)
DataFrame.iteritems was not returning Series with the name attribute set. Also neither was DataFrame._series
Can store datetime.date objects in HDFStore (GH231)
Index and Series names are now stored in HDFStore
Fixed problem in which data would get upcasted to object dtype in GroupBy.apply operations (GH237)
Fixed outer join bug with empty DataFrame (GH238)
Can create empty Panel (GH239)
Fix join on single key when passing list with 1 entry (GH246)
Don’t raise Exception on plotting DataFrame with an all-NA column (GH251, GH254)
Fix min/max errors when called on integer DataFrames (GH241)
DataFrame.iteritems and DataFrame._series not assigning name attribute
Panel.__repr__ raised exception on length-0 major/minor axes
DataFrame.join on key with empty DataFrame produced incorrect columns
Implemented MultiIndex.diff (GH260)
Int64Index.take and MultiIndex.take lost name field, fix downstream issue GH262
Can pass list of tuples to Series (GH270)
Can pass level name to DataFrame.stack
Support set operations between MultiIndex and Index
Fix many corner cases in MultiIndex set operations
Fix MultiIndex-handling bug with GroupBy.apply when returned groups are not indexed the same
Fix corner case bugs in DataFrame.apply
Setting DataFrame index did not cause Series cache to get cleared
Various int32 -> int64 platform-specific issues
Don’t be too aggressive converting to integer when parsing file with MultiIndex (GH285)
Fix bug when slicing Series with negative indices before beginning

Thanks¶

Thomas Kluyver
Daniel Fortunov
Aman Thakral
Luca Beltrame
Wouter Overmeire

pandas 0.4.3¶

Release date: 10/9/2011

This is largely a bugfix release from 0.4.2 but also includes a handful of new and enhanced features. Also, pandas can now be installed and used on Python 3 (thanks Thomas Kluyver!).

New Features¶

Python 3 support using 2to3 (GH200, Thomas Kluyver)
Add name attribute to Series and added relevant logic and tests. Name now prints as part of Series.__repr__
Add name attribute to standard Index so that stacking / unstacking does not discard names and so that indexed DataFrame objects can be reliably round-tripped to flat files, pickle, HDF5, etc.
Add isnull and notnull as instance methods on Series (GH209, GH203)
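
A minimal sketch of the name attribute and the isnull / notnull instance methods described above, assuming a current pandas installation (where these methods are still available); the Series contents are made up.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0], name="price")

# The name is carried on the Series and shown in its repr
series_name = s.name
shown_in_repr = "price" in repr(s)

# Instance-method forms of the missing-data checks
missing = s.isnull().tolist()
present = s.notnull().tolist()
```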

Improvements to existing features¶

Skip xlrd-related unit tests if not installed
Index.append and MultiIndex.append can accept a list of Index objects to concatenate together
Altered binary operations on differently-indexed SparseSeries objects to use the integer-based (dense) alignment logic which is faster with a larger number of blocks (GH205)
Refactored Series.__repr__ to be a bit more clean and consistent

API Changes¶

Series.describe and DataFrame.describe now report the 25% and 75% quartiles instead of the 10% and 90% deciles. The other outputs have not changed
Series.toString will print a deprecation warning; it has been de-camelCased to to_string
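
The quartile rows in the describe output can be inspected as in the sketch below. It assumes a recent pandas release, where the row labels are "25%", "50%" and "75%"; the input values are hypothetical.

```python
import pandas as pd

# Hypothetical values 1..9, chosen so the quartiles are easy to check by hand.
s = pd.Series(range(1, 10))

summary = s.describe()

# The summary is itself a Series indexed by statistic name.
print(summary[["25%", "50%", "75%"]])
```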

Bug Fixes¶

Fix broken interaction between Index and Int64Index when calling intersection. Implement Int64Index.intersection
MultiIndex.sortlevel discarded the level names (GH202)
Fix bugs in groupby, join, and append due to improper concatenation of MultiIndex objects (GH201)
Fix regression from 0.4.1, isnull and notnull ceased to work on other kinds of Python scalar objects like datetime.datetime
Raise more helpful exception when attempting to write empty DataFrame or LongPanel to HDFStore (GH204)
Use stdlib csv module to properly escape strings with commas in DataFrame.to_csv (GH206, Thomas Kluyver)
Fix Python ndarray access in Cython code for sparse blocked index integrity check
Fix bug writing Series to CSV in Python 3 (GH209)
Miscellaneous Python 3 bugfixes
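
To see what escaping strings with commas means for DataFrame.to_csv, consider the sketch below. It assumes a recent pandas release, where to_csv returns the CSV text when no path is given; the frame contents are invented for the example.

```python
import io

import pandas as pd

# Hypothetical frame in which one value contains the delimiter itself.
frame = pd.DataFrame({"name": ["Smith, John", "Doe"], "score": [1, 2]})

# With no path argument, to_csv returns the CSV text as a string.
text = frame.to_csv(index=False)
print(text)

# The quoted field is parsed back as a single value.
restored = pd.read_csv(io.StringIO(text))
```

The field containing a comma is wrapped in double quotes in the output, which is what allows it to survive the round trip.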

Thanks¶

Thomas Kluyver
rsamson

pandas 0.4.2¶

Release date: 10/3/2011

This is a performance optimization release with several bug fixes. The new Int64Index and new merging / joining Cython code and related Python infrastructure are the main new additions

New Features¶

Added fast Int64Index type with specialized join, union, intersection. Will result in significant performance enhancements for int64-based time series (e.g. using NumPy’s datetime64 one day) and also faster operations on DataFrame objects storing record array-like data.
Refactored Index classes to have a join method and associated data alignment routines throughout the codebase to be able to leverage optimized joining / merging routines.
Added Series.align method for aligning two series with choice of join method
Wrote faster Cython data alignment / merging routines resulting in substantial speed increases
Added is_monotonic property to Index classes with associated Cython code to evaluate the monotonicity of the Index values
Add method get_level_values to MultiIndex
Implemented shallow copy of BlockManager object in DataFrame internals
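
A minimal sketch of Series.align with two different join methods follows. It assumes a recent pandas release, and the two series are hypothetical.

```python
import pandas as pd

# Two hypothetical series whose indexes only partly overlap.
a = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
b = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# join="outer" keeps the union of labels; missing positions become NaN.
outer_a, outer_b = a.align(b, join="outer")

# join="inner" keeps only the labels present in both series.
inner_a, inner_b = a.align(b, join="inner")

print(outer_a)
print(inner_b)
```

Both returned objects share the same index, so element-wise operations on them need no further alignment.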

Improvements to existing features¶

Improved performance of isnull and notnull, a regression from v0.3.0 (GH187)
Wrote templating / code generation script to auto-generate Cython code for various functions which need to be available for the 4 major data types used in pandas (float64, bool, object, int64)
Refactored code related to DataFrame.join so that intermediate aligned copies of the data in each DataFrame argument do not need to be created. Substantial performance increases result (GH176)
Substantially improved performance of generic Index.intersection and Index.union
Improved performance of DateRange.union with overlapping ranges and non-cacheable offsets (like Minute). Implemented analogous fast DateRange.intersection for overlapping ranges.
Implemented BlockManager.take resulting in significantly faster take performance on mixed-type DataFrame objects (GH104)
Improved performance of Series.sort_index
Significant groupby performance enhancement: removed unnecessary integrity checks in DataFrame internals that were slowing down slicing operations to retrieve groups
Added informative Exception when passing dict to DataFrame groupby aggregation with axis != 0

API Changes¶

Bug Fixes¶

Fixed minor unhandled exception in Cython code implementing fast groupby aggregation operations
Fixed bug in unstacking code manifesting with more than 3 hierarchical levels
Throw exception when step specified in label-based slice (GH185)
Fix isnull to correctly work with np.float32. Fix upstream bug described in GH182
Finish implementation of as_index=False in groupby for DataFrame aggregation (GH181)
Raise SkipTest for pre-epoch HDFStore failure. Real fix will be sorted out via datetime64 dtype

Thanks¶

Uri Laserson
Scott Sinclair

pandas 0.4.1¶

Release date: 9/25/2011

This is primarily a bug fix release but includes some new features and improvements

New Features¶

Added new DataFrame methods get_dtype_counts and property dtypes
Setting of values using .ix indexing attribute in mixed-type DataFrame objects has been implemented (fixes GH135)
read_csv can read multiple columns into a MultiIndex. DataFrame’s to_csv method will properly write out a MultiIndex which can be read back (GH151, thanks to Skipper Seabold)
Wrote fast time series merging / joining methods in Cython. Will be integrated later into DataFrame.join and related functions
Added ignore_index option to DataFrame.append for combining unindexed records stored in a DataFrame
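
The MultiIndex round trip through CSV can be sketched as follows. This assumes a recent pandas release, in which the index_col argument of read_csv accepts a list of column positions; the frame and its level names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame indexed by a two-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["key", "num"]
)
frame = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# to_csv writes one column per index level, followed by the data columns.
text = frame.to_csv()

# Reading the first two columns back as the index rebuilds the MultiIndex.
restored = pd.read_csv(io.StringIO(text), index_col=[0, 1])
print(restored)
```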

Improvements to existing features¶

Some speed enhancements with internal Index type-checking function
DataFrame.rename has a new copy parameter which can rename a DataFrame in place
Enable unstacking by level name (GH142)
Enable sortlevel to work by level name (GH141)
read_csv can automatically “sniff” other kinds of delimiters using csv.Sniffer (GH146)
Improved speed of unit test suite by about 40%
Exception will not be raised calling HDFStore.remove on non-existent node with where clause
Optimized _ensure_index function resulting in performance savings in type-checking Index objects

API Changes¶

Bug Fixes¶

Fixed DataFrame constructor bug causing downstream problems (e.g. .copy() failing) when passing a Series as the values along with a column name and index
Fixed single-key groupby on DataFrame with as_index=False (GH160)
Series.shift was failing on integer Series (GH154)
unstack methods were producing incorrect output in the case of duplicate hierarchical labels. An exception will now be raised (GH147)
Calling count with level argument caused reduceat failure or segfault in earlier NumPy (GH169)
Fixed DataFrame.corrwith to automatically exclude non-numeric data (GH144)
Unicode handling bug fixes in DataFrame.to_string (GH138)
Excluding OLS degenerate unit test case that was causing platform specific failure (GH149)
Skip blosc-dependent unit tests for PyTables < 2.2 (GH137)
Calling copy on DateRange did not copy over attributes to the new object (GH168)
Fix bug in HDFStore in which Panel data could be appended to a Table with different item order, thus resulting in an incorrect result read back

Thanks¶

Yaroslav Halchenko
Jeff Reback
Skipper Seabold
Dan Lovell
Nick Pentreath

pandas 0.4.0¶

Release date: 9/12/2011

New Features¶

pandas.core.sparse module: “Sparse” (mostly-NA, or some other fill value) versions of Series, DataFrame, and Panel. For low-density data, this will result in significant performance boosts, and smaller memory footprint. Added to_sparse methods to Series, DataFrame, and Panel. See online documentation for more on these
Fancy indexing operator on Series / DataFrame, e.g. via .ix operator. Both getting and setting of values is supported; however, setting values will currently only work on homogeneously-typed DataFrame objects. Things like:
- series.ix[[d1, d2, d3]]
- frame.ix[5:10, ['C', 'B', 'A']], frame.ix[5:10, 'A':'C']
- frame.ix[date1:date2]
Significantly enhanced groupby functionality
- Can groupby multiple keys, e.g. df.groupby(['key1', 'key2']). Iteration with multiple groupings produces a flattened tuple
- “Nuisance” columns (non-aggregatable) will automatically be excluded from DataFrame aggregation operations
- Added automatic “dispatching” to Series / DataFrame methods to more easily invoke methods on groups. e.g. s.groupby(crit).std() will work even though std is not implemented on the GroupBy class
Hierarchical / multi-level indexing
- New MultiIndex class. Integrated MultiIndex into Series and DataFrame fancy indexing, slicing, __getitem__ and __setitem__, reindexing, etc. Added level keyword argument to groupby to enable grouping by a level of a MultiIndex
New data reshaping functions: stack and unstack on DataFrame and Series
- Integrate with MultiIndex to enable sophisticated reshaping of data
Index objects (labels for axes) are now capable of holding tuples
Series.describe, DataFrame.describe: produces an R-like table of summary statistics about each data column
DataFrame.quantile, Series.quantile for computing sample quantiles of data across requested axis
Added general DataFrame.dropna method to replace dropIncompleteRows and dropEmptyRows, deprecated those.
Series arithmetic methods with optional fill_value for missing data, e.g. a.add(b, fill_value=0). If a location is missing for both it will still be missing in the result though.
fill_value option has been added to DataFrame.{add, mul, sub, div} methods similar to Series
Boolean indexing with DataFrame objects: data[data > 0.1] = 0.1 or data[data > other] = 1.
pytz / tzinfo support in DateRange
- tz_localize, tz_normalize, and tz_validate methods added
Added ExcelFile class to pandas.io.parsers for parsing multiple sheets out of a single Excel 2003 document
GroupBy aggregations can now optionally broadcast, e.g. produce an object of the same size with the aggregated value propagated
Added select function in all data structures: reindex axis based on arbitrary criterion (function returning boolean value), e.g. frame.select(lambda x: 'foo' in x, axis=1)
DataFrame.consolidate method, API function relating to redesigned internals
DataFrame.insert method for inserting column at a specified location rather than the default __setitem__ behavior (which puts it at the end)
HDFStore class in pandas.io.pytables has been largely rewritten using patches from Jeff Reback and others. It now supports mixed-type DataFrame and Series data and can store Panel objects. It also has the option to query DataFrame and Panel data. Loading data from legacy HDFStore files is supported explicitly in the code
Added set_printoptions method to modify appearance of DataFrame tabular output
rolling_quantile functions; a moving version of Series.quantile / DataFrame.quantile
Generic rolling_apply moving window function
New drop method added to Series, DataFrame, etc. which can drop a set of labels from an axis, producing a new object
reindex methods now sport a copy option so that data is not forced to be copied when the resulting object is indexed the same
Added sort_index methods to Series and Panel. Renamed DataFrame.sort to sort_index. Leaving DataFrame.sort for now.
Added skipna option to statistical instance methods on all the data structures
pandas.io.data module providing a consistent interface for reading time series data from several different sources
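
Several of the features listed above (grouping by multiple keys, the resulting hierarchical index, unstack, and arithmetic with fill_value) can be combined in one short sketch. It assumes a recent pandas release rather than 0.4.0, and the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with two grouping keys.
df = pd.DataFrame(
    {
        "key1": ["a", "a", "b", "b"],
        "key2": ["x", "y", "x", "y"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Grouping by two keys yields a result indexed by a two-level MultiIndex.
totals = df.groupby(["key1", "key2"])["value"].sum()

# unstack pivots the named index level into columns.
wide = totals.unstack("key2")
print(wide)

# Arithmetic with fill_value treats a label missing on one side as 0.
a = pd.Series([1.0, 2.0], index=["p", "q"])
b = pd.Series([10.0, 20.0], index=["q", "r"])
summed = a.add(b, fill_value=0)
print(summed)
```

Without fill_value, the labels "p" and "r" would come out as NaN because each exists in only one of the two series.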

Improvements to existing features¶

The 2-dimensional DataFrame and DataMatrix classes have been extensively redesigned internally into a single class DataFrame, preserving where possible their optimal performance characteristics. This should reduce confusion from users about which class to use.
- Note that under the hood there is a new essentially “lazy evaluation” scheme with respect to adding columns to DataFrame. During some operations, like-typed blocks will be “consolidated” but not before.
DataFrame accessing columns repeatedly is now significantly faster than DataMatrix used to be in 0.3.0 due to an internal Series caching mechanism (which are all views on the underlying data)
Column ordering for mixed type data is now completely consistent in DataFrame. In prior releases, there was inconsistent column ordering in DataMatrix
Improved console / string formatting of DataMatrix with negative numbers
Improved tabular data parsing functions, read_table and read_csv:
- Added skiprows and na_values arguments to pandas.io.parsers functions for more flexible IO
- parseCSV / read_csv functions and others in pandas.io.parsers now can take a list of custom NA values, and also a list of rows to skip
Can slice DataFrame and get a view of the data (when homogeneously typed), e.g. frame.xs(idx, copy=False) or frame.ix[idx]
Many speed optimizations throughout Series and DataFrame
Eager evaluation of groups when calling groupby functions, so if there is an exception with the grouping function it will be raised immediately versus sometime later on when the groups are needed
datetools.WeekOfMonth offset can be parameterized with n different than 1 or -1.
Statistical methods on DataFrame like mean, std, var, skew will now ignore non-numerical data. Before a not very useful error message was generated. A flag numeric_only has been added to DataFrame.sum and DataFrame.count to enable this behavior in those methods if so desired (disabled by default)
DataFrame.pivot generalized to enable pivoting multiple columns into a DataFrame with hierarchical columns
DataFrame constructor can accept structured / record arrays
Panel constructor can accept a dict of DataFrame-like objects. Do not need to use from_dict anymore (from_dict is there to stay, though).
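
The skiprows and na_values arguments mentioned above can be tried with a sketch like the following. It assumes a recent pandas release, and both the file contents and the "unknown" marker are hypothetical.

```python
import io

import pandas as pd

# Hypothetical file: a junk first line, then a header and two records.
text = "exported by a hypothetical tool\nname,score\nalice,10\nbob,unknown\n"

frame = pd.read_csv(
    io.StringIO(text),
    skiprows=[0],           # list of row numbers to skip before parsing
    na_values=["unknown"],  # extra strings to treat as missing values
)
print(frame)
```

Because "unknown" is parsed as missing, the score column comes back as floating point with a NaN in the second row.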

API Changes¶

The DataMatrix variable now refers to DataFrame, will be removed within two releases
WidePanel is now known as Panel. The WidePanel variable in the pandas namespace now refers to the renamed Panel class
LongPanel and Panel / WidePanel now no longer have a common subclass. LongPanel is now a subclass of DataFrame having a number of additional methods and a hierarchical index instead of the old LongPanelIndex object, which has been removed. Legacy LongPanel pickles may not load properly
Cython is now required to build pandas from a development branch. This was done to avoid continuing to check in cythonized C files into source control. Builds from released source distributions will not require Cython
Cython code has been moved up to a top level pandas/src directory. Cython extension modules have been renamed and promoted from the lib subpackage to the top level, i.e.
- pandas.lib.tseries -> pandas._tseries
- pandas.lib.sparse -> pandas._sparse
DataFrame pickling format has changed. Backwards compatibility for legacy pickles is provided, but it’s recommended to consider PyTables-based HDFStore for storing data with a longer expected shelf life
A copy argument has been added to the DataFrame constructor to avoid unnecessary copying of data. Data is no longer copied by default when passed into the constructor
Handling of boolean dtype in DataFrame has been improved to support storage of boolean data with NA / NaN values. Before it was being converted to float64 so this should not (in theory) cause API breakage
To optimize performance, Index objects now only check that their labels are unique when uniqueness matters (i.e. when someone goes to perform a lookup). This is a potentially dangerous tradeoff, but will lead to much better performance in many places (like groupby).
Boolean indexing using Series must now have the same indices (labels)
Backwards compatibility support for begin/end/nPeriods keyword arguments in DateRange class has been removed
More intuitive / shorter filling aliases ffill (for pad) and bfill (for backfill) have been added to the functions that use them: reindex, asfreq, fillna.
pandas.core.mixins code moved to pandas.core.generic
buffer keyword arguments (e.g. DataFrame.toString) renamed to buf to avoid using Python built-in name
DataFrame.rows() removed (use DataFrame.index)
Added deprecation warning to DataFrame.cols(), to be removed in next release
DataFrame deprecations and de-camelCasing: merge, asMatrix, toDataMatrix, _firstTimeWithValue, _lastTimeWithValue, toRecords, fromRecords, tgroupby, toString
pandas.io.parsers method deprecations
- parseCSV is now read_csv and keyword arguments have been de-camelCased
- parseText is now read_table
- parseExcel is replaced by the ExcelFile class and its parse method
fillMethod arguments (deprecated in prior release) removed, should be replaced with method
Series.fill, DataFrame.fill, and Panel.fill removed, use fillna instead
groupby functions now exclude NA / NaN values from the list of groups. This matches R behavior with NAs in factors e.g. with the tapply function
Removed parseText, parseCSV and parseExcel from pandas namespace
Series.combineFunc renamed to Series.combine and made a bit more general with a fill_value keyword argument defaulting to NaN
Removed pandas.core.pytools module. Code has been moved to pandas.core.common
Tacked on groupName attribute for groups in GroupBy renamed to name
Panel/LongPanel dims attribute renamed to shape to be more conformant
Slicing a Series returns a view now
More Series deprecations / renaming: toCSV to to_csv, asOf to asof, merge to map, applymap to apply, toDict to to_dict, combineFirst to combine_first. Will print FutureWarning.
DataFrame.to_csv does not write an “index” column label by default anymore since the output file can be read back without it. However, there is a new index_label argument. So you can do index_label='index' to emulate the old behavior
datetools.Week argument renamed from dayOfWeek to weekday
timeRule argument in shift has been deprecated in favor of using the offset argument for everything. So you can still pass a time rule string to offset
Added optional encoding argument to read_csv, read_table, to_csv, from_csv to handle unicode in python 2.x
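
The ffill and bfill aliases mentioned above can be illustrated with reindex, as in this minimal sketch. It assumes a recent pandas release, and the series is hypothetical.

```python
import pandas as pd

# Hypothetical series with a gap at label 1 and nothing at label 3.
s = pd.Series([1.0, 2.0], index=[0, 2])

# ffill (alias for pad) carries the last known value forward.
forward = s.reindex([0, 1, 2, 3], method="ffill")

# bfill (alias for backfill) pulls the next known value backward;
# label 3 has no later value, so it stays missing.
backward = s.reindex([0, 1, 2, 3], method="bfill")

print(forward)
print(backward)
```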

Bug Fixes¶

Column ordering in pandas.io.parsers.parseCSV will match CSV in the presence of mixed-type data
Fixed handling of Excel 2003 dates in pandas.io.parsers
DateRange caching was happening with high resolution DateOffset objects, e.g. DateOffset(seconds=1). This has been fixed
Fixed __truediv__ issue in DataFrame
Fixed DataFrame.toCSV bug preventing IO round trips in some cases
Fixed bug in Series.plot causing matplotlib to barf in exceptional cases
Disabled Index objects from being hashable, like ndarrays
Added __ne__ implementation to Index so that operations like ts[ts != idx] will work
Added __ne__ implementation to DataFrame
Bug / unintuitive result when calling fillna on unordered labels
Bug calling sum on boolean DataFrame
Bug fix when creating a DataFrame from a dict with scalar values
Series.{sum, mean, std, ...} now return NA/NaN when the whole Series is NA
NumPy 1.4 through 1.6 compatibility fixes
Fixed bug in bias correction in rolling_cov, was affecting rolling_corr too
R-square value was incorrect in the presence of fixed and time effects in the PanelOLS classes
HDFStore can handle duplicates in table format, will take

Thanks¶

Joon Ro
Michael Pennington
Chris Uga
Chris Withers
Jeff Reback
Ted Square
Craig Austin
William Ferreira
Daniel Fortunov
Tony Roberts
Martin Felder
John Marino
Tim McNamara
Justin Berka
Dieter Vandenbussche
Shane Conway
Skipper Seabold
Chris Jordan-Squire

pandas 0.3.0¶

Release date: February 20, 2011

New features¶

corrwith function to compute column- or row-wise correlations between two DataFrame objects
Can boolean-index DataFrame objects, e.g. df[df > 2] = 2, px[px > last_px] = 0
Added comparison magic methods (__lt__, __gt__, etc.)
Flexible explicit arithmetic methods (add, mul, sub, div, etc.)
Added reindex_like method
Added reindex_like method to WidePanel
Convenience functions for accessing SQL-like databases in pandas.io.sql module
Added (still experimental) HDFStore class for storing pandas data structures using HDF5 / PyTables in pandas.io.pytables module
Added WeekOfMonth date offset
pandas.rpy (experimental) module created, provide some interfacing / conversion between rpy2 and pandas
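
The boolean-indexing assignment mentioned above, df[df > 2] = 2, can be sketched as follows. It assumes a recent pandas release, and the frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame with some values above the cap of 2.
df = pd.DataFrame({"a": [1, 3], "b": [5, 2]})

# df > 2 builds a boolean frame of the same shape; assigning through it
# overwrites only the cells where the mask is True.
df[df > 2] = 2
print(df)
```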

Improvements to existing features¶

Unit test coverage: 100% line coverage of core data structures
Speed enhancement to rolling_{median, max, min}
Column ordering between DataFrame and DataMatrix is now consistent: before DataFrame would not respect column order
Improved {Series, DataFrame}.plot methods to be more flexible (can pass matplotlib Axis arguments, plot DataFrame columns in multiple subplots, etc.)

API Changes¶

Exponentially-weighted moment functions in pandas.stats.moments have a more consistent API and accept a min_periods argument like their regular moving counterparts.
fillMethod argument in Series, DataFrame changed to method, FutureWarning added.
fill method in Series, DataFrame/DataMatrix, WidePanel renamed to fillna, FutureWarning added to fill
Renamed DataFrame.getXS to xs, FutureWarning added
Removed cap and floor functions from DataFrame, renamed to clip_upper and clip_lower for consistency with NumPy

Bug Fixes¶

Fixed bug in IndexableSkiplist Cython code that was breaking rolling_max function
Numerous numpy.int64-related indexing fixes
Several NumPy 1.4.0 NaN-handling fixes
Bug fixes to pandas.io.parsers.parseCSV
Fixed DateRange caching issue with unusual date offsets
Fixed bug in DateRange.union
Fixed corner case in IndexableSkiplist implementation
