Streaming API for pandas applied to big datasets
sdpython/pandas-streaming

pandas-streaming aims at processing big files with pandas: files too big to hold in memory, yet too small to be parallelized with a significant gain. The module replicates a subset of the pandas API and implements other functionality for machine learning.
```python
from pandas_streaming.df import StreamingDataFrame

sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)
```
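To illustrate what processing a file chunk by chunk can look like, here is a minimal sketch that computes a column mean without ever holding the whole dataset in memory. It deliberately uses plain pandas (`pandas.read_csv` with `chunksize`) rather than `StreamingDataFrame`, so that it runs with pandas alone; the helper `streaming_mean`, the in-memory sample data, and the column names are assumptions made for the example, not part of this module.

```python
import io

import pandas


def streaming_mean(chunks, column):
    """Mean of `column` over an iterable of dataframes, one chunk at a time.

    Hypothetical helper written for this example, not part of pandas-streaming.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        # Only running totals are kept; each chunk can be discarded afterwards.
        total += chunk[column].sum()
        count += chunk[column].count()  # counts non-null values only
    return total / count if count else float("nan")


# A small in-memory stand-in for a big tab-separated file.
data = io.StringIO("cf\tcstr\n0\ta\n1\tb\n3\tc\n")

# With chunksize set, read_csv returns an iterator of dataframes
# (here at most 2 rows each) instead of a single dataframe.
chunks = pandas.read_csv(data, sep="\t", chunksize=2)

mean_cf = streaming_mean(chunks, "cf")
print(mean_cf)
```

Any aggregate that can be updated incrementally (sums, counts, minima, maxima) fits this pattern; statistics that need all rows at once, such as an exact median, do not.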
The module can also stream an existing dataframe.
```python
import pandas

df = pandas.DataFrame(
    [
        dict(cf=0, cint=0, cstr="0"),
        dict(cf=1, cint=1, cstr="1"),
        dict(cf=3, cint=3, cstr="3"),
    ]
)

from pandas_streaming.df import StreamingDataFrame

sdf = StreamingDataFrame.read_df(df)

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)
```
It also contains helpers to split datasets into train and test sets under some unusual constraints.
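The text above does not say which constraints are meant. As one possible illustration, the sketch below assumes the constraint is that all rows sharing a key must end up on the same side of the split, and it enforces this on a stream of chunks by hashing the key. It uses plain pandas and the standard library only; `assign_side`, `split_chunks`, and the `user` column are hypothetical names invented for the example and are not this module's API.

```python
import hashlib

import pandas


def assign_side(key, test_ratio=0.25):
    """Deterministically map a key to 'train' or 'test' from its hash.

    Hypothetical helper: the same key always gets the same side, so no
    state has to be shared between chunks.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_ratio else "train"


def split_chunks(chunks, key_column, test_ratio=0.25):
    """Yield (train_part, test_part) for each chunk of the stream."""
    for chunk in chunks:
        sides = chunk[key_column].map(lambda k: assign_side(k, test_ratio))
        yield chunk[sides == "train"], chunk[sides == "test"]


df = pandas.DataFrame(
    {"user": ["a", "b", "a", "c", "b", "d"], "value": list(range(6))}
)

# Simulate a stream by cutting the dataframe into chunks of two rows.
chunks = (df.iloc[i:i + 2] for i in range(0, len(df), 2))

train_parts, test_parts = [], []
for train_part, test_part in split_chunks(chunks, "user", test_ratio=0.5):
    train_parts.append(train_part)
    test_parts.append(test_part)

train = pandas.concat(train_parts)
test = pandas.concat(test_parts)
print(sorted(set(train["user"])), sorted(set(test["user"])))
```

With this approach `test_ratio` applies to keys and holds only in expectation, so the share of rows on each side can drift from the target when a few keys own most of the rows.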