Streaming API for pandas applied to big datasets


sdpython/pandas-streaming


License: MIT

pandas-streaming processes big files with pandas: files too big to hold in memory, yet too small for parallelization to bring a significant gain. The module replicates a subset of the pandas API and implements other functionality for machine learning.

    from pandas_streaming.df import StreamingDataFrame

    sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)
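The loop above hands over one chunk at a time, so any statistic has to be accumulated across chunks rather than computed on a full dataframe. A minimal sketch of that pattern follows. It uses plain `pandas.read_csv` with `chunksize` rather than `StreamingDataFrame`; the helper `streamed_mean` and the in-memory sample data are hypothetical and only serve as an illustration.

```python
import io

import pandas


def streamed_mean(source, column, chunksize=2, **read_csv_kwargs):
    """Mean of one column, computed chunk by chunk (hypothetical helper)."""
    total = 0.0
    count = 0
    # With chunksize set, read_csv yields dataframes of at most `chunksize` rows.
    for chunk in pandas.read_csv(source, chunksize=chunksize, **read_csv_kwargs):
        total += chunk[column].sum()
        count += chunk[column].count()  # non-missing values only
    return total / count if count else float("nan")


# A tiny tab-separated sample stands in for a file too big for memory.
data = "cf\tcint\tcstr\n0\t0\ta\n1\t1\tb\n3\t3\tc\n"
mean_cf = streamed_mean(io.StringIO(data), "cf", sep="\t")
print(mean_cf)
```

Only the running sum and count stay in memory, so the memory footprint depends on the chunk size and not on the file size.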

The module can also stream an existing dataframe.

    import pandas
    from pandas_streaming.df import StreamingDataFrame

    df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                           dict(cf=1, cint=1, cstr="1"),
                           dict(cf=3, cint=3, cstr="3")])

    sdf = StreamingDataFrame.read_df(df)

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)
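To make the idea of streaming an in-memory dataframe concrete, here is a sketch that slices a dataframe into consecutive chunks with a generator. It is not the library's implementation; the function `iter_chunks` and its `chunksize` parameter are assumptions made for this illustration.

```python
import pandas


def iter_chunks(df, chunksize=2):
    """Yield consecutive row slices of `df` (hypothetical helper)."""
    for start in range(0, len(df), chunksize):
        # iloc slices by position, so each chunk keeps the original index.
        yield df.iloc[start:start + chunksize]


df_full = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                            dict(cf=1, cint=1, cstr="1"),
                            dict(cf=3, cint=3, cstr="3")])

chunks = list(iter_chunks(df_full, chunksize=2))
for chunk in chunks:
    print(chunk)
```

Concatenating the chunks with `pandas.concat` gives back the original dataframe, which is a convenient sanity check for any chunking scheme.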

It also contains helpers that split datasets into train and test sets under some unusual constraints.
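The text does not say which constraints are meant. As one example of a constrained split, the sketch below keeps all rows that share a key on the same side, so that no group appears in both train and test. It is written with plain pandas and numpy; the function `group_train_test_split`, its parameters, and the sample data are hypothetical and do not describe the module's API.

```python
import numpy
import pandas


def group_train_test_split(df, group_column, test_size=0.25, seed=0):
    """Split rows so that no group is shared by train and test (sketch)."""
    groups = df[group_column].unique()
    rng = numpy.random.RandomState(seed)
    shuffled = rng.permutation(groups)
    # At least one group goes to the test set.
    n_test = max(1, int(round(len(shuffled) * test_size)))
    test_groups = list(shuffled[:n_test])
    in_test = df[group_column].isin(test_groups)
    return df[~in_test], df[in_test]


data = pandas.DataFrame({
    "user": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "x": range(8),
})
train, test = group_train_test_split(data, "user")
print(sorted(train["user"].unique()), sorted(test["user"].unique()))
```

Splitting on groups rather than rows avoids leaking information when several rows describe the same entity.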
