
Python CSV, and delimiter-spaced files, for humans!


jlumbroso/comma


This library tries to make manipulating CSV files a great experience.


Features

Here are some of the features that comma supports:

  • Robust autodetection of CSV parameters (thanks to clevercsv) and encoding (thanks to chardet).
  • Single-line usage, comma.load(...), with no syntax to remember or parameters to tweak.
  • Simple, Pythonic interface to access/modify the rows using standard list and dict operations, i.e. row[0] and row["street"] are equivalent.
  • Column slices using the header name, i.e. table["street"].
  • In-place editing of the dataset, including multiple lines.
  • Opening files directly from a URL.

Installation

If you use pip:

    pip install 'comma[autodetect,net]'

or if you use pipenv:

    pipenv install 'comma[autodetect,net]'

Why?

Although Python, fortuitously, is "batteries included", on occasion some of the libraries end up being designed with APIs that don't map well to what turn out to be the most common usage patterns. This is what happened with the various urllib libraries: incredibly powerful, but limiting users through their complexity. It was not straightforward, for instance, to use cookies, one of several problems that requests by @ken-reitz addressed. Indeed, requests abstracts power beneath simplicity, smart defaults, and discoverability.

For the CSV format, we are confronted with a similar situation. While both the JSON and YAML formats have packages that provide a one-command means to load content from files in those respective formats into a nested Python object, for the CSV format the standard library has you use an iterator to access the data. Many details require significant syntax changes (for instance, whether you get lists or dictionaries depends on the class that is used to read the file).
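As a minimal sketch of the point about lists versus dictionaries, the following uses only the standard csv module. The sample data and variable names are made up for illustration and do not come from this project.

```python
import csv
import io

# Hypothetical sample data, used only for this illustration.
SAMPLE = "name,street\nAlice,Main St\nBob,Oak Ave\n"

# csv.reader is an iterator that yields each row as a list of strings;
# the header line is returned like any other row.
rows_as_lists = list(csv.reader(io.StringIO(SAMPLE)))

# csv.DictReader is a different class: it consumes the header line and
# yields each remaining row as a dict keyed by column name.
rows_as_dicts = list(csv.DictReader(io.StringIO(SAMPLE)))

# The same cell is reached with different syntax depending on the class.
print(rows_as_lists[1][1])
print(rows_as_dicts[0]["street"])
```

Switching between the two access styles means switching reader classes and adjusting the row offsets, which is the kind of detail the paragraph above refers to.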

Since then, we also have several excellent libraries that, by providing great auto-detection (of dialect, file format, encoding, etc.), allow for hiding many details from the end user.
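To give a rough idea of what dialect auto-detection looks like, here is a small sketch using csv.Sniffer from the standard library. This is a stand-in chosen so the example has no dependencies; it is not the detection method of the libraries mentioned above, and the sample data is hypothetical.

```python
import csv
import io

# Hypothetical semicolon-delimited sample, used only for this illustration.
SAMPLE = "name;street\nAlice;Main St\nBob;Oak Ave\n"

# Ask the standard library's sniffer to guess the dialect, restricting
# the candidate delimiters to a few common ones.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=",;\t|")

# Parse the data with the guessed dialect instead of hard-coding it.
rows = list(csv.reader(io.StringIO(SAMPLE), dialect))

print(dialect.delimiter)
print(rows[1])
```

The standard sniffer is a heuristic and can guess wrong on messy files, which is part of why dedicated detection libraries exist.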

All this to say, comma will try to do exactly what you want when you do:

    import comma

    data = comma.load("file.csv")
    data[0]["field"] = "changed value"
    comma.dump(data, filename="file_modified.csv")

Alternatives

Python is fortunate to have a lot of very good libraries to read/write CSV and tabular files in general. (Some of these were discovered through the excellent Awesome Python list.)

  • clevercsv: An exceptional library by @GjjvdBurg that builds on statistical and empirical methods to provide powerful and reliable CSV dialect detection. However, it strives to be a drop-in replacement for the original Python csv module, and as such does not improve on the complex syntax. This library is the culmination of serious peer-reviewed research, and comma uses it internally to improve auto-detection.

  • csvkit: This is a set of command-line tools (rather than a module/package) written in Python, to make it easier to manipulate CSV files. One of the highlights is a tool called csvpy <file.csv>, which opens a Python shell with the CSV data loaded into a Python object called reader, to quickly run some Python logic on the data. While it is technically possible to use csvkit's internals in a project, this is not documented.

  • pandas: An advanced data science package for Python, this certainly provides a powerful CSV (and more generally, table file) reader and parser. The API of the table object is very powerful, but you need to take the time to learn how to use it. This library is perhaps not ideal for file manipulations.

  • pyexcel: This library provides access to Excel and other tabular formats, including CSV, and various data sources (stream, database, file, ...). It emphasizes one common, format-agnostic API that instead has the user choose the data format (list, matrix, dictionary, ...).

  • tablib: This library was originally written by Kenneth Reitz, the creator who brought requests, pipenv and many other goodies to Python, and was then included in the Jazzband collective. The focus of this library is on interoperating between many different file formats (such as XLS, CSV, JSON, YAML, DF, and even LaTeX booktabs!). It seems to have a very high adoption rate because it is a dependency for many Jazzband libraries. The API is class-based rather than method-based. A companion library, prettytable, focuses on pretty printing tabular data (including from a CSV file).

  • tabulator: This library provides a single interface to manipulate extremely large tabular data, and is useful for files so large that they need to be streamed line-by-line; the library supports a broad array of formats, including reading data directly from Google Spreadsheets. However, this power means that reading a CSV file requires several operations.

Although not specifically restricted to Python, the AwesomeCSV resource is also interesting.

Miscellaneous

Although not specifically a Python library, nor designed to read/write CSV files (but instead to compare them), daff is a really cool project: it provides a diff of tabular data with cell-level awareness.

Another unrelated project is Grist, a spreadsheet PaaS which, among other useful features, allows the use of Python within formulas.

Acknowledgements

Thanks to @zbanks for the name of the package! Thanks to @rfreling and @adamfinkelstein for discussing ideas before I got started on this. Thanks to @GjjvdBurg and collaborators for their awesome, awesome contribution to text processing science and our Python community with clevercsv.

License

This project is licensed under the LGPLv3 license, with the understanding that importing a Python module is similar in spirit to dynamically linking against it.

  • You can use the library comma in any project, for any purpose, as long as you provide some acknowledgement to this original project for use of the library.

  • If you make improvements to comma, you are required to make those changes publicly available.
