Python CSV, and delimiter-spaced files, for humans!
jlumbroso/comma
This library tries to make manipulating CSV files a great experience.
Here are some of the features that `comma` supports:
- Robust autodetection of CSV parameters (thanks to `clevercsv`) and encoding (thanks to `chardet`).
- Single-line usage, `comma.load(...)`, no syntax to remember or parameters to tweak.
- Simple, Pythonic interface to access/modify the rows using standard `list` and `dict` operations, e.g. `row[0]` and `row["street"]` are equivalent when `street` is the first column.
- Column slices using the header name, e.g. `table["street"]`.
- In-place editing of the dataset, including multiple lines.
- Opening files directly from a URL.
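To make the "list and dict operations" feature concrete, here is a minimal sketch of how a row object could accept both integer positions and header names. This is an illustration only, not `comma`'s actual implementation: the class name `DualAccessRow` and its internals are hypothetical.

```python
class DualAccessRow:
    """Hypothetical row type; NOT comma's actual implementation."""

    def __init__(self, header, values):
        self._header = list(header)
        self._values = list(values)

    def _index(self, key):
        # Integers are treated as positions; strings are looked up in the header.
        if isinstance(key, int):
            return key
        return self._header.index(key)

    def __getitem__(self, key):
        return self._values[self._index(key)]

    def __setitem__(self, key, value):
        self._values[self._index(key)] = value


row = DualAccessRow(["name", "street"], ["Ada", "12 Main St"])

# Both access styles reach the same cell.
same = row[1] == row["street"]

# Editing by header name is visible through the positional index too.
row["street"] = "34 Oak Ave"
```

The point of the sketch is only that a single object can serve both access styles, so the reader does not have to pick a list-based or dict-based API up front.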
If you use pip:

    pip install 'comma[autodetect,net]'

or if you use pipenv:

    pipenv install 'comma[autodetect,net]'
Although Python, fortuitously, is "batteries included", on occasion some of its libraries end up being designed with APIs that don't map well to what turn out to be the most common usage patterns. This is what happened with the various `urllib` libraries: incredibly powerful, but limiting users by their complexity. It was not straightforward, for instance, to use cookies, one of several problems that `requests` by @ken-reitz addressed. Indeed, `requests` abstracts power beneath simplicity, smart defaults, and discoverability.
For the CSV format, we are confronted with a similar situation. While both the JSON and YAML formats have packages that provide one-command means to load content from files in those respective formats into a nested Python object, for the CSV format the standard library has you use an iterator to access the data. Many details require a significant syntax change (for instance, whether you get lists or dictionaries depends on the class that is used to read the file).
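As a small illustration of that last point, the sketch below reads the same data twice with the standard library: once with `csv.reader` (rows as lists) and once with `csv.DictReader` (rows as dictionaries). The sample data and variable names are made up for the example.

```python
import csv
import io

# Made-up sample data, kept in memory so the example is self-contained.
CSV_TEXT = "name,street\nAda,12 Main St\nGrace,34 Oak Ave\n"

# csv.reader yields each row as a list, header row included.
list_rows = list(csv.reader(io.StringIO(CSV_TEXT)))

# csv.DictReader consumes the header and yields each data row as a dict.
dict_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# The same cell, reached through two different classes and two different indices.
first_street_by_index = list_rows[1][1]
first_street_by_name = dict_rows[0]["street"]
```

Note that the first data row is `list_rows[1]` in one case and `dict_rows[0]` in the other, because only `DictReader` treats the header specially.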
We now also have several excellent libraries that, by providing great auto-detection (of dialect, file format, encoding, etc.), allow for hiding many details from the end user.
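To give a rough idea of what dialect auto-detection means, here is a sketch using the standard library's `csv.Sniffer`. This is a stand-in chosen because it needs no installation; it is not the detection that `clevercsv` performs, and it is generally considered less robust. The sample data is made up.

```python
import csv
import io

# Made-up sample using semicolons instead of commas.
SAMPLE = "name;street\nAda;12 Main St\nGrace;34 Oak Ave\n"

# Ask the standard-library sniffer to guess the dialect,
# restricting the candidates to a few common delimiters.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=";,\t|")

# Parse the data with the guessed dialect.
rows = list(csv.reader(io.StringIO(SAMPLE), dialect))
```

With a detection step like this in front of the parser, the caller no longer has to know or specify the delimiter.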
All this to say, `comma` will try to do exactly what you want when you do:

    import comma
    data = comma.load("file.csv")
    data[0]["field"] = "changed value"
    comma.dump(data, filename="file_modified.csv")
Python is fortunate to have a lot of very good libraries to read/write CSV and tabular files in general. (Some of these were discovered through the excellent Awesome Python list.)
- `clevercsv`: An exceptional library by @GjjvdBurg that builds on statistical and empirical research to provide powerful and reliable CSV dialect detection. However, it strives to be a drop-in replacement for the original Python `csv` module, and as such does not improve on the complex syntax. This library is the culmination of serious peer-reviewed research, and `comma` uses it internally to improve auto-detection.
- `csvkit`: This is a set of command-line tools (rather than a module/package) written in Python, to make it easier to manipulate CSV files. One of the highlights is a tool called `csvpy <file.csv>` to open a Python shell with the CSV data loaded into a Python object called `reader`, to quickly run some Python logic on the data. While it is technically possible to use `csvkit`'s internals in a project, this is not documented.
- `pandas`: An advanced data science package for Python, this certainly provides a powerful CSV (and more generally, table file) reader and parser. The API of the table object is very powerful, but you need to take the time to learn how to use it. This library is perhaps not ideal for file manipulations.
- `pyexcel`: This library provides access to Excel and other tabular formats, including CSV, and various data sources (stream, database, file, ...). It emphasizes one common format-agnostic API, which instead has the user choose the data format (list, matrix, dictionary, ...).
- `tablib`: This library was originally written by Kenneth Reitz, the creator who brought `requests`, `pipenv` and many other goodies to Python, and was then included in the Jazzband collective. The focus of this library is on interoperating between many different file formats (such as XLS, CSV, JSON, YAML, DF, etc., ..., even LaTeX `booktabs`!). It seems to have a very high adoption rate because it is a dependency for many Jazzband libraries. The API is class-based rather than method-based. A companion library, `prettytable`, focuses on pretty printing tabular data (including from a CSV file).
- `tabulator`: This library provides a single interface to manipulate extremely large tabular data, and is useful for files so large that they need to be streamed line-by-line; the library supports a broad array of formats, including reading data directly from Google Spreadsheets. However, this power means that reading a CSV file requires several operations.
Although not specifically restricted to Python, the AwesomeCSV resource is also interesting.
Although not specifically a Python library, nor designed to read/write CSV files (but instead to compare them), `daff` is a really cool project: It provides a `diff` of tabular data with cell-level awareness.
Another unrelated project is Grist, a spreadsheet PaaS, which among other useful features, allows the use of Python within formulas.
Thanks to @zbanks for the name of the package! Thanks to @rfreling, @adamfinkelstein for discussing ideas before I got started on this. Thanks to @GjjvdBurg and collaborators for their awesome, awesome contribution to text processing science and our Python community with `clevercsv`.
This project is licensed under the LGPLv3 license, with the understanding that importing a Python module is similar in spirit to dynamically linking against it.
You can use the library `comma` in any project, for any purpose, as long as you provide some acknowledgement to this original project for use of the library. If you make improvements to `comma`, you are required to make those changes publicly available.