| Comma-separated values | |
|---|---|
| Filename extension | .csv |
| Internet media type | text/csv[1] |
| Uniform Type Identifier (UTI) | public.comma-separated-values-text[2] |
| UTI conformation | public.delimited-values-text[2] |
| Type of format | multi-platform, serial data streams |
| Container for | database information organized as field separated lists |
| Standard | RFC 4180 |
Comma-separated values (CSV) is a plain text data format for storing tabular data where the fields (values) of a record are separated by a comma and each record is a line (i.e. newline separated). CSV is commonly used in software that generally deals with tabular data, such as a database or a spreadsheet.[3] Benefits cited for using CSV include simplicity of use and human readability.[4] CSV is a form of delimiter-separated values. A CSV file is a file that contains CSV-formatted data.
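As a minimal sketch of the record-per-line, comma-per-field layout described above, the following uses Python's standard `csv` module to parse a small in-memory sample. The sample data and the `read_records` helper are hypothetical and exist only for illustration; the second record includes a quoted field to show how a value can carry an embedded comma.

```python
import csv
import io

# Hypothetical sample: three fields per record, one record per line.
SAMPLE = 'name,city,note\r\nAda,London,"likes tea, not coffee"\r\nLin,Oslo,\r\n'


def read_records(text):
    """Parse CSV text into a list of records, each a list of field strings."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline="")))


records = read_records(SAMPLE)
for record in records:
    print(record)
```

Each record comes back as a list of strings, and the trailing empty field in the last record is preserved as an empty string.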
CSV is not limited to a particular character encoding.[1] It works just as well with Unicode (such as UTF-8 or UTF-16) as with ASCII, although particular programs may have limitations. Unlike many proprietary data formats, CSV data normally survives naïve translation from one character set to another. CSV does not, however, provide a way to indicate the character encoding, so that must be communicated separately.
The CSV format predates personal computers by more than a decade. The IBM Fortran (level H extended) compiler under OS/360 supported list-directed ("free-form") input/output, with commas between values, in 1972.[5] List-directed input/output was defined in FORTRAN 77, approved in 1978. List-directed input used commas or spaces for delimiters, so unquoted character strings could not contain commas or spaces.[6]
The term "comma-separated value" and the "CSV" abbreviation were in use by 1983.[7] The manual for the Osborne Executive computer, which bundled the SuperCalc spreadsheet, documents the CSV quoting convention that allows strings to contain embedded commas.[8]
Comma-separated value lists are easier to type (for example into punched cards) than fixed-column-aligned data, and they are less prone to producing incorrect results if a value is punched one column off from its intended location.
Comma-separated files are used for the interchange of database information between machines of two different architectures. The plain-text character of CSV files largely avoids incompatibilities such as byte order and word size. The files are largely human-readable, so it is easier to deal with them in the absence of perfect documentation or communication.[9]
The main standardization initiative, transforming a "de facto fuzzy definition" into a more precise and de jure one, was in 2005, with RFC 4180, defining CSV as a MIME Content Type.[10] Later, in 2013, some of RFC 4180's deficiencies were tackled by a W3C recommendation.[11]
In 2014 the IETF published RFC 7111, describing the application of URI fragments to CSV documents. RFC 7111 specifies how row, column, and cell ranges can be selected from a CSV document using position indexes.[12]
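To make position-based selection concrete, here is a rough sketch that resolves a row fragment such as `row=2-3` against CSV text. It covers only a single row number or a single row range; the column and cell forms are left out. The `select_rows` helper is hypothetical, and counting every record from 1 (including any header line) is an assumption of this sketch that should be checked against RFC 7111 itself.

```python
import csv
import io


def select_rows(csv_text, fragment):
    """Return rows named by a 'row=N' or 'row=N-M' fragment (1-based, inclusive)."""
    scheme, _, spec = fragment.partition("=")
    if scheme != "row" or not spec:
        raise ValueError("this sketch only handles row= fragments")
    start_text, dash, end_text = spec.partition("-")
    start = int(start_text)
    # A bare 'row=N' selects exactly one row.
    end = int(end_text) if dash else start
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    # Convert the 1-based inclusive range to a Python slice.
    return rows[start - 1:end]


# Hypothetical data used only to exercise the helper.
DATA = "id,name\n1,Ada\n2,Lin\n3,Max\n"
selected_range = select_rows(DATA, "row=2-3")
selected_single = select_rows(DATA, "row=4")
print(selected_range)
print(selected_single)
```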
In 2015 the W3C, in an attempt to enhance CSV with formal semantics, published the first drafts of recommendations for CSV metadata standards, which became recommendations in December of the same year.[13]
Casually, CSV refers to data that is plain text and consists of one record per line where each line has the same sequence of fields separated by a comma.[1][14][15] The format is more formally described in the 2005 technical standard RFC 4180, which codifies the CSV format and defines the MIME type text/csv for the handling of text-based fields. Among its requirements:
Common challenges with CSV include:
In 2011, the Open Knowledge Foundation (OKF) and various partners created a data protocols working group, which later evolved into the Frictionless Data initiative. One of the main formats they released was the Tabular Data Package. The Tabular Data Package was heavily based on CSV, using it as the main data transport format and adding basic type and schema metadata. (CSV lacks any type information to distinguish the string "1" from the number 1.)[16] The Frictionless Data initiative has also provided a standard CSV Dialect Description Format for describing different dialects of CSV, for example specifying the field separator or quoting rules.[17]
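The missing type information can be illustrated with a short sketch: a plain CSV reader returns every field as a string, so any typing has to come from outside the file. The `SCHEMA` mapping and the `read_typed` helper below are hypothetical stand-ins for external schema metadata; they do not follow the Tabular Data Package format.

```python
import csv
import io

# Hypothetical schema: column name -> Python type. Illustrative only,
# not the Tabular Data Package schema format.
SCHEMA = {"id": int, "label": str}

TEXT = "id,label\n1,1\n"


def read_typed(text, schema):
    """Read CSV text and cast each field using an externally supplied schema."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [
        {name: schema[name](value) for name, value in row.items()}
        for row in reader
    ]


# Without a schema, both fields of the data row are the string "1".
raw = list(csv.reader(io.StringIO(TEXT, newline="")))
# With the schema, the same text yields an integer and a string.
typed = read_typed(TEXT, SCHEMA)
print(raw[1])
print(typed[0])
```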
In 2013, the W3C "CSV on the Web" working group began to specify technologies providing higher interoperability for web applications using CSV or similar formats.[18] The working group completed its work in February 2016 and was officially closed in March 2016 with the release of a set of documents and W3C recommendations[19] for modeling "Tabular Data",[13] and enhancing CSV with metadata and semantics. While the well-formedness of CSV data can readily be checked, testing validity and canonical form is less well developed, relative to more precise data formats, such as XML and SQL, which offer richer types and rules-based validation.[20]
CSV is commonly used for data exchange and is widely supported by data-oriented applications. It is often used to move tabular data between programs that natively operate on incompatible data, often in formats that are proprietary or undocumented.[1][21][22] A common scenario is moving data from a database to a spreadsheet, which in general use completely different formats. Most database systems can export as CSV and most spreadsheet programs can import CSV-formatted data, so CSV can serve as an intermediate format. Every major ecommerce platform provides support for exporting data as a CSV file.[23]
CSV is also used for storing data. Common data science tools such as Pandas include the option to export data to CSV for long-term storage.[24] Benefits of CSV for data storage include its simplicity, which makes parsing and creating CSV files easy to implement and fast compared to other data formats; human readability, which makes editing or fixing data simpler;[25] and high compressibility, which leads to smaller data files.[26] On the other hand, CSV does not support more complex data relations and makes no distinction between null and empty values; in applications where these features are needed, other formats are preferred.
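The null-versus-empty limitation can be seen in a small round trip. In this sketch, which uses Python's standard `csv` module with its default settings, a `None` value and an empty string are both written as an empty field and both read back as an empty string. The `round_trip` helper is hypothetical.

```python
import csv
import io


def round_trip(row):
    """Write one row as CSV into memory, then read it back."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(row)
    buffer.seek(0)
    return next(csv.reader(buffer))


# None (a null) and "" (an empty string) start out distinct...
result = round_trip(["a", None, ""])
# ...but come back identical, because CSV has no notation for null.
print(result)
```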
More than 200 local, regional, and national data portals, such as those of the UK government and the European Commission, use CSV files with standardized data catalogs.[27]
Some applications use CSV as a data interchange format to enhance their interoperability, exporting and importing CSV. Others use CSV as an internal format. CSV is supported by almost all spreadsheets and database management systems.
Spreadsheets including Apple Numbers, LibreOffice Calc, and Apache OpenOffice Calc support reading CSV files. Microsoft Excel also supports a dialect of CSV, with restrictions in comparison to other spreadsheet software (e.g., as of 2019 Excel still cannot export CSV files in the commonly used UTF-8 character encoding, and the separator is not enforced to be the comma). The LibreOffice Calc CSV importer is actually a more generic delimited text importer, supporting multiple separators at the same time as well as field trimming.
Various relational databases support saving query results to a CSV file. PostgreSQL provides the COPY command, which allows for both saving and loading data to and from a file. For example, the statement COPY (SELECT * FROM articles) TO '/home/wikipedia/file.csv' (FORMAT csv) saves the content of a table articles to a file called /home/wikipedia/file.csv.[28] Some relational databases, when using standard SQL, offer a foreign-data wrapper (FDW). For example, PostgreSQL offers the CREATE FOREIGN TABLE[29] and CREATE EXTENSION file_fdw[30] commands to configure any variant of CSV. Databases like Apache Hive offer the option to express CSV or .csv.gz as an internal table format.
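For comparison, the same idea of saving query results as CSV can be sketched at the application level. This example deliberately swaps in SQLite through Python's standard `sqlite3` module rather than PostgreSQL's COPY command; the table contents and the `export_query` helper are hypothetical, and the output is written to an in-memory buffer instead of a file.

```python
import csv
import io
import sqlite3


def export_query(connection, query, out_file):
    """Run a query and write its column names and rows to out_file as CSV."""
    cursor = connection.execute(query)
    writer = csv.writer(out_file)
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical in-memory table standing in for the articles table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
connection.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "CSV"), (2, "Fortran, level H")],
)

output = io.StringIO(newline="")
export_query(connection, "SELECT * FROM articles ORDER BY id", output)
print(output.getvalue())
```

The title containing a comma is quoted automatically by the writer, so the file still has two fields per record.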
Programs that work with CSV may limit the maximum number of rows (or cells) a CSV file can have. Examples include Microsoft Excel (1,048,576 rows), Apple Numbers (1,000,000 rows), Google Sheets (10,000,000 cells), and OpenOffice and LibreOffice (1,048,576 rows).[31]