Movatterモバイル変換

Comma-separated values

From Wikipedia, the free encyclopedia

Text format for tabular data using a comma between fields

Comma-separated values

Filename extension	`.csv`
Internet media type	`text/csv`^[1]
Uniform Type Identifier (UTI)	public.comma-separated-values-text^[2]
UTI conformation	public.delimited-values-text^[2]
Type of format	multi-platform, serial data streams
Container for	database information organized as field separated lists
Standard	RFC 4180

Comma-separated values (CSV) is aplain text data format for storingtabular data where thefields (values) of arecord are separated by acomma and each record is a line (i.e.newline separated). CSV is commonly-used insoftware that generally deals with tabular data such as adatabase and aspreadsheet.^[3] Benefits cited for using CSV include simplicity of use andhuman readability.^[4] CSV is a form ofdelimiter-separated values. ACSV file is a file that contains CSV-formatted data.

CSV is not limited to a particularcharacter encoding.^[1] It works just as well withUnicode (i.e.UTF-8 orUTF-16) as withASCII – although particularprograms may have limitations. Unlike many proprietary data formats, CSV data normally survives naïve translation from one character set to another. CSV does not, however, provide a way to indicate the character encoding, so that must be communicated separately.

History

[edit]

The CSV format predatespersonal computers by more than a decade. TheIBM Fortran (level H extended) compiler underOS/360 supported list-directed ("free-form") input/output, with commas between values, in 1972.^[5] List-directed input/output was defined inFORTRAN 77, approved in 1978. List-directed input used commas or spaces for delimiters, so unquoted character strings could not contain commas or spaces.^[6]

The term "comma-separated value" and the "CSV" abbreviation were in use by 1983.^[7] The manual for theOsborne Executive computer, which bundled theSuperCalc spreadsheet, documents the CSV quoting convention that allows strings to contain embedded commas.^[8]

Comma-separated value lists are easier to type (for example intopunched cards) than fixed-column-aligned data, and they were less prone to producing incorrect results if a value was punched one column off from its intended location.

Comma separated files are used for the interchange of database information between machines of two different architectures. The plain-text character of CSV files largely avoids incompatibilities such asbyte-order andword size. The files are largely human-readable, so it is easier to deal with them in the absence of perfect documentation or communication.^[9]

The main standardization initiative—transforming "de facto fuzzy definition" into a more precise andde jure one—was in 2005, withRFC 4180, defining CSV as aMIME Content Type.^[10] Later, in 2013, some of RFC 4180's deficiencies were tackled by a W3C recommendation.^[11]

In 2014IETF publishedRFC 7111 describing the application ofURI fragments to CSV documents. RFC 7111 specifies how row, column, and cell ranges can be selected from a CSV document using position indexes.^[12]

In 2015W3C, in an attempt to enhance CSV withformal semantics, publicized the firstdrafts of recommendations for CSV metadata standards, which began asrecommendations in December of the same year.^[13]

Specification

[edit]

Casually, CSV refers to data that isplain text and consists of one record per line where each line has the same sequence of fields separated by a comma.^[1]^[14]^[15] The format is more formally described in the 2005 technical standardRFC 4180 which codifies the CSV format and defines theMIME typetext/csv for the handling of text-based fields. Among its requirements:

A line is terminated per MS-DOS-style: carriage return and line feed (CR/LF) sequence
A line terminator is optional for the last line
The data can start with a header record but with no way to test whether the first line is, in fact, a header, care is required when importing
Each record should contain the same number of fields
A field containing a comma, double quote or line terminator character should be enclosed in double quotes
Any field may be enclosed in double quotes
If a field is enclosed in double quotes, then a double quote embedded in the field must be represented by a sequence of two double quotes

Common challenges with CSV include:

Programs may not support line terminator characters within a field even when properly quoted
Programs may confuse a header line with data or interpret the first data line as a header
Double quotes in a field may not be parsed correctly

In 2011,Open Knowledge Foundation (OKF) and various partners created a data protocols working group, which later evolved into the Frictionless Data initiative. One of the main formats they released was the Tabular Data Package. Tabular Data package was heavily based on CSV, using it as the main data transport format and adding basic type and schema metadata. (CSV lacks any type information to distinguish the string1 from the number 1.)^[16] The Frictionless Data Initiative has also provided a standard CSV Dialect Description Format for describing different dialects of CSV, for example specifying the field separator or quoting rules.^[17]

In 2013, theW3C "CSV on the Web" working group began to specify technologies providing higher interoperability for web applications using CSV or similar formats.^[18] The working group completed its work in February 2016 and is officially closed in March 2016 with the release of a set of documents and W3C recommendations^[19] for modeling "Tabular Data",^[13] and enhancing CSV withmetadata andsemantics. While thewell-formedness of CSV data can readily be checked, testing validity and canonical form is less well developed, relative to more precise data formats, such asXML andSQL, which offer richer types and rules-based validation.^[20]

Applications

[edit]

CSV is commonly-used fordata exchange and is widely supported by data-orientedapplications. It is often used to move tabular data between programs that natively operate on incompatible data – often in formats that areproprietary or undocumented.^[1]^[21]^[22] A common scenario is moving data from a database to a spreadsheet which, in general, use completely different formats. Most database systems can export as CSV and most spreadsheet programs can import CSV-formatted data; leveraging CSV as an intermediate format. Every majorecommerce platform provides support for exporting data as a CSV file.^[23]

CSV is also used for storing data. Common data science tools such asPandas include the option to export data to CSV for long-term storage.^[24] Benefits of CSV for data storage include the simplicity of CSV makes parsing and creating CSV files easy to implement and fast compared to other data formats, human readability making editing or fixing data simpler,^[25] and high compressibility leading to smaller data files.^[26] Alternatively, CSV does not support more complex data relations and makes no distinction between null and empty values, and in applications where these features are needed other formats are preferred.

More than 200 local, regional, and national data portals, such as those of theUK government and theEuropean Commission, use CSV files with standardizeddata catalogs.^[27]

Some applications use CSV as adata interchange format to enhance itsinteroperability, exporting and importing CSV. Others use CSV as an internal format. CSV is supported by almost all spreadsheets and database management systems.

Spreadsheets including AppleNumbers,LibreOffice Calc, andApache OpenOffice Calc. support reading CSV files.Microsoft Excel also supports a dialect of CSV with restrictions in comparison to other spreadsheet software (e.g., as of 2019^[update] Excel still cannot export CSV files in the commonly used UTF-8 character encoding, and separator is not enforced to be the comma).LibreOffice Calc CSV importer is actually a more generic delimited text importer, supporting multiple separators at the same time as well as field trimming.

Variousrelational databases support saving query results to a CSV file.PostgreSQL provides theCOPY command, which allows for both saving and loading data to and from a file.COPY(SELECT*FROMarticles)TO'/home/wikipedia/file.csv'(FORMATcsv) saves the content of a tablearticles to a file called/home/wikipedia/file.csv.^[28] Some relational databases, when using standard SQL, offerforeign-data wrapper (FDW). For example, PostgreSQL offers theCREATEFOREIGNTABLE^[29] andCREATEEXTENSIONfile_fdw^[30] commands to configure any variant of CSV. Databases likeApache Hive offer the option to express CSV or.csv.gz as an internal table format.

Programs that work with CSV may have limits on the maximum number of rows CSV files can have. Examples include Microsoft Excel (1,048,576 rows), Apple Numbers (1,000,000 rows), Google Sheets (10,000,000 cells), and OpenOffice and LibreOffice (1,048,576 rows).^[31]

References

[edit]

^^a ^b ^c ^dShafranovich, Y. (October 2005).Common Format and MIME Type for CSV Files.IETF. p. 1.doi:10.17487/RFC4180.RFC 4180.
^^a ^b"commaSeparatedText".Apple Developer Documentation: Uniform Type Identifiers.Apple Inc.Archived from the original on 2023-05-22. Retrieved2023-05-22.
^"Import or export text (.txt or .csv) files".Microsoft Support. Retrieved2023-08-16.
^"What is a CSV file: A comprehensive guide".flatfile.com. Retrieved2024-10-28.
^IBM FORTRAN Program Products for OS and the CMS Component of VM/370 General Information(PDF) (first ed.), July 1972, p. 17, GC28-6884-0,archived(PDF) from the original on March 4, 2016, retrievedFebruary 5, 2016,For users familiar with the predecessor FORTRAN IV G and H processors, these are the major new language capabilities
^"List-Directed I/O",Fortran 77 Language Reference, Oracle,archived from the original on 2021-02-26, retrieved2012-10-26
^"SuperCalc², spreadsheet package for IBM, CP/M". RetrievedDecember 11, 2017.
^"Comma-Separated-Value Format File Structure". 1983. RetrievedDecember 11, 2017.
^"CSV, Comma Separated Values (RFC 4180)".Library of Congress. RetrievedSeptember 22, 2025.
^Common Format and MIME Type for Comma-Separated Values (CSV) Files.doi:10.17487/RFC4180.RFC 4180. RetrievedDecember 22, 2020.
^Seesparql11-results-csv-tsv, the first W3C recommendation scoped in CSV and filling some of RFC 4180's deficiencies.
^URI Fragment Identifiers for the text/csv Media Type.doi:10.17487/RFC7111.RFC 7111. RetrievedDecember 22, 2020.
^^a ^b"Model for Tabular Data and Metadata on the Web". 17 December 2015. RetrievedMarch 23, 2016. (W3C Recommendation)
^"Comma Separated Values (CSV) Standard File Format". Edoceo, Inc.Archived from the original on July 14, 2020. RetrievedJune 4, 2014.
^*Creativyst (2010),How To: The Comma Separated Value (CSV) File Format, creativyst.com,archived from the original on April 4, 2021, retrievedMay 24, 2010
^"Tabular Data Package".Frictionless Data Specs.
^"CSV Dialect".Frictionless Data Specs.
^"CSV on the Web Working Group".W3C CSV WG. 2013. Retrieved2015-04-22.
^"CSV on the Web Repository".GitHub. (on GitHub)
^"Rules Or Schemas". CsvPath Project. 2024. Retrieved2025-02-13.
^"CSV - Comma Separated Values". Archived fromthe original on 2021-03-07. Retrieved2017-12-02.
^"CSV Files".Archived from the original on April 30, 2021. RetrievedJune 4, 2014.
^"CSV Supported Ecommerce Platforms".RFM Calc. Retrieved2025-03-09.
^"pandas.DataFrame.to_csv — pandas 2.0.3 documentation".pandas.pydata.org. Retrieved2023-08-16.
^"CSV Format: History, Advantages and Why It Is Still Popular".ByteScout. 2021-09-15. Retrieved2023-08-16.
^"Comparison of different file formats in Big Data".www.adaltas.com. 2020-07-23. Retrieved2023-08-16.
^Mahmud, S M Hasan; Hossin, Md Altab; Jahan, Hosney; Noori, Sheak Rashed Haider; Bhuiyan, Touhid (2018).CSV-ANNOTATE: Generate annotated tables from CSV file. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE. pp. 71–75.doi:10.1109/ICAIBD.2018.8396169.ISBN 978-1-5386-6987-7.
^"Documentation: 14: COPY". PostgreSQL. Retrieved2024-05-12.
^"Documentation: 14: F.35. postgres_fdw". PostgreSQL. 2022-02-10. Retrieved2022-03-04.
^"Documentation: 14: F.14. file_fdw". PostgreSQL. 2022-02-10. Retrieved2022-03-04.
^"Understanding CSV and row limits". Archived fromthe original on January 15, 2021. RetrievedFeb 28, 2021.

v t e Data exchange formats
Human readable	Atom CSV EDIFACT JSON Web Encryption Web Token Web Signature Property list RDF Rebol TOML TOON XML YAML
Binary	AMF Ascii85 ASN.1 SMI Avro Base32 Base64 Bencode BSON UBJSON Cap'n Proto CBOR FlatBuffers MessagePack Property list Protocol Buffers Thrift Cyphal DSDL XDR uuencode yEnc
Comparison of data-serialization formats

Movatterモバイル変換

Comma-separated values

History

Specification

Applications

See also

References

Further reading