# CodableCSV

Read and write CSV files row-by-row or through Swift's Codable interface.
CodableCSV provides:
- Imperative CSV reader/writer.
- Declarative CSV encoder/decoder.
- Support for multiple inputs/outputs: `String`s, `Data` blobs, `URL`s, and `Stream`s (commonly used for `stdin`).
- Support for numerous string encodings and Byte Order Markers (BOM).
- Extensive configuration: delimiters, escaping scalar, trim strategy, codable strategies, presampling, etc.
- RFC4180 compliant with default configuration and CRLF (`\r\n`) row delimiter.
- Multiplatform support with no dependencies (the Swift Standard Library and Foundation are implicit dependencies).
To use this library, you need to:

1. Add `CodableCSV` to your project. You can choose to add the library through SPM or CocoaPods:

   SPM (Swift Package Manager):

   ```swift
   // swift-tools-version:5.1
   import PackageDescription

   let package = Package(
       /* Your package name, supported platforms, and generated products go here */
       dependencies: [
           .package(url: "https://github.com/dehesa/CodableCSV.git", from: "0.6.7")
       ],
       targets: [
           .target(name: /* Your target name here */, dependencies: ["CodableCSV"])
       ]
   )
   ```

   CocoaPods:

   ```ruby
   pod 'CodableCSV', '~> 0.6.7'
   ```

2. Import `CodableCSV` in the file that needs it.

   ```swift
   import CodableCSV
   ```
There are two ways to use this library:
- Imperatively, as a row-by-row and field-by-field reader/writer.
- Declaratively, through Swift's `Codable` interface.
The following types provide imperative control on how to read/write CSV data.
**Complete input parsing.**

```swift
let data: Data = ...
let result = try CSVReader.decode(input: data)
```

Once the input is completely parsed, you can choose how to access the decoded data:

```swift
let headers: [String] = result.headers

// Access the CSV rows (i.e. raw [String] values).
let rows = result.rows
let row = result[0]
// Access a CSV record (i.e. a convenience structure over a single row).
let records = result.records
let record = result[record: 0]
// Access the CSV columns through indices or header values.
let columns = result.columns
let column = result[column: 0]
let column = result[column: "Name"]
// Access fields through indices or header values.
let fieldB: String = result[row: 3, column: 2]
let fieldA: String? = result[row: 2, column: "Age"]
```
**Row-by-row parsing.**

```swift
let reader = try CSVReader(input: string) { $0.headerStrategy = .firstLine }
let rowA = try reader.readRow()
```

Parse a row at a time, till `nil` is returned; or exit the scope and the reader will clean up all used memory.

```swift
// Let's assume the input is:
let string = "numA,numB,numC\n1,2,3\n4,5,6\n7,8,9"
// The headers property can be accessed at any point after initialization.
let headers: [String] = reader.headers   // ["numA", "numB", "numC"]
// Keep querying rows till `nil` is received.
guard let rowB = try reader.readRow(),   // ["4", "5", "6"]
      let rowC = try reader.readRow()    // ["7", "8", "9"]
else { ... }
```
Alternatively, you can use the `readRecord()` function, which also returns the next CSV row but wraps the result in a convenience structure. This structure lets you access each field by its header name (as long as the `headerStrategy` is set to `.firstLine`).

```swift
let reader = try CSVReader(input: string) { $0.headerStrategy = .firstLine }
let headers = reader.headers   // ["numA", "numB", "numC"]

let recordA = try reader.readRecord()
let rowA = recordA.row         // ["1", "2", "3"]
let fieldA = recordA[0]        // "1"
let fieldB = recordA["numB"]   // "2"

let recordB = try reader.readRecord()
```
**`Sequence` syntax parsing.**

```swift
let reader = try CSVReader(input: URL(...), configuration: ...)
for row in reader {
    // Do something with the row: [String]
}
```
Please note the `Sequence` syntax (i.e. `IteratorProtocol`) doesn't throw errors; therefore, if the CSV data is invalid, the previous code will crash. If you don't control the CSV data origin, use `readRow()` instead.

`CSVReader` accepts the following configuration properties:

- `encoding` (default `nil`) specifies the CSV file encoding.
  This `String.Encoding` value specifies how each underlying byte is represented (e.g. `.utf8`, `.utf32LittleEndian`, etc.). If it is `nil`, the library will try to figure out the file encoding through the file's Byte Order Marker. If the file doesn't contain a BOM, `.utf8` is presumed.
- `delimiters` (default `(field: ",", row: "\n")`) specifies the field and row delimiters.
  CSV fields are separated within a row with field delimiters (commonly a "comma"). CSV rows are separated through row delimiters (commonly a "line feed"). You can specify any Unicode scalar, `String` value, or `nil` for unknown delimiters.
- `escapingStrategy` (default `"`) specifies the Unicode scalar used to escape fields.
  CSV fields can be escaped in case they contain privileged characters, such as field/row delimiters. Commonly the escaping character is a double quote (i.e. `"`); by setting this configuration value you can change it (e.g. to a single quote) or disable the escaping functionality.
- `headerStrategy` (default `.none`) indicates whether the CSV data has a header row or not.
  CSV files may contain an optional header row at the very beginning. This configuration value lets you specify whether the file has a header row or not, or whether you want the library to figure it out.
- `trimStrategy` (default empty set) trims the given characters at the beginning and end of each parsed field.
  The trim characters are applied to both escaped and unescaped fields. The set cannot include any of the delimiter characters or the escaping scalar; if it does, an error will be thrown during initialization.
- `presample` (default `false`) indicates whether the CSV data should be completely loaded into memory before parsing begins.
  Loading all data into memory may provide faster iteration for small to medium size files, since you get rid of the overhead of managing an `InputStream`.

**Complete CSV rows encoding.**
```swift
let input = [
    ["numA", "numB", "name"],
    ["1", "2", "Marcos"],
    ["4", "5", "Marine-Anaïs"]
]
let data = try CSVWriter.encode(rows: input)
let string = try CSVWriter.encode(rows: input, into: String.self)
try CSVWriter.encode(rows: input, into: URL("~/Desktop/Test.csv")!, append: false)
```
**Row-by-row encoding.**

```swift
let writer = try CSVWriter(fileURL: URL("~/Desktop/Test.csv")!, append: false)
for row in input {
    try writer.write(row: row)
}
try writer.endEncoding()
```
Alternatively, you may write directly to a buffer in memory and access its `Data` representation.

```swift
let writer = try CSVWriter { $0.headers = input[0] }
for row in input.dropFirst() {
    try writer.write(row: row)
}
try writer.endEncoding()
let result = try writer.data()
```
**Field-by-field encoding.**

```swift
let writer = try CSVWriter(fileURL: URL("~/Desktop/Test.csv")!, append: false)
try writer.write(row: input[0])

for field in input[1] {
    try writer.write(field: field)
}
try writer.endRow()

try writer.write(fields: input[2])
try writer.endRow()

try writer.endEncoding()
```
`CSVWriter` has a wealth of low-level imperative APIs that let you write one field, several fields at a time, end a row, write an empty row, etc.

Please notice that a CSV requires all rows to have the same amount of fields. `CSVWriter` enforces this by throwing an error when you try to write more than the expected amount of fields, or by filling the row with empty fields when you call `endRow()` before all fields have been written.

`CSVWriter` accepts the following configuration properties:

- `delimiters` (default `(field: ",", row: "\n")`) specifies the field and row delimiters.
  CSV fields are separated within a row with field delimiters (commonly a "comma"). CSV rows are separated through row delimiters (commonly a "line feed"). You can specify any Unicode scalar, `String` value, or `nil` for unknown delimiters.
- `escapingStrategy` (default `.doubleQuote`) specifies the Unicode scalar used to escape fields.
  CSV fields can be escaped in case they contain privileged characters, such as field/row delimiters. Commonly the escaping character is a double quote (i.e. `"`); by setting this configuration value you can change it (e.g. to a single quote) or disable the escaping functionality.
- `headers` (default `[]`) indicates whether the CSV data has a header row or not.
  CSV files may contain an optional header row at the very beginning. If this configuration value is empty, no header row is written.
- `encoding` (default `nil`) specifies the CSV file encoding.
  This `String.Encoding` value specifies how each underlying byte is represented (e.g. `.utf8`, `.utf32LittleEndian`, etc.). If it is `nil`, the library will try to figure out the file encoding through the file's Byte Order Marker. If the file doesn't contain a BOM, `.utf8` is presumed.
- `bomStrategy` (default `.convention`) indicates whether a Byte Order Marker will be included at the beginning of the CSV representation.
  The OS convention is that BOMs are never written, except when `.utf16`, `.utf32`, or `.unicode` string encodings are specified. You can, however, indicate that you always want the BOM written (`.always`) or that it is never written (`.never`).

Every `CSVError` exposes the following properties:

- `type`: The error group category.
- `failureReason`: Explanation of what went wrong.
- `helpAnchor`: Advice on how to solve the problem.
- `errorUserInfo`: Arguments associated with the operation that threw the error.
- `underlyingError`: Optional underlying error which provoked the operation to fail (most of the time it is `nil`).
- `localizedDescription`: Returns a human-readable string with all the information contained in the error.
### CSVReader

A `CSVReader` parses CSV data from a given input (`String`, `Data`, `URL`, or `InputStream`) and returns the CSV rows as `[String]` arrays. `CSVReader` can be used at a high level, in which case it parses an input completely; or at a low level, in which case each row is decoded when requested.

`CSVReader` accepts the configuration properties listed above. The configuration values are set during initialization and can be passed to the `CSVReader` instance through a structure or with a convenience closure syntax:
```swift
let reader = CSVReader(input: ...) {
    $0.encoding = .utf8
    $0.delimiters.row = "\r\n"
    $0.headerStrategy = .firstLine
    $0.trimStrategy = .whitespaces
}
```
### CSVWriter

A `CSVWriter` encodes CSV information into a specified target (i.e. a `String`, a `Data` blob, or a file). It can be used at a high level, by encoding a complete, prepared set of information; or at a low level, in which case rows or fields can be written individually.

`CSVWriter` accepts the configuration properties listed above. The configuration values are set during initialization and can be passed to the `CSVWriter` instance through a structure or with a convenience closure syntax:
```swift
let writer = CSVWriter(fileURL: ...) {
    $0.delimiters.row = "\r\n"
    $0.headers = ["Name", "Age", "Pet"]
    $0.encoding = .utf8
    $0.bomStrategy = .never
}
```
### CSVError

Many of `CodableCSV`'s imperative functions may throw errors due to invalid configuration values, invalid CSV input, file stream failures, etc. All these throwing operations exclusively throw `CSVError`s, which can be easily caught with a `do`-`catch` clause.
```swift
do {
    let writer = try CSVWriter()
    for row in customData {
        try writer.write(row: row)
    }
} catch let error {
    print(error)
}
```
`CSVError` adopts Swift Evolution's SE-112 protocols and `CustomDebugStringConvertible`. The error's properties provide rich commentary explaining what went wrong and indicate how to fix the problem.
You can get all the information by simply printing the error or calling the `localizedDescription` property on a properly cast `CSVError<CSVReader>` or `CSVError<CSVWriter>`.
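For instance, a minimal sketch of catching a reader error and printing its full description (the `malformedData` input is a hypothetical invalid CSV):

```swift
let malformedData = Data("name,age\n\"John,22".utf8)   // hypothetical input with an unterminated escaped field

do {
    let result = try CSVReader.decode(input: malformedData)
    print(result.rows)
} catch let error as CSVError<CSVReader> {
    // localizedDescription gathers all the error properties listed above in one string.
    print(error.localizedDescription)
} catch {
    // Only here for exhaustiveness; the library's throwing operations exclusively throw CSVErrors.
    print("Unexpected error: \(error)")
}
```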
The encoders/decoders provided by this library let you use Swift's `Codable` declarative approach to encode/decode CSV data.
`CSVDecoder` accepts the following decoding strategies:

- `nilStrategy` (default `.empty`) indicates how the `nil` concept (absence of value) is represented in the CSV.
- `boolStrategy` (default `.insensitive`) defines how strings are decoded to `Bool` values.
- `nonConformingFloatStrategy` (default `.throw`) specifies how to handle non-numbers (e.g. `NaN` and infinity).
- `decimalStrategy` (default `.locale`) indicates how strings are decoded to `Decimal` values.
- `dateStrategy` (default `.deferredToDate`) specifies how strings are decoded to `Date` values.
- `dataStrategy` (default `.base64`) indicates how strings are decoded to `Data` values.
- `bufferingStrategy` (default `.keepAll`) controls the behavior of `KeyedDecodingContainer`s.
  Selecting a buffering strategy affects the decoding performance and the amount of memory used during the decoding process. For more information check this README's "Tips using `Codable`" section and the `Strategy.DecodingBuffer` definition.

`CSVEncoder` accepts the following encoding strategies:

- `nilStrategy` (default `.empty`) indicates how the `nil` concept (absence of value) is represented in the CSV.
- `boolStrategy` (default `.deferredToString`) defines how Boolean values are encoded to `String` values.
- `nonConformingFloatStrategy` (default `.throw`) specifies how to handle non-numbers (i.e. `NaN` and infinity).
- `decimalStrategy` (default `.locale`) indicates how decimal numbers are encoded to `String` values.
- `dateStrategy` (default `.deferredToDate`) specifies how dates are encoded to `String` values.
- `dataStrategy` (default `.base64`) indicates how data blobs are encoded to `String` values.
- `bufferingStrategy` (default `.keepAll`) controls the behavior of `KeyedEncodingContainer`s.
  Selecting a buffering strategy directly affects the encoding performance and the amount of memory used during the process. For more information check this README's "Tips using `Codable`" section and the `Strategy.EncodingBuffer` definition.
### CSVDecoder

`CSVDecoder` transforms CSV data into a Swift type conforming to `Decodable`. The decoding process is very simple: it only requires creating a decoding instance and calling its `decode` function, passing the `Decodable` type and the input data.
```swift
let decoder = CSVDecoder()
let result = try decoder.decode(CustomType.self, from: data)
```
`CSVDecoder` can decode CSVs represented as a `Data` blob, a `String`, an actual file in the file system, or an `InputStream` (e.g. `stdin`).
```swift
let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
let content = try decoder.decode([Student].self, from: URL("~/Desktop/Student.csv"))
```
If you are dealing with a big CSV file, it is preferable to use direct file decoding, a `.sequential` or `.unrequested` buffering strategy, and to set `presample` to `false`, since this drastically reduces memory usage.
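For example, a minimal sketch of a decoder tuned for a large file (the file path is a placeholder, and `Student` is the example type used throughout this README; `presample` is assumed to be exposed on the decoder like the other `CSVReader` configuration values):

```swift
let decoder = CSVDecoder {
    $0.headerStrategy = .firstLine
    $0.bufferingStrategy = .sequential
    $0.presample = false   // stream the file instead of loading it fully into memory
}
let students = try decoder.decode([Student].self, from: URL(fileURLWithPath: "/path/to/big.csv"))
```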
The decoding process can be tweaked by specifying configuration values at initialization time. `CSVDecoder` accepts the same configuration values as `CSVReader` plus the decoding strategies listed above.
The configuration values can be set during `CSVDecoder` initialization or at any point before the `decode` function is called.
```swift
let decoder = CSVDecoder {
    $0.encoding = .utf8
    $0.delimiters.field = "\t"
    $0.headerStrategy = .firstLine
    $0.bufferingStrategy = .keepAll
    $0.decimalStrategy = .custom({ (decoder) in
        let value = try Float(from: decoder)
        return Decimal(Double(value))
    })
}
```
### CSVDecoder.Lazy

A CSV input can be decoded on demand (i.e. row-by-row) with the decoder's `lazy(from:)` function.
```swift
let decoder = CSVDecoder(configuration: config).lazy(from: fileURL)
let student1 = try decoder.decodeRow(Student.self)
let student2 = try decoder.decodeRow(Student.self)
```
`CSVDecoder.Lazy` conforms to Swift's `Sequence` protocol, letting you use functionality such as `map()`, `allSatisfy()`, etc. Please note that `CSVDecoder.Lazy` cannot be used for repeated access; it consumes the input CSV.
```swift
let decoder = CSVDecoder().lazy(from: fileData)
let students = try decoder.map { try $0.decode(Student.self) }
```
A nice benefit of using the lazy operation is that it lets you switch how a row is decoded at any point. For example:
```swift
let decoder = CSVDecoder().lazy(from: fileString)
// The first 100 rows are students.
let students = try (0..<100).map { _ in try decoder.decode(Student.self) }
// The next 10 rows are teachers.
let teachers = try (100..<110).map { _ in try decoder.decode(Teacher.self) }
```
Since `CSVDecoder.Lazy` exclusively provides sequential access, setting the buffering strategy to `.sequential` will reduce the decoder's memory usage.
```swift
let decoder = CSVDecoder {
    $0.headerStrategy = .firstLine
    $0.bufferingStrategy = .sequential
}.lazy(from: fileURL)
```
### CSVEncoder

`CSVEncoder` transforms Swift types conforming to `Encodable` into CSV data. The encoding process is very simple: it only requires creating an encoding instance and calling its `encode` function, passing the `Encodable` value.
```swift
let encoder = CSVEncoder()
let data = try encoder.encode(value, into: Data.self)
```
The encoder's `encode()` function creates a CSV file as a `Data` blob, a `String`, or an actual file in the file system.
```swift
let encoder = CSVEncoder { $0.headers = ["name", "age", "hasPet"] }
try encoder.encode(value, into: URL("~/Desktop/Students.csv"))
```
If you are dealing with big CSV content, it is preferable to use direct file encoding and a `.sequential` or `.assembled` buffering strategy, since this drastically reduces memory usage.
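For example, a minimal sketch writing a large list straight to a file (the file path and the `students` value are placeholders):

```swift
let encoder = CSVEncoder {
    $0.headers = ["name", "age", "hasPet"]
    $0.bufferingStrategy = .assembled
}
try encoder.encode(students, into: URL(fileURLWithPath: "/path/to/big.csv"))
```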
The encoding process can be tweaked by specifying configuration values. `CSVEncoder` accepts the same configuration values as `CSVWriter` plus the encoding strategies listed above.
The configuration values can be set during `CSVEncoder` initialization or at any point before the `encode` function is called.
```swift
let encoder = CSVEncoder {
    $0.headers = ["name", "age", "hasPet"]
    $0.delimiters = (field: ";", row: "\r\n")
    $0.dateStrategy = .iso8601
    $0.bufferingStrategy = .sequential
    $0.floatStrategy = .convert(positiveInfinity: "∞", negativeInfinity: "-∞", nan: "≁")
    $0.dataStrategy = .custom({ (data, encoder) in
        let string = customTransformation(data)
        var container = encoder.singleValueContainer()
        try container.encode(string)
    })
}
```
The `.headers` configuration is required if you are using a keyed encoding container.
### CSVEncoder.Lazy

A series of codable types (representing CSV rows) can be encoded on demand with the encoder's `lazy(into:)` function.
```swift
let encoder = CSVEncoder().lazy(into: Data.self)
for student in students {
    try encoder.encodeRow(student)
}
let data = try encoder.endEncoding()
```
Call `endEncoding()` once there are no more values to be encoded. The function will return the encoded CSV.
```swift
let encoder = CSVEncoder().lazy(into: String.self)
try students.forEach { try encoder.encode($0) }
let string = try encoder.endEncoding()
```
A nice benefit of using the lazy operation is that it lets you switch how a row is encoded at any point. For example:
```swift
let encoder = CSVEncoder(configuration: config).lazy(into: fileURL)
try students.forEach { try encoder.encode($0) }
try teachers.forEach { try encoder.encode($0) }
try encoder.endEncoding()
```
Since `CSVEncoder.Lazy` exclusively provides sequential encoding, setting the buffering strategy to `.sequential` will reduce the encoder's memory usage.
```swift
let encoder = CSVEncoder {
    $0.bufferingStrategy = .sequential
}.lazy(into: String.self)
```
### Tips using Codable

`Codable` is fairly easy to use and most Swift Standard Library types already conform to it. However, sometimes it is tricky to get custom types to conform to `Codable` for specific functionality.
Configuration values can be set in two ways:

- At initialization time, by passing a `Configuration` structure to the initializer.

  ```swift
  var config = CSVDecoder.Configuration()
  config.nilStrategy = .empty
  config.decimalStrategy = .locale(.current)
  config.dataStrategy = .base64
  config.bufferingStrategy = .sequential
  config.trimStrategy = .whitespaces
  config.encoding = .utf16
  config.delimiters.row = "\r\n"

  let decoder = CSVDecoder(configuration: config)
  ```

- Alternatively, through the convenience initializers accepting a closure with an `inout Configuration` value.

  ```swift
  let decoder = CSVDecoder {
      $0.nilStrategy = .empty
      $0.decimalStrategy = .locale(.current)
      // and so on and so forth
  }
  ```
`CSVEncoder` and `CSVDecoder` implement `@dynamicMemberLookup` exclusively for their configuration values. Therefore you can set configuration values after initialization or after an encoding/decoding process has been performed.

```swift
let decoder = CSVDecoder()

decoder.bufferingStrategy = .sequential
let students = try decoder.decode([Student].self, from: url1)

decoder.bufferingStrategy = .keepAll
let pets = try decoder.decode([Pets].self, from: url2)
```
**Basic adoption.**

When a custom type conforms to `Codable`, the type is stating that it has the ability to decode itself from and encode itself to an external representation. Which representation depends on the decoder or encoder chosen. Foundation provides support for JSON and Property Lists, and the community provides many other formats, such as YAML, XML, BSON, and CSV (through this library).
Usually a CSV represents a long list of entities. The following is a simple example representing a list of students.

```swift
let string = """
name,age,hasPet
John,22,true
Marine,23,false
Alta,24,true
"""
```
A student can be represented as a structure:

```swift
struct Student: Codable {
    var name: String
    var age: Int
    var hasPet: Bool
}
```
To decode the list of students, create a decoder and call `decode` on it, passing the CSV sample.

```swift
let decoder = CSVDecoder { $0.headerStrategy = .firstLine }
let students = try decoder.decode([Student].self, from: string)
```
The inverse process (from Swift to CSV) is very similar (and simple).
```swift
let encoder = CSVEncoder { $0.headers = ["name", "age", "hasPet"] }
let newData = try encoder.encode(students)
```
**Specific behavior for CSV data.**

When encoding/decoding CSV data, it is important to keep several points in mind:

**`Codable`'s automatic synthesis requires CSV files with a headers row.**

`Codable` is able to synthesize `init(from:)` and `encode(to:)` for your custom types when all their members/properties conform to `Codable`. This automatic synthesis creates a hidden `CodingKeys` enumeration containing all your property names.

During decoding, `CSVDecoder` tries to match the enumeration string values with a field position within a row. For this to work, the CSV data must contain a headers row with the property names. If your CSV doesn't contain a headers row, you can specify coding keys with integer values representing the field index.

```swift
struct Student: Codable {
    var name: String
    var age: Int
    var hasPet: Bool

    private enum CodingKeys: Int, CodingKey {
        case name = 0
        case age = 1
        case hasPet = 2
    }
}
```
Using integer coding keys has the added benefit of better encoder/decoder performance. By explicitly indicating the field index, you let the decoder skip matching coding-key string values to headers.
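For example, with the integer coding keys above, a headerless CSV can be decoded directly (a minimal sketch; the inline CSV string is illustrative):

```swift
let csv = """
John,22,true
Marine,23,false
"""
// headerStrategy defaults to .none, so the Int coding keys map straight to field positions.
let decoder = CSVDecoder()
let students = try decoder.decode([Student].self, from: csv)
```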
**A CSV is a long list of rows/records.**

CSV formatted data is commonly used with flat hierarchies (e.g. a list of students, a list of car models, etc.). Nested structures, such as the ones found in JSON files, are not supported by default in CSV implementations (e.g. a list of users, where each user has a list of services she uses, and each service has a list of the user's configuration values).

You can support complex structures in CSV, but you would have to flatten the hierarchy into a single model or build a custom encoding/decoding process. This process would make sure there is always a maximum of two keyed/unkeyed containers.
As an example, we can create a nested structure for a school with students who own pets.
```swift
struct School: Codable {
    let students: [Student]
}

struct Student: Codable {
    var name: String
    var age: Int
    var pet: Pet
}

struct Pet: Codable {
    var nickname: String
    var gender: Gender

    enum Gender: Codable {
        case male, female
    }
}
```
By default, the previous example wouldn't work. If you want to keep the nested structure, you need to provide custom `init(from:)` implementations (to support `Decodable`).
```swift
extension School {
    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        var students = [Student]()
        while !container.isAtEnd {
            students.append(try container.decode(Student.self))
        }
        self.students = students
    }
}

extension Student {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CustomKeys.self)
        self.name = try container.decode(String.self, forKey: .name)
        self.age = try container.decode(Int.self, forKey: .age)
        self.pet = try decoder.singleValueContainer().decode(Pet.self)
    }
}

extension Pet {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CustomKeys.self)
        self.nickname = try container.decode(String.self, forKey: .nickname)
        self.gender = try container.decode(Gender.self, forKey: .gender)
    }
}

extension Pet.Gender {
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        self = try container.decode(Int.self) == 1 ? .male : .female
    }
}

private enum CustomKeys: Int, CodingKey {
    case name = 0
    case age = 1
    case nickname = 2
    case gender = 3
}
```
You could have avoided the overhead of building those initializers by defining a flat structure such as:
```swift
struct Student: Codable {
    var name: String
    var age: Int
    var nickname: String
    var gender: Gender

    enum Gender: Int, Codable {
        case male = 1
        case female = 2
    }
}
```
**Encoding/decoding strategies.**

The SE-167 proposal introduced Foundation's JSON and property-list encoders/decoders. This proposal also featured encoding/decoding strategies as a new way to configure the encoding/decoding process. `CodableCSV` continues this tradition and mirrors such strategies, including some new ones specific to the CSV file format.
To configure the encoding/decoding process, you need to set the configuration values of the `CSVEncoder`/`CSVDecoder` before calling the `encode()`/`decode()` functions. There are two ways to set configuration values, both shown at the beginning of these tips: passing a `Configuration` structure to the initializer, or using the convenience closure-based initializers.
The strategies labeled with `.custom` let you insert behavior into the encoding/decoding process without forcing you to manually conform to `init(from:)` and `encode(to:)`. When set, they will reference the targeted type for the whole process. For example, if you want to encode a CSV file where empty fields are marked with the word `null` (for some reason), you could do the following:
```swift
let encoder = CSVEncoder()
encoder.nilStrategy = .custom({ (encoder) in
    var container = encoder.singleValueContainer()
    try container.encode("null")
})
```
**Type-safe headers row.**

You can generate type-safe header names using Swift's introspection tools (i.e. `Mirror`) or by explicitly defining a `CodingKeys` enum with a `String` raw value conforming to `CaseIterable`.
```swift
struct Student {
    var name: String
    var age: Int
    var hasPet: Bool

    enum CodingKeys: String, CodingKey, CaseIterable {
        case name, age, hasPet
    }
}
```
Then configure your encoder with explicit headers.
```swift
let encoder = CSVEncoder {
    $0.headers = Student.CodingKeys.allCases.map { $0.rawValue }
}
```
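Alternatively, a minimal sketch of deriving the headers through `Mirror` (note that reflection needs a value instance and relies on the stored-property labels matching your coding keys):

```swift
// Hypothetical sample instance used only to reflect the property names.
let sample = Student(name: "John", age: 22, hasPet: true)
let headers = Mirror(reflecting: sample).children.compactMap { $0.label }   // ["name", "age", "hasPet"]
let encoder = CSVEncoder { $0.headers = headers }
```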
**Performance advices.**
#warning("TODO:")
The library has been heavily documented and any contribution is welcome. Check the small "How to contribute" document or take a look at the GitHub projects for a more in-depth roadmap.
If `CodableCSV` is not to your liking, the Swift community offers other CSV solutions:
- CSV.swift contains an imperative CSV reader/writer and a lazy row decoder, and adheres to the RFC4180 standard.
- SwiftCSV is a well-tested parse-only library which loads the whole CSV in memory (not intended for large files).
- CSwiftV is a parse-only library which loads the CSV in memory and parses it in a single go (no imperative reading).
- CSVImporter is an asynchronous parse-only library with support for big CSV files (incremental loading).
- SwiftCSVExport reads/writes CSV imperatively with Objective-C support.
- swift-csv offers an imperative CSV reader/writer based on Foundation's streams.
There are many good tools outside the Swift community. Since listing them all would be a hard task, I will just point you to the great AwesomeCSV GitHub repo. There are a lot of treasures to be found there.