Movatterモバイル変換


[0]ホーム

URL:


— FREE Email Series —

🐍 Python Tricks 💌

Python Tricks Dictionary Merge

🔒 No spam. Unsubscribe any time.

Browse TopicsGuided Learning Paths
Basics Intermediate Advanced
apibest-practicescareercommunitydatabasesdata-sciencedata-structuresdata-vizdevopsdjangodockereditorsflaskfront-endgamedevguimachine-learningnumpyprojectspythontestingtoolsweb-devweb-scraping

Table of Contents

Recommended Video Course
Reading and Writing CSV Files

Python CSV Parsing

Reading and Writing CSV Files in Python

byJon FincherReading time estimate 13mintermediatedata-science

Table of Contents

Remove ads

Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding:Reading and Writing CSV Files

Let’s face it: you need to get information into and out of your programs through more than just the keyboard and console. Exchanging information through text files is a common way to share info between programs. One of the most popular formats for exchanging data is the CSV format. But how do you use it?

Let’s get one thing clear: you don’t have to (and you won’t) build your own CSV parser from scratch. There are several perfectly acceptable libraries you can use. The Pythoncsv library will work for most cases. If your work requires lots of data or numerical analysis, thepandas library has CSV parsing capabilities as well, which should handle the rest.

In this article, you’ll learn how to read, process, and parse CSV from text files using Python. You’ll see how CSV files work, learn the all-importantcsv library built into Python, and see how CSV parsing works using thepandas library.

So let’s get started!

Free Download:Get a sample chapter from Python Basics: A Practical Introduction to Python 3 to see how you can go from beginner to intermediate in Python with a complete curriculum, up-to-date for Python 3.8.

Take the Quiz: Test your knowledge with our interactive “Reading and Writing CSV Files in Python” quiz. You’ll receive a score upon completion to help you track your learning progress:


Python CSV Parsing

Interactive Quiz

Reading and Writing CSV Files in Python

This quiz will check your understanding of what a CSV file is and the different ways to read and write to them in Python.

What Is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printableASCII orUnicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. Here’s what that structure looks like:

CSV
column 1 name,column 2 name, column 3 namefirst row data 1,first row data 2,first row data 3second row data 1,second row data 2,second row data 3...

Notice how each piece of data is separated by a comma. Normally, the first line identifies each piece of data—in other words, the name of a data column. Every subsequent line after that is actual data and is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters. Properly parsing a CSV file requires us to know which delimiter is being used.

Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.

Parsing CSV Files With Python’s Built-in CSV Library

Thecsv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. Thecsv library contains objects and other code to read, write, and process data from and to CSV files.

Reading CSV Files Withcsv

Reading from a CSV file is done using thereader object. The CSV file is opened as a text file with Python’s built-inopen() function, which returns a file object. This is then passed to thereader, which does the heavy lifting.

Here’s theemployee_birthday.csv file:

CSV
name,department,birthday monthJohn Smith,Accounting,NovemberErica Meyers,IT,March

Here’s code to read it:

Python
importcsvwithopen('employee_birthday.csv')ascsv_file:csv_reader=csv.reader(csv_file,delimiter=',')line_count=0forrowincsv_reader:ifline_count==0:print(f'Column names are{", ".join(row)}')line_count+=1else:print(f'\t{row[0]} works in the{row[1]} department, and was born in{row[2]}.')line_count+=1print(f'Processed{line_count} lines.')

This results in the following output:

Shell
Column names are name, department, birthday month    John Smith works in the Accounting department, and was born in November.    Erica Meyers works in the IT department, and was born in March.Processed 3 lines.

Each row returned by thereader is a list ofString elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.

Reading CSV Files Into a Dictionary Withcsv

Rather than deal with a list of individualString elements, you can read CSV data directly into a dictionary (technically, anOrdered Dictionary) as well.

Again, our input file,employee_birthday.csv is as follows:

CSV
name,department,birthday monthJohn Smith,Accounting,NovemberErica Meyers,IT,March

Here’s the code to read it in as adictionary this time:

Python
importcsvwithopen('employee_birthday.csv',mode='r')ascsv_file:csv_reader=csv.DictReader(csv_file)line_count=0forrowincsv_reader:ifline_count==0:print(f'Column names are{", ".join(row)}')line_count+=1print(f'\t{row["name"]} works in the{row["department"]} department, and was born in{row["birthday month"]}.')line_count+=1print(f'Processed{line_count} lines.')

This results in the same output as before:

Shell
Column names are name, department, birthday month    John Smith works in the Accounting department, and was born in November.    Erica Meyers works in the IT department, and was born in March.Processed 3 lines.

Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don’t have these in your CSV file, you should specify your own keys by setting thefieldnames optional parameter to a list containing them.

Optional Python CSVreader Parameters

Thereader object can handle different styles of CSV files by specifyingadditional parameters, some of which are shown below:

  • delimiter specifies the character used to separate each field. The default is the comma (',').

  • quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote (' " ').

  • escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.

These parameters deserve some more explanation. Suppose you’re working with the followingemployee_addresses.csv file:

CSV
name,address,date joinedjohn smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4erica meyers,1234 Smith Lane Hoboken NJ, 07030,March 2

This CSV file contains three fields:name,address, anddate joined, which are delimited by commas. The problem is that the data for theaddress field also contains a comma to signify the zip code.

There are three different ways to handle this situation:

  • Use a different delimiter
    That way, the comma can safely be used in the data itself. You use thedelimiter optional parameter to specify the new delimiter.

  • Wrap the data in quotes
    The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with thequotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.

  • Escape the delimiter characters in the data
    Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using theescapechar optional parameter.

Writing CSV Files Withcsv

You can also write to a CSV file using awriter object and the.write_row() method:

Python
importcsvwithopen('employee_file.csv',mode='w')asemployee_file:employee_writer=csv.writer(employee_file,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)employee_writer.writerow(['John Smith','Accounting','November'])employee_writer.writerow(['Erica Meyers','IT','March'])

Thequotechar optional parameter tells thewriter which character to use to quote fields when writing. Whether quoting is used or not, however, is determined by thequoting optional parameter:

  • Ifquoting is set tocsv.QUOTE_MINIMAL, then.writerow() will quote fields only if they contain thedelimiter or thequotechar. This is the default case.
  • Ifquoting is set tocsv.QUOTE_ALL, then.writerow() will quote all fields.
  • Ifquoting is set tocsv.QUOTE_NONNUMERIC, then.writerow() will quote all fields containing text data and convert all numeric fields to thefloat data type.
  • Ifquoting is set tocsv.QUOTE_NONE, then.writerow() will escape delimiters instead of quoting them. In this case, you also must provide a value for theescapechar optional parameter.

Reading the file back in plain text shows that the file is created as follows:

CSV
John Smith,Accounting,NovemberErica Meyers,IT,March

Writing CSV File From a Dictionary Withcsv

Since you can read our data into a dictionary, it’s only fair that you should be able to write it out from a dictionary as well:

Python
importcsvwithopen('employee_file2.csv',mode='w')ascsv_file:fieldnames=['emp_name','dept','birth_month']writer=csv.DictWriter(csv_file,fieldnames=fieldnames)writer.writeheader()writer.writerow({'emp_name':'John Smith','dept':'Accounting','birth_month':'November'})writer.writerow({'emp_name':'Erica Meyers','dept':'IT','birth_month':'March'})

UnlikeDictReader, thefieldnames parameter is required when writing a dictionary. This makes sense, when you think about it: without a list offieldnames, theDictWriter can’t know which keys to use to retrieve values from your dictionaries. It also uses the keys infieldnames to write out the first row as column names.

The code above generates the following output file:

CSV
emp_name,dept,birth_monthJohn Smith,Accounting,NovemberErica Meyers,IT,March

Parsing CSV Files With thepandas Library

Of course, the Python CSV library isn’t the only game in town.Reading CSV files is possible inpandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high performance data analysis tools and easy to use data structures.pandas is available for all Python installations, but it is a key part of theAnaconda distribution and works extremely well inJupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Installingpandas and its dependencies inAnaconda is easily done:

Shell
$condainstallpandas

As is usingpip/pipenv for other Python installations:

Shell
$pipinstallpandas

We won’t delve into the specifics of howpandas works or how to use it. For an in-depth treatment on usingpandas to read and analyze large data sets, check outShantnu Tiwari’s superb article onworking with large Excel files in pandas.

Reading CSV Files Withpandas

To show some of the power ofpandas CSV capabilities, I’ve created a slightly more complicated file to read, calledhrdata.csv. It contains data on company employees:

CSV
Name,Hire Date,Salary,Sick Days remainingGraham Chapman,03/15/14,50000.00,10John Cleese,06/01/15,65000.00,8Eric Idle,05/12/14,45000.00,10Terry Jones,11/01/13,70000.00,3Terry Gilliam,08/12/14,48000.00,7Michael Palin,05/23/13,66000.00,8

Reading the CSV into apandasDataFrame is quick and straightforward:

Python
importpandasdf=pandas.read_csv('hrdata.csv')print(df)

That’s it: three lines of code, and only one of them is doing the actual work.pandas.read_csv() opens, analyzes, and reads the CSV file provided, and stores the data in aDataFrame. Printing theDataFrame results in the following output:

Shell
             Name Hire Date   Salary  Sick Days remaining0  Graham Chapman  03/15/14  50000.0                   101     John Cleese  06/01/15  65000.0                    82       Eric Idle  05/12/14  45000.0                   103     Terry Jones  11/01/13  70000.0                    34   Terry Gilliam  08/12/14  48000.0                    75   Michael Palin  05/23/13  66000.0                    8

Here are a few points worth noting:

  • First,pandas recognized that the first line of the CSV contained column names, and used them automatically. I call this Goodness.
  • However,pandas is also using zero-based integer indices in theDataFrame. That’s because we didn’t tell it what our index should be.
  • Further, if you look at the data types of our columns , you’ll seepandas has properly converted theSalary andSick Days remaining columns to numbers, but theHire Date column is still aString. This is easily confirmed in interactive mode:

    Python
    >>>print(type(df['Hire Date'][0]))<class 'str'>

Let’s tackle these issues one at a time. To use a different column as theDataFrame index, add theindex_col optional parameter:

Python
importpandasdf=pandas.read_csv('hrdata.csv',index_col='Name')print(df)

Now theName field is ourDataFrame index:

Shell
               Hire Date   Salary  Sick Days remainingNameGraham Chapman  03/15/14  50000.0                   10John Cleese     06/01/15  65000.0                    8Eric Idle       05/12/14  45000.0                   10Terry Jones     11/01/13  70000.0                    3Terry Gilliam   08/12/14  48000.0                    7Michael Palin   05/23/13  66000.0                    8

Next, let’s fix the data type of theHire Date field. You can forcepandas to read data as a date with theparse_dates optional parameter, which is defined as a list of column names to treat as dates:

Python
importpandasdf=pandas.read_csv('hrdata.csv',index_col='Name',parse_dates=['Hire Date'])print(df)

Notice the difference in the output:

Shell
                Hire Date   Salary  Sick Days remainingNameGraham Chapman 2014-03-15  50000.0                   10John Cleese    2015-06-01  65000.0                    8Eric Idle      2014-05-12  45000.0                   10Terry Jones    2013-11-01  70000.0                    3Terry Gilliam  2014-08-12  48000.0                    7Michael Palin  2013-05-23  66000.0                    8

The date is now formatted properly, which is easily confirmed in interactive mode:

Python
>>>print(type(df['Hire Date'][0]))<class 'pandas._libs.tslibs.timestamps.Timestamp'>

If your CSV files doesn’t have column names in the first line, you can use thenames optional parameter to provide a list of column names. You can also use this if you want to override the column names provided in the first line. In this case, you must also tellpandas.read_csv() to ignore existing column names using theheader=0 optional parameter:

Python
importpandasdf=pandas.read_csv('hrdata.csv',index_col='Employee',parse_dates=['Hired'],header=0,names=['Employee','Hired','Salary','Sick Days'])print(df)

Notice that, since the column names changed, the columns specified in theindex_col andparse_dates optional parameters must also be changed. This now results in the following output:

Shell
                    Hired   Salary  Sick DaysEmployeeGraham Chapman 2014-03-15  50000.0         10John Cleese    2015-06-01  65000.0          8Eric Idle      2014-05-12  45000.0         10Terry Jones    2013-11-01  70000.0          3Terry Gilliam  2014-08-12  48000.0          7Michael Palin  2013-05-23  66000.0          8

Writing CSV Files Withpandas

Of course, if you can’t get your data out ofpandas again, it doesn’t do you much good. Writing aDataFrame to a CSV file is just as easy as reading one in. Let’s write the data with the new column names to a new CSV file:

Python
importpandasdf=pandas.read_csv('hrdata.csv',index_col='Employee',parse_dates=['Hired'],header=0,names=['Employee','Hired','Salary','Sick Days'])df.to_csv('hrdata_modified.csv')

The only difference between this code and the reading code above is that theprint(df) call was replaced withdf.to_csv(), providing the file name. The new CSV file looks like this:

Shell
Employee,Hired,Salary,Sick DaysGraham Chapman,2014-03-15,50000.0,10John Cleese,2015-06-01,65000.0,8Eric Idle,2014-05-12,45000.0,10Terry Jones,2013-11-01,70000.0,3Terry Gilliam,2014-08-12,48000.0,7Michael Palin,2013-05-23,66000.0,8

Conclusion

If you understand the basics of reading CSV files, then you won’t ever be caught flat footed when you need to deal with importing data. Most CSV reading, processing, and writing tasks can be easily handled by the basiccsv Python library. If you have a lot of data to read and process, thepandas library provides quick and easy CSV handling capabilities as well.

Take the Quiz: Test your knowledge with our interactive “Reading and Writing CSV Files in Python” quiz. You’ll receive a score upon completion to help you track your learning progress:


Python CSV Parsing

Interactive Quiz

Reading and Writing CSV Files in Python

This quiz will check your understanding of what a CSV file is and the different ways to read and write to them in Python.

Are there other ways to parse text files? Of course! Libraries likeANTLR,PLY, andPlyPlus can all handle heavy-duty parsing, and if simpleString manipulation won’t work, there are alwaysregular expressions.

But those are topics for other articles…

Free Download:Get a sample chapter from Python Basics: A Practical Introduction to Python 3 to see how you can go from beginner to intermediate in Python with a complete curriculum, up-to-date for Python 3.8.

Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding:Reading and Writing CSV Files

🐍 Python Tricks 💌

Get a short & sweetPython Trick delivered to your inbox every couple of days. No spam ever. Unsubscribe any time. Curated by the Real Python team.

Python Tricks Dictionary Merge

AboutJon Fincher

Jon taught Python and Java in two high schools in Washington State. Previously, he was a Program Manager at Microsoft.

» More about Jon

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

MasterReal-World Python Skills With Unlimited Access to Real Python

Locked learning resources

Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

MasterReal-World Python Skills
With Unlimited Access to Real Python

Locked learning resources

Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

What Do You Think?

Rate this article:

What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.

Commenting Tips: The most useful comments are those written with the goal of learning from or helping out other students.Get tips for asking good questions andget answers to common questions in our support portal.


Looking for a real-time conversation? Visit theReal Python Community Chat or join the next“Office Hours” Live Q&A Session. Happy Pythoning!

Keep Learning

Related Topics:intermediatedata-science

Recommended Video Course:Reading and Writing CSV Files

Related Tutorials:

Keep reading Real Python by creating a free account or signing in:

Already have an account?Sign-In

Almost there! Complete this form and click the button below to gain instant access:

Python Basics: A Practical Introduction to Python 3

"Python Basics: A Practical Introduction to Python 3" – Free Sample Chapter (PDF)

🔒 No spam. We take your privacy seriously.


[8]ページ先頭

©2009-2025 Movatter.jp