How to Parse a Spreadsheet in Python: CSV Reader and DictReader

Written by Jeremy Grifski inCode Published March 4, 2019Last Updated May 26, 2020

How to Parse a Spreadsheet in Python Featured Image

Every once in awhile, I’ll have the need to load data from a spreadsheet into a Python program, but one question always comes up: what’s the best way to parse a spreadsheet in Python? The goal of today’s article is to find out!

Problem Introduction

Recently, I was learning a visualization library in Python called VTK, and I needed to find a way to visualize some data from a spreadsheet. Unfortunately, I had two problems:

I didn’t know how to read a spreadsheet
I didn’t know how to parse the data that I read

In other words, what are some good ways to read spreadsheet data? And, what are some good ways to model that data in a Python program?

Parsing CSV Files

That first question will be the target of this article. In particular, we’ll be looking at various ways to read data from a CSV file. For example, does it make sense to try to write our own parsing implementation? After all, a CSV is one of the easier file formats to parse (as seen below), and Python is great for working with strings:

Name,Age,Favorite ColorJeremy,25,BlueAlly,41,MagentaJasmine,29,Aqua

That said, we may prefer to use some of the utilities provided by Python like thecsv package. What’s the point of reinventing the wheel when there’s a battle tested library built right into the language?

As usual, we’ll tackle a handful of solutions and discuss their pros and cons. By the end of this article, you should feel comfortable parsing CSV files yourself.

Internal Representation

As for the second question, we have a couple of options. For starters, we could interpret the spreadsheet as a giant list of lists. In this example, we could give each row of data its own list and store these rows in a list:

# Heading: Name, Age, Favorite Colorcsv_matrix = [  ["Jeremy", 25, "Blue"],  ["Ally", 41, "Magenta"],  ["Jasmine", 29, "Aqua"]]

Alternatively, we could give each column of data their own list and store those lists in a list:

# Heading: Name, Age, Favorite Colorcsv_matrix = [  ["Jeremy", "Ally", "Jasmine"],  [25, 41, 29],  ["Blue", "Magenta", "Aqua"]]

In either case, we’d have a giant matrix of data that would closely resemble the original CSV file.

Of course, I’m a bit partial to dictionaries, so I might like to use one of those. For instance, what’s stopping us from creating a dictionary where each key provides us with an entire column of data?

csv_dict = {  "Name": ["Jeremy", "Ally", "Jasmine"],  "Age": [25, 41, 29],  "Favorite Color": ["Blue", "Magenta", "Aqua"]}

Alternatively, we could flip the relationship so we’re storing a list of dictionaries. That way, the data rows are mappings:

csv_mapping_list = [  {    "Name": "Jeremy",    "Age": 25,    "Favorite Color": "Blue"  },  {     "Name": "Ally",     "Age": 41,     "Favorite Color": "Magenta"  },  {    "Name": "Jasmine",    "Age": 29,    "Favorite Color": "Aqua"  }]

Personally, I prefer this last representation because data samples stick together. In other words, there’s no risk of distorting the original data set during actions like sorting. Meanwhile, the other three representations have independent lists which have to be sorted together. That’s enough to give this last representation the win.

In the next section, we’ll start digging into some parsing solutions which will all leverage this last representation. If you have any questions about the other three, feel free to drop them in the comments below.

Solutions

As always, we’ll cover a handful of common solutions to today’s problem. Feel free to grab what you need and run, but I recommend reading through the pros and cons of each solution. After all, you’ll never know when some of this might be useful.

Parse a Spreadsheet with Brute Force

At its core, a spreadsheet is a comma separated file (CSV). If we want to parse one by hand, we need to be aware of a few things:

The header row
The line endings
The delimiter (in this case a comma)

In our original example, we had a spreadsheet that looked something like the following:

Name,Age,Favorite ColorJeremy,25,BlueAlly,41,MagentaJasmine,29,Aqua

To parse a spreadsheet in Python by hand, we’d want to read each line and split it by comma. After that, we’d need to do some post processing to get the format we want:

csv_mapping_list = []with open("/path/to/data.csv") as my_data:  line_count = 0  for line in my_data:    row_list = [val.strip() for val in line.split(",")]    if line_count == 0:      header = row_list    else:      row_dict = {}      for i, key in enumerate(header):        row_dict[key] = row_list[i]      csv_mapping_list.append(row_dict)    line_count += 1

Here is my very rough attempt at parsing a spreadsheet by hand. In this example, we open the CSV and read it line by line. For each line, we split it by comma and systematically trim each value using a list comprehension.

From there, we decide if the line we just parsed is the header or not. If it is, we save its value for later. Otherwise, we iterate over the row of values and map them into a dictionary using the header. I suppose it may have been simpler to use one of the methods from ourHow to Convert Two Lists into a Dictionary in Python article:

csv_mapping_list = []with open("/path/to/data.csv") as my_data:  line_count = 0  for line in my_data:    row_list = [val.strip() for val in line.split(",")]    if line_count == 0:      header = row_list    else:      row_dict = {key: value for key, value in zip(header, row_list)}      csv_mapping_list.append(row_dict)    line_count += 1

In either case, this solution leaves a lot of room for modification. For instance, if your file has some other kind of delimiter, this may be the solution for you.

That said, be aware that the brute force solution has a major drawback. According to Juha-Matti Santala Opens in a new tab. , splitting by comma can fail if one of the rows contains text with a comma in it (i.e. "Grifski, Jeremy",25,Blue). To make matters worse, this problem changes depending on which delimiter you use. Fortunately, there are better solutions to follow!

Parse a Spreadsheet with the CSV Reader Object

As mentioned previously, we don’t have to write our own CSV parser if we don’t want to. Instead, we can get by with thecsv package. As you can probably imagine, it has a ton of CSV parsing functionality. In particular, it contains thereader object which we can use to read a CSV file like we did previously:

import csvcsv_mapping_list = []with open("/path/to/data.csv") as my_data:  csv_reader = csv.reader(my_data, delimiter=",")  line_count = 0  for line in csv_reader:     if line_count == 0:       header = line     else:       row_dict = {key: value for key, value in zip(header, line)}       csv_mapping_list.append(row_dict)     line_count += 1

With thereader object, we haven’t managed to simplify our code that much. In fact, all we did was replace thelist comprehension with the reader instantiation. Regardless, it’s a nice option for those who would prefer not to write their own parser.

Parse a Spreadsheet with the CSV DictReader Object

At this point, you’re probably wondering why we would even use thecsv library. After all, it barely made a difference in the last two examples. Fortunately, there is a way to reduce our code a bit using a special class in thecsv library called DictReader:

import csvwith open("/path/to/dict.csv") as my_data:  csv_mapping_list = list(csv.DictReader(my_data))

And, there we have it! All the parsing we did in the first two sections has now been drastically reduced to just 3 lines of code.

That said, there is a minor difference in this solution. Instead of creating a list of dictionaries, we’ve created a list ofOrderedDict Opens in a new tab. objects. They can be used just like dictionaries, but their key order is fixed. Regardless, the elegance of this solution should more than make up for that fact, and in some cases we may even want to preserve key order.

A Little Recap

At this point, I find it nice to look at all the potential solutions at once:

# Brute force solutioncsv_mapping_list = []with open("/path/to/data.csv") as my_data:  line_count = 0  for line in my_data:    row_list = [val.strip() for val in line.split(",")]    if line_count == 0:      header = row_list    else:      row_dict = {key: value for key, value in zip(header, row_list)}      csv_mapping_list.append(row_dict)    line_count += 1# CSV reader solutionimport csvcsv_mapping_list = []with open("/path/to/data.csv") as my_data:  csv_reader = csv.reader(my_data, delimiter=",")  line_count = 0  for line in csv_reader:     if line_count == 0:       header = line     else:       row_dict = {key: value for key, value in zip(header, line)}       csv_mapping_list.append(row_dict)     line_count += 1# CSV DictReader solutionimport csvwith open("/path/to/dict.csv") as my_data:  csv_mapping_list = list(csv.DictReader(my_data))

Up next, I’m looking to write an article onhow to sort a list of dictionaries, so look out for that! Until then, thanks for sticking around. Hopefully, this article was helpful. If so, why not give it a share? Even better,become a member of The Renegade Coder community Opens in a new tab. , so we can stay in touch!

If you’re not ready to leave just yet, I have plenty ofPython content for you:

Once again, thanks for sticking around!

How to Python (42 Articles)—Series Navigation

The How to Python tutorial series strays from the usual in-depth coding articles by exploring byte-sized problems in Python. In this series, students will dive into unique topics such asHow to Invert a Dictionary,How to Sum Elements of Two Lists, andHow to Check if a File Exists.

Each problem is explored from the naive approach to the ideal solution. Occasionally, there’ll be some just-for-fun solutions too. At the end of every article, you’ll find a recap full of code snippets for your own use. Don’t be afraid to take what you need!

If you’re not sure where to start, I recommend checking out our list ofPython Code Snippets for Everyday Problems. In addition, you can find some of the snippets in aJupyter notebook format on GitHub Opens in a new tab. ,

If you have a problem of your own, feel free to ask. Someone else probably has the same problem. Enjoy How to Python!

← Previous Post: [#10][#12]: Next Post →

Post

Save

Jeremy Grifski

Jeremy grew up in a small town where he enjoyed playing soccer and video games, practicing taekwondo, and trading Pokémon cards. Once out of the nest, he pursued a Bachelors in Computer Engineering with a minor in Game Design. After college, he spent about two years writing software for a major engineering company. Then, he earned a master's in Computer Science and Engineering. Most recently, he earned a PhD in Engineering Education and now works as a Lecturer. In his spare time, Jeremy enjoys spending time with his wife and kid, playing Overwatch 2, Lethal Company, and Baldur's Gate 3, reading manga, watching Penguins hockey, and traveling the world.

Movatterモバイル変換

How to Parse a Spreadsheet in Python: CSV Reader and DictReader

Table of Contents

Problem Introduction

Parsing CSV Files

Internal Representation

Solutions

Parse a Spreadsheet with Brute Force

Parse a Spreadsheet with the CSV Reader Object

Parse a Spreadsheet with the CSV DictReader Object

A Little Recap

Recent Posts

About Me

Legal

Longest Active Series