Python has several built-in modules and functions for handling files. These functions are spread out over several modules such as os, os.path, shutil, and pathlib, to name a few. This article gathers in one place many of the functions you need to know in order to perform the most common operations on files in Python.
In this tutorial, you’ll learn how to:

- Read from and write to files
- Get directory listings and file properties
- Create directories and directory trees
- Match patterns in filenames
- Traverse directory trees
- Make temporary files and directories
- Delete files and directories
- Copy and move files and directories
- Open multiple files using the fileinput module
Reading and writing data to files using Python is pretty straightforward. To do this, you must first open files in the appropriate mode. Here’s an example of how to use Python’s “with open(…) as …” pattern to open a text file and read its contents:
with open('data.txt', 'r') as f:
    data = f.read()
open() takes a filename and a mode as its arguments. r opens the file in read-only mode. To write data to a file, pass in w as an argument instead:
with open('data.txt', 'w') as f:
    data = 'some data to be written to the file'
    f.write(data)
In the examples above, open() opens files for reading or writing and returns a file handle (f in this case) that provides methods that can be used to read or write data to the file. Check out Reading and Writing Files in Python and Working With File I/O in Python for more information on how to read and write to files.
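The same pattern also covers appending: passing 'a' as the mode opens the file for appending, creating the file first if it doesn’t exist. A minimal sketch:

```python
# Append a line to data.txt; 'a' creates the file if it doesn't exist
with open('data.txt', 'a') as f:
    f.write('another line\n')
```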
Suppose your current working directory has a subdirectory called my_directory that has the following contents:
my_directory/
|
├── sub_dir/
|   ├── bar.py
|   └── foo.py
|
├── sub_dir_b/
|   └── file4.txt
|
├── sub_dir_c/
|   ├── config.py
|   └── file5.txt
|
├── file1.py
├── file2.csv
└── file3.txt
The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() in legacy versions of Python or os.scandir() in Python 3. os.scandir() is the preferred method to use if you also want to get file and directory properties such as file size and modification date.
In versions of Python prior to Python 3, os.listdir() is the method to use to get a directory listing:
>>> import os
>>> entries = os.listdir('my_directory/')
os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument:
>>> os.listdir('my_directory/')
['sub_dir_c', 'file1.py', 'sub_dir_b', 'file3.txt', 'file2.csv', 'sub_dir']
A directory listing like that isn’t easy to read. Printing out the output of a call to os.listdir() using a loop helps clean things up:
>>> entries = os.listdir('my_directory/')
>>> for entry in entries:
...     print(entry)
...
sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir
In modern versions of Python, an alternative to os.listdir() is to use os.scandir() and pathlib.Path().
os.scandir() was introduced in Python 3.5 and is documented in PEP 471. os.scandir() returns an iterator as opposed to a list when called:
>>> import os
>>> entries = os.scandir('my_directory/')
>>> entries
<posix.ScandirIterator object at 0x7f5b047f3690>
The ScandirIterator points to all the entries in the current directory. You can loop over the contents of the iterator and print out the filenames:
import os

with os.scandir('my_directory/') as entries:
    for entry in entries:
        print(entry.name)
Here, os.scandir() is used in conjunction with the with statement because it supports the context manager protocol. Using a context manager closes the iterator and frees up acquired resources automatically after the iterator has been exhausted. The result is a printout of the filenames in my_directory/ just like you saw in the os.listdir() example:
sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir
Another way to get a directory listing is to use the pathlib module:
from pathlib import Path

entries = Path('my_directory/')
for entry in entries.iterdir():
    print(entry.name)
The objects returned by Path are either PosixPath or WindowsPath objects depending on the OS.
pathlib.Path() objects have an .iterdir() method for creating an iterator of all files and folders in a directory. Each entry yielded by .iterdir() contains information about the file or directory such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a great addition to Python that provides an object-oriented interface to the filesystem.
In the example above, you call pathlib.Path() and pass a path argument to it. Next is the call to .iterdir() to iterate over all files and directories in my_directory.
pathlib offers a set of classes featuring most of the common operations on paths in an easy, object-oriented way. Using pathlib is at least as efficient as using the functions in os. Another benefit of using pathlib over os is that it reduces the number of imports you need to make to manipulate filesystem paths. For more information, read Python’s pathlib Module: Taming the File System.
Note: To get started with pathlib, check out Python Basics: File System Operations and the associated exercises.
Running the code above produces the following:
sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir
Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way of getting a directory listing, especially when you’re working with code that needs the file type and file attribute information. pathlib.Path() offers much of the file and path handling functionality found in os and shutil, and its methods are more efficient than some found in these modules. We will discuss how to get file properties shortly.
Here are the directory-listing functions again:
Function | Description |
---|---|
os.listdir() | Returns a list of all files and folders in a directory |
os.scandir() | Returns an iterator of all the objects in a directory including file attribute information |
pathlib.Path.iterdir() | Returns an iterator of all the objects in a directory including file attribute information |
These functions return a listing of everything in the directory, including subdirectories. This might not always be the behavior you want. The next section will show you how to filter the results from a directory listing.
This section will show you how to print out the names of files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and only list files from a directory listing produced by os.listdir(), use os.path:
import os

# List all files in a directory using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)
Here, the call to os.listdir() returns a list of everything in the specified path, and then that list is filtered by os.path.isfile() to only print out files and not directories. This produces the following output:
file1.py
file3.txt
file2.csv
An easier way to list files in a directory is to use os.scandir() or pathlib.Path():
import os

# List all files in a directory using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)
Using os.scandir() has the advantage of looking cleaner and being easier to understand than using os.listdir(), even though it is one line of code longer. Calling entry.is_file() on each item in the ScandirIterator returns True if the object is a file. Printing out the names of all files in the directory gives you the following output:
file1.py
file3.txt
file2.csv
Here’s how to list files in a directory using pathlib.Path():
from pathlib import Path

basepath = Path('my_directory/')
files_in_basepath = basepath.iterdir()
for item in files_in_basepath:
    if item.is_file():
        print(item.name)
Here, you call .is_file() on each entry yielded by .iterdir(). The output produced is the same:
file1.py
file3.txt
file2.csv
The code above can be made more concise if you combine the for loop and the if statement into a single generator expression. Dan Bader has an excellent article on generator expressions and list comprehensions.
The modified version looks like this:
from pathlib import Path

# List all files in directory using pathlib
basepath = Path('my_directory/')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)
This produces exactly the same output as the example before it. This section showed that filtering files or directories using os.scandir() and pathlib.Path() feels more intuitive and looks cleaner than using os.listdir() in conjunction with os.path.
To list subdirectories instead of files, use one of the methods below. Here’s how to use os.listdir() and os.path:
import os

# List all subdirectories using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)
Manipulating filesystem paths this way can quickly become cumbersome when you have multiple calls to os.path.join(). Running this on my computer produces the following output:
sub_dir_c
sub_dir_b
sub_dir
Here’s how to use os.scandir():
import os

# List all subdirectories using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)
As in the file listing example, here you call .is_dir() on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True, and the directory’s name is printed out. The output is the same as above:
sub_dir_c
sub_dir_b
sub_dir
Here’s how to use pathlib.Path():
from pathlib import Path

# List all subdirectories using pathlib
basepath = Path('my_directory/')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)
Calling .is_dir() on each entry of the basepath iterator checks if an entry is a file or a directory. If the entry is a directory, its name is printed out to the screen, and the output produced is the same as the one from the previous example:
sub_dir_c
sub_dir_b
sub_dir
Python makes retrieving file attributes such as file size and modified times easy. This is done through os.stat(), os.scandir(), or pathlib.Path().
os.scandir() and pathlib.Path() retrieve a directory listing with file attributes combined. This can be potentially more efficient than using os.listdir() to list files and then getting file attribute information for each file.
The examples below show how to get the time the files in my_directory/ were last modified. The output is in seconds:
>>> import os
>>> with os.scandir('my_directory/') as dir_contents:
...     for entry in dir_contents:
...         info = entry.stat()
...         print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134
os.scandir() returns a ScandirIterator object. Each entry in a ScandirIterator object has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and the time of last modification. In the example above, the code prints out the st_mtime attribute, which is the time the content of the file was last modified.
The pathlib module has corresponding methods for retrieving file information that give the same results:
>>> from pathlib import Path
>>> current_dir = Path('my_directory')
>>> for path in current_dir.iterdir():
...     info = path.stat()
...     print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134
In the example above, the code loops through the object returned by .iterdir() and retrieves file attributes through a .stat() call for each file in the directory list. The st_mtime attribute returns a float value that represents seconds since the epoch. To convert the values returned by st_mtime for display purposes, you could write a helper function to convert the seconds into a datetime object:
from datetime import datetime
from os import scandir

def convert_date(timestamp):
    d = datetime.utcfromtimestamp(timestamp)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date

def get_files():
    dir_entries = scandir('my_directory/')
    for entry in dir_entries:
        if entry.is_file():
            info = entry.stat()
            print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')
This will first get a list of files in my_directory and their attributes and then call convert_date() to convert each file’s last modified time into a human-readable form. convert_date() makes use of .strftime() to convert the time in seconds into a string.
The arguments passed to .strftime() are the following:
- %d: the day of the month
- %b: the month, in abbreviated form
- %Y: the year

Together, these directives produce output that looks like this:
>>> get_files()
file1.py    Last Modified: 04 Oct 2018
file3.txt   Last Modified: 17 Sep 2018
file2.csv   Last Modified: 17 Sep 2018
The syntax for converting dates and times into strings can be quite confusing. To read more about it, check out the official documentation on it. Another handy reference that is easy to remember is http://strftime.org/.
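As a quick illustration, here are those three directives applied to a fixed example date:

```python
from datetime import datetime

# Format an example date with the %d, %b, and %Y directives
d = datetime(2018, 10, 4)
print(d.strftime('%d %b %Y'))  # 04 Oct 2018
```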
Sooner or later, the programs you write will have to create directories in order to store data in them. os and pathlib include functions for creating directories. We’ll consider these:
Function | Description |
---|---|
os.mkdir() | Creates a single subdirectory |
pathlib.Path.mkdir() | Creates single or multiple directories |
os.makedirs() | Creates multiple directories, including intermediate directories |
To create a single directory, pass a path to the directory as a parameter to os.mkdir():
import os

os.mkdir('example_directory/')
If a directory already exists, os.mkdir() raises FileExistsError. Alternatively, you can create a directory using pathlib:
from pathlib import Path

p = Path('example_directory/')
p.mkdir()
If the path already exists, .mkdir() raises a FileExistsError:
>>> p.mkdir()
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
  File '/usr/lib/python3.5/pathlib.py', line 1214, in mkdir
    self._accessor.mkdir(self, mode)
  File '/usr/lib/python3.5/pathlib.py', line 371, in wrapped
    return strfunc(str(pathobj), *args)
FileExistsError: [Errno 17] File exists: '.'
To avoid errors like this, catch the error when it happens and let your user know:

from pathlib import Path

p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as exc:
    print(exc)
Alternatively, you can ignore the FileExistsError by passing the exist_ok=True argument to .mkdir():
from pathlib import Path

p = Path('example_directory')
p.mkdir(exist_ok=True)
This will not raise an error if the directory already exists.
os.makedirs() is similar to os.mkdir(). The difference between the two is that not only can os.makedirs() create individual directories, it can also be used to create directory trees. In other words, it can create any necessary intermediate folders in order to ensure a full path exists.
os.makedirs() is similar to running mkdir -p in Bash. For example, to create a group of directories like 2018/10/05, all you have to do is the following:
import os

os.makedirs('2018/10/05')
This will create a nested directory structure that contains the folders 2018, 10, and 05:
.
|
└── 2018/
    └── 10/
        └── 05/
.makedirs() creates directories with default permissions. If you need to create directories with different permissions, call .makedirs() and pass in the mode you would like the directories to be created in:
import os

os.makedirs('2018/10/05', mode=0o770)
This creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permissions. The default mode is 0o777, and the file permission bits of existing parent directories are not changed. For more details on file permissions, and how the mode is applied, see the docs.
Run tree to confirm that the right permissions were applied:
$ tree -p -i .
.
[drwxrwx---]  2018
[drwxrwx---]  10
[drwxrwx---]  05
This prints out a directory tree of the current directory. tree is normally used to list contents of directories in a tree-like format. Passing the -p and -i arguments to it prints out the directory names and their file permission information in a vertical list. -p prints out the file permissions, and -i makes tree produce a vertical list without indentation lines.
As you can see, all of the directories have 770 permissions. An alternative way to create directories is to use .mkdir() from pathlib.Path:
import pathlib

p = pathlib.Path('2018/10/05')
p.mkdir(parents=True)
Passing parents=True to Path.mkdir() makes it create the directory 05 and any parent directories necessary to make the path valid.
By default, os.makedirs() and Path.mkdir() raise an OSError if the target directory already exists. This behavior can be overridden (as of Python 3.2) by passing exist_ok=True as a keyword argument when calling each function.
Running the code above produces a directory structure like the one below in one go:
.
|
└── 2018/
    └── 10/
        └── 05/
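For example, here is a minimal sketch combining parents=True with exist_ok=True so the same call can be rerun safely:

```python
from pathlib import Path

# exist_ok=True suppresses FileExistsError, so reruns are harmless
p = Path('2018/10/05')
p.mkdir(parents=True, exist_ok=True)
p.mkdir(parents=True, exist_ok=True)  # Second call is a no-op
```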
I prefer using pathlib when creating directories because I can use the same function to create single or nested directories.
After getting a list of files in a directory using one of the methods above, you will most probably want to search for files that match a particular pattern.
These are the methods and functions available to you:
- The endswith() and startswith() string methods
- fnmatch.fnmatch()
- glob.glob()
- pathlib.Path.glob()

Each of these is discussed below. The examples in this section will be performed on a directory called some_directory that has the following structure:
.
|
├── sub_dir/
|   ├── file1.py
|   └── file2.py
|
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
└── tests.py
If you’re following along using a Bash shell, you can create the above directory structure using the following commands:
$ mkdir some_directory
$ cd some_directory/
$ mkdir sub_dir
$ touch sub_dir/file1.py sub_dir/file2.py
$ touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py
This will create the some_directory/ directory, change into it, and then create sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line creates all the other files using expansion. To learn more about shell expansion, visit this site.
Python has several built-in methods for modifying and manipulating strings. Two of these methods, .startswith() and .endswith(), are useful when you’re searching for patterns in filenames. To do this, first get a directory listing and then iterate over it:
>>> import os
>>> # Get .txt files
>>> for f_name in os.listdir('some_directory'):
...     if f_name.endswith('.txt'):
...         print(f_name)
The code above finds all the files in some_directory/, iterates over them, and uses .endswith() to print out the filenames that have the .txt file extension. Running this on my computer produces the following output:
data_01.txt
data_03.txt
data_03_backup.txt
data_02_backup.txt
data_02.txt
data_01_backup.txt
fnmatch
String methods are limited in their matching abilities. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), a function that supports the use of wildcards such as * and ? to match filenames. For example, in order to find all .txt files in a directory using fnmatch, you would do the following:
>>> import os
>>> import fnmatch
>>> for file_name in os.listdir('some_directory/'):
...     if fnmatch.fnmatch(file_name, '*.txt'):
...         print(file_name)
This iterates over the list of files in some_directory and uses .fnmatch() to perform a wildcard search for files that have the .txt extension.
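One detail worth knowing: fnmatch.fnmatch() normalizes case according to the operating system’s rules, while fnmatch.fnmatchcase() always matches case-sensitively. A short sketch:

```python
import fnmatch

# fnmatchcase() never normalizes case, unlike fnmatch()
print(fnmatch.fnmatchcase('DATA_01.TXT', '*.txt'))  # False
print(fnmatch.fnmatchcase('data_01.txt', '*.txt'))  # True
```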
Let’s suppose you want to find .txt files that meet certain criteria. For example, you could be only interested in finding .txt files that contain the word data, a number between a set of underscores, and the word backup in their filename. Something similar to data_01_backup, data_02_backup, or data_03_backup.
Using fnmatch.fnmatch(), you could do it this way:
>>> for filename in os.listdir('.'):
...     if fnmatch.fnmatch(filename, 'data_*_backup.txt'):
...         print(filename)
Here, you print only the names of files that match the data_*_backup.txt pattern. The asterisk in the pattern will match any number of characters, so running this will find all text files whose filenames start with the word data and end in backup.txt, as you can see from the output below:
data_03_backup.txt
data_02_backup.txt
data_01_backup.txt
glob
Another useful module for pattern matching is glob.

.glob() in the glob module works just like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special.
UNIX and related systems translate name patterns with wildcards like ? and * into a list of files. This is called globbing.
For example, typing mv *.py python_files/ in a UNIX shell moves (mv) all files with the .py extension from the current directory to the directory python_files. The * character is a wildcard that means “any number of characters,” and *.py is the glob pattern. This shell capability is not available in the Windows Operating System. The glob module adds this capability in Python, which enables Windows programs to use this feature.
Here’s an example of how to use glob to search for all Python (.py) source files in the current directory:
>>> import glob
>>> glob.glob('*.py')
['admin.py', 'tests.py']
glob.glob('*.py') searches for all files that have the .py extension in the current directory and returns them as a list. glob also supports shell-style wildcards to match patterns:
>>> import glob
>>> for name in glob.glob('*[0-9]*.txt'):
...     print(name)
This finds all text (.txt) files that contain digits in the filename:
data_01.txt
data_03.txt
data_03_backup.txt
data_02_backup.txt
data_02.txt
data_01_backup.txt
glob makes it easy to search for files recursively in subdirectories too:
>>> import glob
>>> for file in glob.iglob('**/*.py', recursive=True):
...     print(file)
This example makes use of glob.iglob() to search for .py files in the current directory and subdirectories. Passing recursive=True as an argument to .iglob() makes it search for .py files in the current directory and any subdirectories. The difference between glob.iglob() and glob.glob() is that .iglob() returns an iterator instead of a list.
Running the program above produces the following:
admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py
pathlib contains similar methods for making flexible file listings. The example below shows how you can use Path.glob() to list file types that start with the letter p:
>>> from pathlib import Path
>>> p = Path('.')
>>> for name in p.glob('*.p*'):
...     print(name)
admin.py
scraper.py
docs.pdf
Calling p.glob('*.p*') returns a generator object that points to all files in the current directory that start with the letter p in their file extension.
Path.glob() is similar to glob.glob() discussed above. As you can see, pathlib combines many of the best features of the os, os.path, and glob modules into one single module, which makes it a joy to use.
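pathlib also offers .rglob() for recursive matching, which behaves like .glob() with '**/' prepended to the pattern. A brief sketch:

```python
from pathlib import Path

# Recursively find all .py files below the current directory
for path in Path('.').rglob('*.py'):
    print(path)
```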
To recap, here is a table of the functions we have covered in this section:
Function | Description |
---|---|
startswith() | Tests if a string starts with a specified pattern and returns True or False |
endswith() | Tests if a string ends with a specified pattern and returns True or False |
fnmatch.fnmatch(filename, pattern) | Tests whether the filename matches the pattern and returns True or False |
glob.glob() | Returns a list of filenames that match a pattern |
pathlib.Path.glob() | Finds patterns in path names and returns a generator object |
A common programming task is walking a directory tree and processing files in the tree. Let’s explore how the built-in Python function os.walk() can be used to do this. os.walk() is used to generate filenames in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we’ll be manipulating the following directory tree:
.
|
├── folder_1/
|   ├── file1.py
|   ├── file2.py
|   └── file3.py
|
├── folder_2/
|   ├── file4.py
|   ├── file5.py
|   └── file6.py
|
├── test1.txt
└── test2.txt
The following is an example that shows you how to list all files and directories in a directory tree using os.walk().

os.walk() defaults to traversing directories in a top-down manner:
import os

# Walking a directory tree and printing the names of the directories and files
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
os.walk() returns three values on each iteration of the loop:
The name of the current folder
A list of folders in the current folder
A list of files in the current folder
On each iteration, it prints out the names of the subdirectories and files it finds:
Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
To traverse the directory tree in a bottom-up manner, pass in a topdown=False keyword argument to os.walk():
for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
Passing the topdown=False argument will make os.walk() print out the files it finds in the subdirectories first:
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
Found directory: .
test1.txt
test2.txt
As you can see, the program started by listing the contents of the subdirectories before listing the contents of the root directory. This is very useful in situations where you want to recursively delete files and directories. You will learn how to do this in the sections below. By default, os.walk() does not walk down into symbolic links that resolve to directories. This behavior can be overridden by calling it with a followlinks=True argument.
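A minimal sketch of enabling that behavior:

```python
import os

# followlinks=True makes os.walk() descend into symlinked directories
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    print(f'Found directory: {dirpath}')
```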
Python provides a handy module for creating temporary files and directories called tempfile.

tempfile can be used to open and store data temporarily in a file or directory while your program is running. tempfile handles the deletion of the temporary files when your program is done with them.
Here’s how to create a temporary file:
from tempfile import TemporaryFile

# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello universe!')

# Go back to the beginning and read data from file
fp.seek(0)
data = fp.read()

# Close the file, after which it will be removed
fp.close()
The first step is to import TemporaryFile from the tempfile module. Next, create a file-like object by calling TemporaryFile() and passing it the mode you want to open the file in. This will create and open a file that can be used as a temporary storage area.
In the example above, the mode is 'w+t', which makes tempfile create a temporary text file in write mode. There is no need to give the temporary file a filename since it will be destroyed after the script is done running.
After writing to the file, you can read from it and close it when you’re done processing it. Once the file is closed, it will be deleted from the filesystem. If you need to name the temporary files produced using tempfile, use tempfile.NamedTemporaryFile().
The temporary files and directories created using tempfile are stored in a special system directory for storing temporary files. Python searches a standard list of directories to find one that the user can create files in.

On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. As a last resort, tempfile will save temporary files and directories in the current directory.
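You can check which directory was selected with tempfile.gettempdir():

```python
import tempfile

# Show the directory tempfile will use for temporary files
print(tempfile.gettempdir())
```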
.TemporaryFile() is also a context manager, so it can be used in conjunction with the with statement. Using a context manager takes care of closing and deleting the file automatically after it has been read:
with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()

# File is now closed and removed
This creates a temporary file and reads data from it. As soon as the file’s contents are read, the temporary file is closed and deleted from the file system.
tempfile can also be used to create temporary directories. Let’s look at how you can do this using tempfile.TemporaryDirectory():
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     print('Created temporary directory ', tmpdir)
...     os.path.exists(tmpdir)
...
Created temporary directory  /tmp/tmpoxbkrm6c
True

>>> # Directory contents have been removed
...
>>> tmpdir
'/tmp/tmpoxbkrm6c'
>>> os.path.exists(tmpdir)
False
Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing this directory. In the example above, the directory is created using a context manager, and the name of the directory is stored in tmpdir. The third line prints out the name of the temporary directory, and os.path.exists(tmpdir) confirms if the directory was actually created in the file system.
After the context manager goes out of context, the temporary directory is deleted and a call to os.path.exists(tmpdir) returns False, which means that the directory was successfully deleted.
You can delete single files, directories, and entire directory trees using the methods found in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.
To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().
os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:
import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)
Deleting a file using os.unlink() is similar to how you do it using os.remove():
import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.unlink(data_file)
Calling .unlink() or .remove() on a file deletes the file from the filesystem. These two functions will throw an OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you’re trying to delete is actually a file and only delete it if it is, or you can use exception handling to handle the OSError:
import os

data_file = 'home/data.txt'

# If the file exists, delete it
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')
os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.
The following example shows how to use exception handling to handle errors when deleting files:
import os

data_file = 'home/data.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')
The code above attempts to delete the file first before checking its type. If data_file isn’t actually a file, the OSError that is thrown is handled in the except clause, and an error message is printed to the console. The error message that gets printed out is formatted using Python f-strings.
Finally, you can also use pathlib.Path.unlink() to delete files:
from pathlib import Path

data_file = Path('home/data.txt')

try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')
This creates a Path object called data_file that points to a file. Calling .unlink() on data_file will delete home/data.txt. If data_file points to a directory, an IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it. If the user does not have permission to delete the file, a PermissionError is raised.
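In Python 3.8 and later, you can also pass missing_ok=True to Path.unlink() to suppress the FileNotFoundError raised when the file doesn’t exist; a short sketch:

```python
from pathlib import Path

# missing_ok=True (Python 3.8+) suppresses FileNotFoundError
Path('home/data.txt').unlink(missing_ok=True)
```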
The standard library offers the following functions for deleting directories:

- os.rmdir()
- pathlib.Path.rmdir()
- shutil.rmtree()
To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These two functions only work if the directory you’re trying to delete is empty. If the directory isn’t empty, an OSError is raised. Here is how to delete a folder:
import os

trash_dir = 'my_documents/bad_dir'

try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
Here, the trash_dir directory is deleted by passing its path to os.rmdir(). If the directory isn’t empty, an error message is printed to the screen:
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'
Alternatively, you can use pathlib to delete directories:
from pathlib import Path

trash_dir = Path('my_documents/bad_dir')

try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
Here, you create a Path object that points to the directory to be deleted. Calling .rmdir() on the Path object will delete it if it is empty.
To delete non-empty directories and entire directory trees, Python offers shutil.rmtree():
import shutil

trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
Everything in trash_dir is deleted when shutil.rmtree() is called on it. There may be cases where you want to delete empty folders recursively. You can do this using one of the methods discussed above in conjunction with os.walk():
import os

for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass
This walks down the directory tree and tries to delete each directory it finds. If the directory isn’t empty, an OSError is raised and that directory is skipped. The table below lists the functions covered in this section:
| Function | Description |
|---|---|
| os.remove() | Deletes a file and does not delete directories |
| os.unlink() | Is identical to os.remove() and deletes a single file |
| pathlib.Path.unlink() | Deletes a file and cannot delete directories |
| os.rmdir() | Deletes an empty directory |
| pathlib.Path.rmdir() | Deletes an empty directory |
| shutil.rmtree() | Deletes an entire directory tree and can be used to delete non-empty directories |
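These functions complement each other, so it can be handy to wrap them in one helper that removes a path whether it’s a file or a directory. A minimal sketch; delete_path is a name invented for this example, not part of the standard library:

```python
import os
import shutil
import tempfile

def delete_path(path):
    """Delete path, whether it is a file or a directory tree."""
    if os.path.isdir(path):
        shutil.rmtree(path)   # Handles empty and non-empty directories
    else:
        os.remove(path)       # Handles regular files

# Try it on a scratch directory containing one file
scratch = tempfile.mkdtemp()
file_path = os.path.join(scratch, 'junk.txt')
with open(file_path, 'w') as f:
    f.write('temporary')

delete_path(file_path)   # Removes the file
delete_path(scratch)     # Removes the (now empty) directory
```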
Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removal of files and directories. In this section, you’ll learn how to move and copy files and directories.
shutil offers a couple of functions for copying files. The most commonly used functions are shutil.copy() and shutil.copy2(). To copy a file from one location to another using shutil.copy(), do the following:
import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)
shutil.copy() is comparable to the cp command in UNIX based systems. shutil.copy(src, dst) will copy the file src to the location specified in dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, then src will be copied into that directory. shutil.copy() only copies the file’s contents and the file’s permissions. Other metadata like the file’s creation and modification times are not preserved.
To preserve all file metadata when copying, use shutil.copy2():
import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)
Using .copy2() preserves details about the file such as last access time, permission bits, last modification time, and flags.
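You can see the difference by comparing timestamps after copying both ways. A quick sketch in a throwaway directory (the file names are made up for the example):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'file.txt')
with open(src, 'w') as f:
    f.write('payload')

# Backdate the modification time so the difference is visible
os.utime(src, (1_000_000_000, 1_000_000_000))

plain = shutil.copy(src, os.path.join(tmp, 'plain.txt'))
exact = shutil.copy2(src, os.path.join(tmp, 'exact.txt'))

# copy2() carries the mtime over; copy() stamps the copy with the current time
preserved = os.path.getmtime(exact) == os.path.getmtime(src)
print(preserved)   # True
```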
While shutil.copy() only copies a single file, shutil.copytree() will copy an entire directory and everything contained in it. shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where files and folders will be copied to.
Here’s an example of how to copy the contents of one folder to a different location:
>>> import shutil
>>> shutil.copytree('data_1', 'data1_backup')
'data1_backup'
In this example, .copytree() copies the contents of data_1 to a new location data1_backup and returns the destination directory. The destination directory must not already exist; it will be created, along with any missing parent directories. shutil.copytree() is a good way to back up your files.
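If the destination directory does already exist, .copytree() raises FileExistsError. Since Python 3.8 you can pass dirs_exist_ok=True to copy into an existing directory instead. A sketch under that assumption, with invented file names:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'data_1')
os.makedirs(src)
with open(os.path.join(src, 'report.txt'), 'w') as f:
    f.write('quarterly numbers')

# First copy: the destination must not exist yet
dest = shutil.copytree(src, os.path.join(base, 'data1_backup'))

# Second copy into the now-existing destination (Python 3.8+)
shutil.copytree(src, dest, dirs_exist_ok=True)

print(sorted(os.listdir(dest)))   # ['report.txt']
```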
To move a file or directory to another location, use shutil.move(src, dst). src is the file or directory to be moved and dst is the destination:
>>> import shutil
>>> shutil.move('dir_1/', 'backup/')
'backup'
shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/ if backup/ exists. If backup/ does not exist, dir_1/ will be renamed to backup.
Python includes os.rename(src, dst) for renaming files and directories:
>>> os.rename('first.zip', 'first_01.zip')
The line above will rename first.zip to first_01.zip. If the destination path points to a directory, it will raise an OSError.
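When you need rename-with-overwrite semantics on every platform, os.replace() is the safer choice: it behaves like os.rename() but also replaces an existing destination on Windows, where os.rename() would raise an error. A sketch with invented file names:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
old = os.path.join(tmp, 'first.zip')
new = os.path.join(tmp, 'first_01.zip')
with open(old, 'w') as f:
    f.write('archive bytes')

os.rename(old, new)   # Plain rename; the destination doesn't exist yet

# os.replace() renames and overwrites an existing destination on all platforms
with open(old, 'w') as f:
    f.write('second archive')
os.replace(old, new)

print(open(new).read())   # second archive
```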
Another way to rename files or directories is to use .rename() from the pathlib module:
>>> from pathlib import Path
>>> data_file = Path('data_01.txt')
>>> data_file.rename('data.txt')
To rename files using pathlib, you first create a pathlib.Path() object that contains a path to the file you want to rename. The next step is to call .rename() on the path object and pass a new filename for the file or directory you’re renaming.
Archives are a convenient way to package several files into one. The two most common archive types are ZIP and TAR. The Python programs you write can create, read, and extract data from archives. You will learn how to read and write to both archive formats in this section.
The zipfile module is a low level module that is part of the Python Standard Library. zipfile has functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, the first thing to do is to create a ZipFile object. ZipFile objects are similar to file objects created using open(). ZipFile is also a context manager and therefore supports the with statement:
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass  # Work with the open archive here
Here, you create a ZipFile object, passing in the name of the ZIP file to open in read mode. After opening a ZIP file, information about the archive can be accessed through functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data that contains a total of 5 files and 1 subdirectory:
.
|
├── sub_dir/
|   ├── bar.py
|   └── foo.py
|
├── file1.py
├── file2.py
└── file3.py
To get a list of files in the archive, call .namelist() on the ZipFile object:
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    zipobj.namelist()
This produces a list:
['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
.namelist() returns a list of names of the files and directories in the archive. To retrieve information about the files in the archive, use .getinfo():
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    bar_info.file_size
Here’s the output:
15277
.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, you pass its path as an argument to .getinfo(). Using .getinfo(), you’re able to retrieve information about archive members such as the date the files were last modified, their compressed sizes, and their full filenames. Accessing .file_size retrieves the file’s original size in bytes.
The following example shows how to retrieve more details about archived files in a Python REPL. Assume that the zipfile module has been imported and bar_info is the same object you created in previous examples:
>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'
bar_info contains details about bar.py such as its size when compressed and its full path.

The first line shows how to retrieve a file’s last modified date. The next line shows how to get the size of the file after compression. The last line shows the full path of bar.py in the archive.
ZipFile supports the context manager protocol, which is why you’re able to use it with the with statement. Doing this automatically closes the ZipFile object after you’re done with it. Trying to open or extract files from a closed ZipFile object will result in an error.
The zipfile module allows you to extract one or more files from ZIP archives through .extract() and .extractall(). These methods extract files to the current directory by default. They both take an optional path parameter that allows you to specify a different directory to extract files to. If the directory does not exist, it is automatically created. To extract files from the archive, do the following:
>>> import zipfile
>>> import os

>>> os.listdir('.')
['data.zip']

>>> data_zip = zipfile.ZipFile('data.zip', 'r')

>>> # Extract a single file to current directory
>>> data_zip.extract('file1.py')
'/home/terra/test/dir1/zip_extract/file1.py'

>>> os.listdir('.')
['file1.py', 'data.zip']

>>> # Extract all files into a different directory
>>> data_zip.extractall(path='extract_dir/')

>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']

>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']

>>> data_zip.close()
The third line of code is a call to os.listdir(), which shows that the current directory has only one file, data.zip.
Next, you open data.zip in read mode and call .extract() to extract file1.py from it. .extract() returns the full file path of the extracted file. Since there’s no path specified, .extract() extracts file1.py to the current directory.
The next line prints a directory listing showing that the current directory now includes the extracted file in addition to the original archive. The line after that shows how to extract the entire archive into the extract_dir directory. .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.
zipfile supports extracting password protected ZIPs. To extract password protected ZIP files, pass in the password to the .extract() or .extractall() method as an argument:
>>> import zipfile

>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract from a password protected archive
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')
This opens the secret.zip archive in read mode. The password is supplied to .extractall() as a bytes object, which is what the pwd parameter expects, and the archive contents are extracted to extract_dir. The archive is closed automatically after the extraction is complete thanks to the with statement.
To create a new ZIP archive, you open a ZipFile object in write mode (w) and add the files you want to archive:
>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']

>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)
In the example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement suite is finished, new_zip is closed. Opening a ZIP file in write mode erases the contents of the archive and creates a new archive.
To add files to an existing archive, open a ZipFile object in append mode and then add the files:
>>> # Open a ZipFile object in append mode
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')
Here, you open the new.zip archive you created in the previous example in append mode. Opening the ZipFile object in append mode allows you to add new files to the ZIP file without deleting its current contents. After adding files to the ZIP file, the with statement goes out of context and closes the ZIP file.
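One detail worth knowing: ZipFile stores members uncompressed (ZIP_STORED) unless you ask for compression. A sketch using zipfile.ZIP_DEFLATED, with invented file names:

```python
import os
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
member = os.path.join(tmp, 'data.txt')
with open(member, 'w') as f:
    f.write('abc' * 1000)   # Repetitive text compresses well

archive = os.path.join(tmp, 'new.zip')
# The default is ZIP_STORED (no compression); ZIP_DEFLATED uses zlib
with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write(member, arcname='data.txt')

with zipfile.ZipFile(archive, 'r') as zf:
    info = zf.getinfo('data.txt')

print(info.compress_size < info.file_size)   # True
```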
TAR files are uncompressed file archives like ZIP. They can be compressed using gzip, bzip2, and lzma compression methods. The TarFile class allows reading and writing of TAR archives.
Do this to read from an archive:
import tarfile

with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())
tarfile objects open like most file-like objects. The module’s open() function takes a mode that determines how the file is to be opened.
Use the 'r', 'w' or 'a' modes to open an uncompressed TAR file for reading, writing, and appending, respectively. To open compressed TAR files, pass in a mode argument to tarfile.open() that is in the form filemode[:compression]. The table below lists the possible modes TAR files can be opened in:
| Mode | Action |
|---|---|
| r | Opens archive for reading with transparent compression |
| r:gz | Opens archive for reading with gzip compression |
| r:bz2 | Opens archive for reading with bzip2 compression |
| r:xz | Opens archive for reading with lzma compression |
| w | Opens archive for uncompressed writing |
| w:gz | Opens archive for gzip compressed writing |
| w:xz | Opens archive for lzma compressed writing |
| a | Opens archive for appending with no compression |
.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of the files in it, use .getnames():
>>> import tarfile
>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']
This returns a list with the names of the archive contents.
Note: For the purposes of showing you how to use different tarfile object methods, the TAR file in the examples is opened and closed manually in an interactive REPL session.

Interacting with the TAR file this way allows you to see the output of running each command. Normally, you would want to use a context manager to open file-like objects.
The metadata of each entry in the archive can be accessed using special attributes:
>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size    :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
 Modified: Sat Nov 1 09:09:51 2018
 Size    : 402 bytes

README.md
 Modified: Sat Nov 3 07:29:40 2018
 Size    : 5426 bytes

app.py
 Modified: Sat Nov 3 07:29:13 2018
 Size    : 6218 bytes
In this example, you loop through the list of files returned by .getmembers() and print out each file’s attributes. The objects returned by .getmembers() have attributes that can be accessed programmatically such as the name, size, and last modified time of each of the files in the archive. After reading or writing to the archive, it must be closed to free up system resources.
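The same workflow fits naturally into with blocks. A self-contained sketch that builds a tiny archive and reads member metadata back, with invented file names:

```python
import os
import tarfile
import tempfile
import time

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'app.py')
with open(src, 'w') as f:
    f.write('print("hello")\n')

archive = os.path.join(tmp, 'example.tar')
with tarfile.open(archive, mode='w') as tar:
    tar.add(src, arcname='app.py')

# The context manager closes the archive automatically on exit
with tarfile.open(archive, mode='r') as tar:
    names = tar.getnames()
    member = tar.getmember('app.py')
    modified = time.ctime(member.mtime)

print(names)        # ['app.py']
print(member.size)  # 15
```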
In this section, you’ll learn how to extract files from TAR archives using the following methods:

- .extract()
- .extractfile()
- .extractall()
To extract a single file from a TAR archive, use .extract(), passing in the filename:
>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']
The README.md file is extracted from the archive to the file system. Calling os.listdir() confirms that the README.md file was successfully extracted into the current directory. To unpack or extract everything from the archive, use .extractall():
>>> tar.extractall(path="extracted/")
.extractall() has an optional path argument to specify where extracted files should go. Here, the archive is unpacked into the extracted directory. The following commands show that the archive was successfully extracted:
$ ls
example.tar  extracted  README.md

$ tree
.
├── example.tar
├── extracted
|   ├── app.py
|   ├── CONTRIBUTING.rst
|   └── README.md
└── README.md

1 directory, 5 files

$ ls extracted/
app.py  CONTRIBUTING.rst  README.md
To extract a file object for reading or writing, use .extractfile(), which takes a filename or TarInfo object to extract as an argument. .extractfile() returns a file-like object that can be read and used:
>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()
Opened archives should always be closed after they have been read or written to. To close an archive, call .close() on the archive file handle, or use the with statement when creating tarfile objects to automatically close the archive when you’re done. This frees up system resources and writes any changes you made to the archive to the filesystem.
Here’s how to create a new TAR archive:
>>> import tarfile
>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
First, you make a list of files to be added to the archive so that you don’t have to add each file manually.

The next line uses the with context manager to open a new archive called packages.tar in write mode. Opening an archive in write mode ('w') enables you to write new files to the archive. Any existing files in the archive are deleted and a new archive is created.

After the archive is created and populated, the with context manager automatically closes it and saves it to the filesystem. The last three lines open the archive you just created and print out the names of the files contained in it.
To add new files to an existing archive, open the archive in append mode ('a'):
>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar
Opening an archive in append mode allows you to add new files to it without deleting the ones already in it.
tarfile can also read and write TAR archives compressed using gzip, bzip2, and lzma compression. To read or write to a compressed archive, use tarfile.open(), passing in the appropriate mode for the compression type.

For example, to read or write data to a TAR archive compressed using gzip, use the 'r:gz' or 'w:gz' modes respectively:
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     tar.add('app.py')
...     tar.add('config.py')
...     tar.add('tests.py')

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py
The 'w:gz' mode opens the archive for gzip compressed writing and 'r:gz' opens the archive for gzip compressed reading. Opening compressed archives in append mode is not possible. To add files to a compressed archive, you have to create a new archive.
The Python Standard Library also supports creating TAR and ZIP archives using the high-level methods in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower level tarfile and zipfile modules.
Working With Archives Using shutil.make_archive()
shutil.make_archive() takes at least two arguments: the name of the archive and an archive format. By default, it compresses all the files in the current directory into the archive format specified in the format argument. You can pass in an optional root_dir argument to compress files in a different directory. .make_archive() supports the zip, tar, bztar, and gztar archive formats.
This is how to create a TAR archive using shutil:
import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')
This copies everything in data/ into an archive called backup.tar in the filesystem and returns its name. To extract the archive, call .unpack_archive():
shutil.unpack_archive('backup.tar', 'extract_dir/')
Calling .unpack_archive() and passing in an archive name and destination directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted in the same way.
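Switching formats is just a matter of changing the format string. A sketch that creates and unpacks a ZIP archive, with invented paths:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
data = os.path.join(base, 'data')
os.makedirs(data)
with open(os.path.join(data, 'notes.txt'), 'w') as f:
    f.write('remember the milk')

# Same call shape as for TAR, but with the 'zip' format
archive = shutil.make_archive(os.path.join(base, 'backup'), 'zip', data)

extract_dir = os.path.join(base, 'extract_dir')
os.makedirs(extract_dir)
shutil.unpack_archive(archive, extract_dir)
print(os.listdir(extract_dir))   # ['notes.txt']
```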
Python supports reading data from multiple input streams or from a list of files through the fileinput module. This module allows you to loop over the contents of one or more text files quickly and easily. Here’s the typical way fileinput is used:
import fileinput

for line in fileinput.input():
    process(line)
fileinput gets its input from command line arguments passed to sys.argv by default.
Using fileinput to Loop Over Multiple Files
Let’s use fileinput to build a crude version of the common UNIX utility cat. The cat utility reads files sequentially, writing them to standard output. When given more than one file in its command line arguments, cat will concatenate the text files and display the result in the terminal:
# File: fileinput-example.py

import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()
Running this on two text files in my current directory produces the following output:
$ python3 fileinput-example.py bacon.txt cupcake.txt

--- Reading bacon.txt ---
 -> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
 -> irure cillum drumstick elit.
 -> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
 -> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
 -> Tri-tip doner kevin cillum ham veniam cow hamburger.
 -> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
 -> Ball tip dolor do magna laboris nisi pancetta nostrud doner.

--- Reading cupcake.txt ---
 -> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
 -> Topping muffin cotton candy.
 -> Gummies macaroon jujubes jelly beans marzipan.
fileinput allows you to retrieve more information about each line such as whether or not it is the first line (.isfirstline()), the line number (.lineno()), and the filename (.filename()). You can read more about it here.
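Instead of relying on sys.argv, you can also pass filenames to fileinput.input() directly, which makes the module easy to use from tests or library code. A sketch with invented file names:

```python
import fileinput
import os
import tempfile

# Create two small text files to read
tmp = tempfile.mkdtemp()
paths = []
for name, text in [('bacon.txt', 'line one\n'), ('cupcake.txt', 'line two\n')]:
    path = os.path.join(tmp, name)
    with open(path, 'w') as f:
        f.write(text)
    paths.append(path)

# Passing files= bypasses sys.argv entirely
collected = []
with fileinput.input(files=paths) as files:
    for line in files:
        collected.append((os.path.basename(fileinput.filename()), line))

print(collected)
```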
You now know how to use Python to perform the most common operations on files and groups of files. You’ve learned about the different built-in modules used to read, find, and manipulate them.
You’re now equipped to use Python to:

- Move, rename, copy, and delete files and directories
- Create, read, and extract ZIP and TAR archives
- Read multiple files at once using the fileinput module