Multiprocessing Technical Notes

Soumith Chintala edited this page Sep 26, 2016 · 5 revisions
  • General Multiprocessing: Storage pointer swapping, Refcounting, Cleanup Daemon
  • Python Multiprocessing: Custom Pickler and subclassing python.multiprocessing

General Multiprocessing: Storage pointer swapping, Refcounting, Cleanup Daemon

Multiprocessing involves sharing memory among processes.

On Unix systems, one can do this with POSIX shared memory, i.e. `shm_open` and `mmap`, as described below.

Sharing Storages among processes A and B

Creating Shared Storages

This is usually how one shares memory among processes.

Let's say process A wants to share memory `[M, M+D]`, i.e. starting at `M` with size `D`.
A wants to share `M` with process B.

  1. Process A calls `shm_open`, which creates a shared memory object and maps it to a file under the folder `/dev/shm/`. Let's say the file is `/dev/shm/torch_shmfile_1`. `shm_open` returns a file descriptor to this file.
  2. Then, process A truncates the shared memory object to the size `D`, using `ftruncate`.
  3. Then, process A calls `mmap` on this file descriptor to get a pointer to this memory. Let's call this pointer `SHM`.
  4. Then, process A copies over memory `[M, M+D]` to `[SHM, SHM+D]`.
  5. Then, process A swaps all pointer references of `M` to be `SHM`. There is no straightforward way to do this in general.

For (5), Torch has a Tensor / Storage abstraction. We can simply swap the data pointer inside the Storage and all Tensors referring to this Storage do not notice anything changing.
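To make step 5 concrete, here is a minimal sketch of the idea in C. The `Storage` and `Tensor` structs below are simplified stand-ins for illustration, not Torch's actual definitions:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for Torch's Storage / Tensor (not the real structs). */
typedef struct {
    float  *data;   /* the only place the raw pointer lives */
    size_t  size;   /* number of elements */
} Storage;

typedef struct {
    Storage *storage;   /* Tensors hold a Storage, never a raw data pointer */
    size_t   offset;
} Tensor;

/* Steps 4 and 5: copy [M, M+D] into the shared mapping, then retarget the
 * Storage. Every Tensor viewing this Storage now sees shared memory,
 * and no Tensor had to be touched. */
void storage_move_to_shm(Storage *s, float *shm) {
    memcpy(shm, s->data, s->size * sizeof(float));
    free(s->data);
    s->data = shm;
}
```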

Now, A communicates the file path of the shared memory along with its size to B, i.e. `("/dev/shm/torch_shmfile_1", D)`.

B also calls `shm_open` on the same file, calls `mmap` on the resulting file descriptor, and wraps the returned pointer `SHM` in a new Storage.

Now, A and B have successfully shared memory: a Storage in A points to the same memory as a Storage in B.
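Putting steps 1–3 and B's side together, here is a minimal sketch using the POSIX calls named above (error handling trimmed; `/torch_shmfile_1` and `D` are just the running example). Note that `shm_open` takes a name such as `/torch_shmfile_1`, which Linux exposes as `/dev/shm/torch_shmfile_1`:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Process A: create the shared memory object and map it (steps 1-3). */
void *create_shared(const char *name, size_t D) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);            /* step 1 */
    if (fd < 0) return NULL;
    if (ftruncate(fd, D) != 0) { close(fd); return NULL; }      /* step 2 */
    void *shm = mmap(NULL, D, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);                        /* step 3 */
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return shm == MAP_FAILED ? NULL : shm;
}

/* Process B: attach to the object A created, given ("name", D). */
void *attach_shared(const char *name, size_t D) {
    int fd = shm_open(name, O_RDWR, 0600);
    if (fd < 0) return NULL;
    void *shm = mmap(NULL, D, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return shm == MAP_FAILED ? NULL : shm;
}
```

(On older glibc, link with `-lrt` for `shm_open`.)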

Note: Shared Storages are not resizeable.

Freeing Shared Storages: the need for reference counting

The next question, then, is: how do we free this memory?

For this, one has to call `shm_unlink` with the name of the shared memory object (note that `shm_unlink` takes the name, not a file descriptor). Calling `shm_unlink` removes the file `/dev/shm/torch_shmfile_1`, but keeps existing file descriptors and the memory pointer alive.
Let's say that process A exits, and as part of exiting, it deallocates all of its Storage objects and calls `shm_unlink` on `SHM`'s name. Process B can still access and use `SHM`, but the file that pointed to `SHM`, i.e. `/dev/shm/torch_shmfile_1`, is removed from the filesystem.

So, the problem in this scenario becomes: if process A shares a Tensor with B and then exits, process B cannot further share the Tensor with a process C. This is a problem for us. Consider a common scenario: A creates a process pool (of which B is a part), B creates a Tensor and shares it with A, then A terminates the process pool and creates another pool, with which it tries to share the same Tensor. This does not work, because B called `shm_unlink` on the Tensor's memory when it exited.

To resolve this, we have to add reference counting to the `SHM` object, and only call `shm_unlink` on it once the references to it reach 0.

So, what we do is: if you want to allocate `SHM` of size `D`, we allocate it with size `D + 4` and use the last 4 bytes as an integer for reference counting. To make sure that two processes changing this refcount at the same time do not conflict, we increment and decrement the counter using hardware atomic instructions (available on all processors that we care about).
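As a sketch of this layout (using GCC/Clang `__atomic` builtins; the actual Torch code differs in detail, and we assume `D` leaves the counter 4-byte aligned):

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* The storage occupies [SHM, SHM+D); the 4 bytes at SHM+D hold the
 * shared refcount, visible to every process that mapped the object. */
static int32_t *refcount_of(void *shm, size_t D) {
    return (int32_t *)((char *)shm + D);
}

void shared_retain(void *shm, size_t D) {
    __atomic_add_fetch(refcount_of(shm, D), 1, __ATOMIC_SEQ_CST);
}

/* Drop one reference; only the last holder unlinks the name, so the
 * kernel can reclaim the memory once all mappings are gone. */
void shared_release(void *shm, size_t D, const char *name) {
    if (__atomic_sub_fetch(refcount_of(shm, D), 1, __ATOMIC_SEQ_CST) == 0)
        shm_unlink(name);
    munmap(shm, D + 4);
}
```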

So far, we have covered:

  • Creating the shared memory
  • Swapping the memory pointers in Storage
  • Refcounting the shared memory so that we call `shm_unlink` only once, when the last reference is released

Handling cleanup when a process dies abruptly with SIGKILL

There is another problem that we have not touched upon, which is: what happens to this shared memory if the process gets a SIGKILL?

A lot of users call the command `killall [processname]` or `kill -9 [processname]`. It is the only command they know to kill a process, and they use it all the time :)
This sends a SIGKILL signal to the process. When a process gets a SIGKILL, as opposed to a SIGINT, it is not given a chance to clean up after itself.
This is a problem because, if we do not call `shm_unlink` on the shared memory `SHM`, it remains occupied until you restart your computer or manually run the command `rm -f /dev/shm/torch_shmfile_1`. So, if the Tensor takes 8GB of memory, we have essentially leaked 8GB, never to be used by any process again until the system restarts. This is horrible.

Hence, we have a new problem to solve: how do we ensure that we clean up safely, even if processes A, B, C are given a SIGKILL and die abruptly?

There are two solutions to this:

  1. Remove the file `/dev/shm/torch_shmfile_1` as soon as we create it, using `unlink()` (not `shm_unlink`), and simply exchange file descriptors among processes. This way, if the processes get a SIGKILL, there is nothing left to clean up.
  2. Launch a daemon process and detach it from its parent process. This daemon process stays alive even when its parent process dies; when parent A gets a SIGKILL, the daemon detects that A has died and cleans up afterwards. So, whenever we share a Tensor, we have to give the path of the `SHM` file (for example `/dev/shm/torch_shmfile_1`) to the daemon process.

### Solution 1

This is an elegant solution, but it suffers from the problem that each shared Storage has to keep one open file descriptor.

So, if we share 4000 Storages between A and B, there are 4000 open file descriptors. This is usually not a problem on modern operating systems, but quite a few academic clusters limit the number of file descriptors per process, sometimes to as few as 1024.

So, keeping users and supportability in mind, we decided that this would not be a fully practical solution on its own.
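For reference, the standard POSIX mechanism Solution 1 relies on is passing descriptors over a Unix-domain socket with `SCM_RIGHTS`. A minimal sketch (the helper names `send_fd` / `recv_fd` are ours for illustration, not Torch's API):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send an open file descriptor across a Unix-domain socket. The kernel
 * duplicates the descriptor into the receiving process. */
int send_fd(int sock, int fd) {
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the descriptor on the other end; returns -1 on failure. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    if (recvmsg(sock, &msg, 0) <= 0) return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS) return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Since the file name was already `unlink()`ed, the received descriptor is the only handle to the memory; once every process holding it exits (however violently), the kernel reclaims the memory automatically.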

### Solution 2

This solution is also robust to processes A, B, C getting a SIGKILL.
Since the daemon is detached from its parent process A, it does not die immediately when A dies.
As soon as the daemon process is launched, it opens a socket connection with process A.
When the socket connection dies, the daemon knows that A has died abruptly (possibly by a SIGKILL) and cleans up afterwards.
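A minimal sketch of that detection logic (hypothetical; the real daemon's protocol is more involved). Assume A writes one shared-memory name per line over the socket as Tensors are shared, and EOF means A is gone:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define MAX_FILES 1024

/* Daemon side: `sock` is the connection to process A. Record every name
 * A registers; when the socket hits EOF (A exited or was SIGKILLed),
 * unlink everything that was registered. */
void daemon_loop(int sock) {
    static char names[MAX_FILES][256];
    int n = 0;
    char line[256];
    FILE *in = fdopen(sock, "r");
    while (in && fgets(line, sizeof(line), in)) {
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */
        if (n < MAX_FILES)
            snprintf(names[n++], sizeof(names[0]), "%s", line);
    }
    /* fgets returned NULL: EOF on the socket, i.e. A has died. Clean up. */
    for (int i = 0; i < n; i++)
        shm_unlink(names[i]);
    if (in) fclose(in);
}
```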

After considering both solutions thoroughly, we implemented both Solution 1 and Solution 2 to solve the SIGKILL / cleanup problem, and they can be switched at runtime. By default, Solution 1 is used.
